Data360 Analyze

  • 1.  Non accepted characters with "Modify Fields" for UNICODE-to-STRING conversion

    Posted 07-11-2019 07:03

    I'm facing an issue that did not occur when I worked with Lavastorm on exactly the same database.

    I've imported a CSV file with "FileCharacterSet" set to Autodetect.

    The TIERSNOM field's type is Unicode.

    When I use "Modify Fields" to change the type to String, I get the error message shown in the attachment.

    I also tried setting "FileCharacterSet" to each of the following:

    UNICODE BOM

    UTF-8

    ISO 8859-1

    The issue is the same in every case.
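    Why the FileCharacterSet value matters: it decides how the raw bytes in the file are turned into characters, so the same bytes can import correctly, as garbled text, or not at all. The following is a minimal sketch in plain Python 3 (not Analyze's internal code); the sample value "Hélène" is made up for illustration.

```python
# Hypothetical sample value, encoded the way two different source files might store it
utf8_bytes = "Hélène".encode("utf-8")      # é and è take two bytes each
latin1_bytes = "Hélène".encode("latin-1")  # é and è take one byte each


def try_decode(data: bytes, charset: str) -> str:
    """Decode strictly and return an error marker instead of raising."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError as exc:
        return f"<decode error: {exc.reason}>"


for label, data in (("UTF-8 file", utf8_bytes), ("Latin-1 file", latin1_bytes)):
    for charset in ("utf-8", "latin-1"):
        print(f"{label} read as {charset}: {try_decode(data, charset)}")
```

    A Latin-1 file read as UTF-8 fails outright, while a UTF-8 file read as Latin-1 "succeeds" with garbled characters, so a wrong setting does not always produce an error at import time.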




  • 2.  RE: Non accepted characters with "Modify Fields" for UNICODE-to-STRING conversion

    Employee
    Posted 07-15-2019 07:03

    I cannot see your attachment with the error message. Could you please re-post it, along with a sanitized sample of the data that causes the error if possible?

    When I import some sample data from a .csv file using AutoDetect for the FileCharacterSet property, the data is imported with a unicode data type. If I then set the Type property for the field in the Modify Fields node, the data is converted to the string data type:



  • 3.  RE: Non accepted characters with "Modify Fields" for UNICODE-to-STRING conversion

    Posted 07-15-2019 08:10

    Here is the error message:



  • 4.  RE: Non accepted characters with "Modify Fields" for UNICODE-to-STRING conversion

    Employee
    Posted 07-16-2019 05:57

    I assume the undefined characters in the value generating the error are letters with acute accents. I created some test data (see attached file) and imported it using the Autodetect option on the CSV/Delimited File node. The conversion to string type in the Modify Fields node also worked as expected.

    Can you open your source CSV file in Notepad++ and check which encoding it reports for the file, e.g.:

    Attached files


    TestData_UTF-8.csv
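    If Notepad++ is not available, a rough equivalent of its encoding check can be scripted. This is a minimal sketch under stated assumptions: it only recognises UTF-8 and UTF-16 byte-order marks, and it treats "not valid UTF-8" as a hint toward a single-byte code page rather than proof. The helper names `classify_chunks` and `guess_encoding` are hypothetical. Because it reads the file in 1 MB chunks, it also works on very large files.

```python
import codecs
import itertools
from typing import Iterable

# Byte-order marks checked before anything else (UTF-32 is deliberately not handled)
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE with BOM"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE with BOM"),
]


def classify_chunks(chunks: Iterable[bytes]) -> str:
    """Rough encoding guess from a stream of byte chunks."""
    it = iter(chunks)
    first = next(it, b"")
    for bom, name in BOMS:
        if first.startswith(bom):
            return name
    # The incremental decoder copes with multi-byte characters split across chunks
    decoder = codecs.getincrementaldecoder("utf-8")()
    saw_non_ascii = False
    try:
        for chunk in itertools.chain([first], it):
            saw_non_ascii = saw_non_ascii or not chunk.isascii()
            decoder.decode(chunk)
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return "not valid UTF-8 (possibly a single-byte code page such as Windows-1252)"
    return "UTF-8 without BOM" if saw_non_ascii else "ASCII only"


def guess_encoding(path: str, chunk_size: int = 1 << 20) -> str:
    """Apply classify_chunks to a file, reading it in 1 MB pieces."""
    with open(path, "rb") as f:
        return classify_chunks(iter(lambda: f.read(chunk_size), b""))
```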




  • 5.  RE: Non accepted characters with "Modify Fields" for UNICODE-to-STRING conversion

    Employee
    Posted 07-16-2019 06:01

    It would be helpful to have a small sample of the data that is causing the issue.

    If required, you can use the Submit a request link to open a ticket and upload the data to us.



  • 6.  RE: Non accepted characters with "Modify Fields" for UNICODE-to-STRING conversion

    Posted 07-16-2019 06:50

    My source is a 3 GB file.

    I've identified these lines:
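    For a source file of this size, one way to extract a small sample for the support team is to stream the file in binary mode and report only the lines containing bytes above 0x7F, together with the offending byte values. A minimal Python 3 sketch; the helper names and the sample rows are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def non_ascii_lines(lines: Iterable[bytes], limit: int = 20) -> Iterator[Tuple[int, bytes]]:
    """Yield (line_number, raw_line) for lines containing any byte above 0x7F.

    `lines` can be a file opened in binary mode: iterating it reads one line
    at a time, so memory use stays small even for a multi-gigabyte file.
    Stops after `limit` matches.
    """
    found = 0
    for lineno, raw in enumerate(lines, start=1):
        if found >= limit:
            return
        if not raw.isascii():
            found += 1
            yield lineno, raw


def describe(raw: bytes) -> str:
    """List each non-ASCII byte as offset:hex, e.g. '3:0xe9'."""
    return " ".join(f"{i}:0x{b:02x}" for i, b in enumerate(raw) if b > 0x7F)


# Hypothetical rows standing in for the real file
sample = [b"ID;TIERSNOM\n", b"1;DUPONT\n", b"2;H\xe9l\xe8ne\n"]
for lineno, raw in non_ascii_lines(sample):
    print(lineno, describe(raw))
```

    Against the real file, pass `open(path, "rb")` in place of `sample`, where `path` is the location of your own file.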




  • 7.  RE: Non accepted characters with "Modify Fields" for UNICODE-to-STRING conversion

    Employee
    Posted 07-17-2019 04:57

    In Data3Sixty Analyze, please add another CSV/Delimited node to the canvas and configure it to import the data as before. Then switch to the Define tab in the node properties panel, scroll down to the bottom and add a third output pin to the node:

    When you run the node there will be a single record on the out3 pin which provides details of the results of the auto-detection process. Please post this information to us.

    The characters you indicate in your post are valid in both the Windows-1252 code page and the ISO-8859-1 character set, so there should be no issue handling them in Analyze in either a unicode or a string type field.
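    To see why those characters should be unproblematic, compare the bytes each character set uses for them. A minimal Python 3 sketch: in Python, `cp1252` and `latin-1` are the codec names for Windows-1252 and ISO-8859-1, and the helper `encodings_of` is hypothetical. "é" is the single byte 0xE9 in both single-byte sets and the two bytes 0xC3 0xA9 in UTF-8, whereas "€" exists in Windows-1252 but not in ISO-8859-1.

```python
def encodings_of(text: str, charsets=("cp1252", "latin-1", "utf-8")) -> dict:
    """Map each charset to the hex bytes it uses for `text`, or None if it cannot encode it."""
    result = {}
    for charset in charsets:
        try:
            result[charset] = text.encode(charset).hex(" ")
        except UnicodeEncodeError:
            result[charset] = None
    return result


print(encodings_of("é"))  # the accented character discussed in this thread
print(encodings_of("€"))  # present in Windows-1252 only, for contrast
```

    A lone 0xE9 byte in a hex view therefore points to a single-byte encoding; in a UTF-8 file the same character would appear as 0xC3 0xA9.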


    I created a text file containing the data you indicated causes the issue (attached). This is how the data looks in a hex editor:

    The highlighted byte is the first e with an acute accent (0xE9). When this data is viewed in Notepad++, the characters are displayed as expected:

    Can you also let us know which locale your machine is configured to use?

    Attached files

    Identified_Lines_w_Extended_Chars.txt
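    If it helps to answer the locale question, the settings that usually determine a process's default text encoding can be printed from Python. This is a minimal sketch; it shows what a Python process on that machine sees, which is an assumption about, not a direct readout of, what Analyze itself uses. The helper `locale_report` is hypothetical.

```python
import locale
import sys


def locale_report() -> dict:
    """Collect locale-related settings that commonly decide a process's default text encoding."""
    return {
        # Passing None only queries the current LC_CTYPE setting; nothing is changed
        "lc_ctype": locale.setlocale(locale.LC_CTYPE, None),
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


for key, value in locale_report().items():
    print(f"{key}: {value}")
```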




  • 8.  RE: Non accepted characters with "Modify Fields" for UNICODE-to-STRING conversion

    Employee
    Posted 07-17-2019 06:22

    If you want to replace the problematic characters, you could use one of the following in a Transform node:

    out1.field=in1.field.decode("ascii","replace")

    or

    out1.field=in1.field.decode("ascii","ignore")
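    For reference, here is what the two error handlers do, sketched in plain Python 3, where `decode` applies to bytes and `encode` to text. With "replace", undecodable bytes become U+FFFD when decoding and unencodable characters become "?" when encoding; with "ignore" they are dropped. The helper `to_ascii` is hypothetical. One caveat, assuming the Transform node follows standard Python 2 string semantics: calling `.decode("ascii", ...)` on a value that is already unicode first encodes it implicitly with the ASCII codec, which raises an error for accented characters regardless of the handler, so for a unicode field `in1.field.encode("ascii","replace")` may be the form that applies.

```python
from typing import Union


def to_ascii(value: Union[bytes, str], errors: str = "replace") -> str:
    """Reduce a value to ASCII text using a codec error handler (hypothetical helper).

    bytes: decoded as ASCII; bytes above 0x7F become U+FFFD ("replace") or are dropped ("ignore").
    str:   encoded to ASCII; characters outside ASCII become "?" ("replace") or are dropped ("ignore").
    With errors="strict" either path raises a UnicodeError subclass instead.
    """
    if isinstance(value, bytes):
        return value.decode("ascii", errors)
    return value.encode("ascii", errors).decode("ascii")


print(to_ascii(b"H\xe9l\xe8ne"))            # bytes path, "replace"
print(to_ascii("Hélène"))                   # text path, "replace"
print(to_ascii("Hélène", errors="ignore"))  # text path, "ignore"
```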