Data360 Analyze

  • 1.  Identify records containing cyrillic characters

    Posted 04-18-2023 12:41

    Hi,

    I want to use a Split node to separate records that contain Cyrillic characters.
    For example, if Field1 has the value "System of a Down - Официальная Дискография [MP3]", I want that record to go to the false output pin of the Split node.
    Any recommendations on how to achieve this?

    Thank you.



    ------------------------------
    Dhinmar James Cayog
    Knowledge Community Shared Account
    ------------------------------


  • 2.  RE: Identify records containing cyrillic characters

    Posted 04-18-2023 13:17

    You could just try to convert the value to a string; if the conversion fails, that indicates the value contains a character that is not in windows-1252. I did this in a Transform node to which I added a second output pin, out2.
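
    A minimal sketch of this encode-and-catch idea in plain Python, outside any Data360 Analyze node. The helper name fits_windows_1252 is hypothetical, and the sketch assumes the value is already a Unicode string. Note that it flags any character outside windows-1252 (Greek or CJK text, for example), not only Cyrillic.

```python
def fits_windows_1252(text):
    """Return True if every character in text can be encoded as windows-1252."""
    try:
        # "cp1252" is Python's codec name for windows-1252
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        # At least one character has no windows-1252 representation
        return False
```

    A record for which this returns False would be the candidate for the second output pin.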





    ------------------------------
    Gerry Mullin
    Avalara Inc
    Seattle WA
    ------------------------------



  • 3.  RE: Identify records containing cyrillic characters

    Employee
    Posted 04-19-2023 10:37

    You can define a function in the ConfigureFields property that checks each character individually. The unicodedata module has a name() function that returns the name of a character; if that name contains the word CYRILLIC, the character is Cyrillic.

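    import unicodedata

    # Declare the output field x as a unicode string
    out1.x = unicode

    def has_cyrillic(text):
        """Return True if any character in text has CYRILLIC in its Unicode name."""
        for char in unicode(text):
            # Pass a default so unnamed characters (e.g. tab, newline) do not raise ValueError
            if 'CYRILLIC' in unicodedata.name(char, ''):
                return True
        return False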

    In the ProcessRecords property, you can then call the function in an if statement:

    text = in1.possible_cyrillic_text  # Field that may contain Cyrillic text
    if has_cyrillic(text):
        out1.x = "Contains Cyrillic characters"
    else:
        out1.x = "Does not contain Cyrillic characters"
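
    As an alternative to looking up each character's name, the same test can be sketched with a regular expression over the Cyrillic code point ranges. This is a swapped-in technique, not the method above. The names CYRILLIC_PATTERN and contains_cyrillic are hypothetical, and the ranges cover only the Cyrillic blocks in the Basic Multilingual Plane, so treat it as an approximation written for plain Python rather than for a specific node.

```python
import re

# Cyrillic and Cyrillic Supplement (U+0400-U+052F), Cyrillic Extended-C
# (U+1C80-U+1C8F), Extended-A (U+2DE0-U+2DFF) and Extended-B (U+A640-U+A69F)
CYRILLIC_PATTERN = re.compile(
    u"[\u0400-\u052F\u1C80-\u1C8F\u2DE0-\u2DFF\uA640-\uA69F]"
)

def contains_cyrillic(text):
    """Return True if text contains at least one character from the ranges above."""
    return CYRILLIC_PATTERN.search(text) is not None
```

    Applied to the value from the original question, contains_cyrillic returns True because of the Cyrillic words in the middle of the string; a purely Latin title returns False.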



    ------------------------------
    Ernest Jones
    Precisely Software Inc.
    PEARL RIVER NY
    ------------------------------



  • 4.  RE: Identify records containing cyrillic characters

    Posted 04-20-2023 04:12

    Thank you! These ideas are great.



    ------------------------------
    Dhinmar James Cayog
    Knowledge Community Shared Account
    ------------------------------