Spectrum Technology Platform

Isolate unparsed tokens from a string

    Employee
    Posted 01-30-2020 06:19

    We sometimes have to parse the individual tokens of a given string, and we may want to isolate the pieces that were not recognized.

    Let's say you have the following String variable to parse: "01300 MD MDI 0130012345", with a blank separator between tokens.
    Suppose Spectrum components (OpenParser, NameParser, GGM, Validate Address, …) have recognized some of the tokens, such as State = "MD" and PostalCode = "01300". You can isolate the two remaining tokens with this simple Groovy solution:

    // Break the string into whitespace-separated tokens and place them in a list
    data['TokenList']  = data['String'].split().toList();
    data['UnParsedTokenList'] = data['TokenList'];

    // Remove every occurrence of the recognized tokens from the list. Note the -= syntax
    data['UnParsedTokenList'] -= data['State'];
    data['UnParsedTokenList'] -= data['PostalCode'];

    // Optionally, you can dedupe the remaining tokens (keeps the first occurrence of each)
    data['UnParsedTokenList'].unique();
    // Optionally, you can sort them (note that this changes the original token order)
    data['UnParsedTokenList'].sort();

    // Finally you can build a new string with the remaining tokens separated by a space
    data['UnparsedString'] = data['UnParsedTokenList'].join(' ');

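    Without the optional sort, the result is UnparsedString = "MDI 0130012345". If you keep the sort() step, the tokens are ordered lexically (digits before letters), so expect "0130012345 MDI" instead.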
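
If you want to try the logic outside Spectrum first, here is a minimal standalone sketch. It assumes a plain Groovy Map as a stand-in for the Spectrum `data` row, and the helper name `unparsedTokens` is hypothetical (it is not part of any Spectrum API). It also uses `unique(false)` and `sort(false)`, which return copies instead of modifying the list in place; check that the Groovy version bundled with your Spectrum release supports them.

```groovy
// Stand-in for the Spectrum row: a plain Map (assumption, only for testing outside Spectrum)
def data = [
    'String'    : '01300 MD MDI 0130012345',
    'State'     : 'MD',
    'PostalCode': '01300'
]

// Hypothetical helper: returns the tokens of `input` that match none of the `recognized` values
def unparsedTokens(String input, List recognized) {
    // tokenize() splits on whitespace and returns a List directly
    def tokens = input.tokenize()
    // List minus List removes every occurrence of each recognized value
    return tokens - recognized
}

def remaining = unparsedTokens(data['String'], [data['State'], data['PostalCode']])

// Original token order preserved
def unsorted = remaining.join(' ')

// Deduped and sorted copy; `remaining` itself is left untouched
def sorted = remaining.unique(false).sort(false).join(' ')

println unsorted
println sorted
```

One thing to keep in mind with this approach: the subtraction removes every token equal to a recognized value, not just the one the parser matched. For example, `unparsedTokens('MD MD X', ['MD'])` returns only `['X']`, so a repeated token that was recognized once disappears entirely.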



    ------------------------------
    Eric Hubert
    PreSales Engineer
    Pitney Bowes Software France
    Levallois Perret
    ------------------------------