Data360 Analyze

 View Only
  • 1.  We have 2 Datasets to compare and get only the unique matched records. For records that are not matched or records that are not uniquely matched (multiple matches) then we have to ignore such records.

    Posted 08-23-2020 11:40

    To elaborate, we have dataset as below:

    Dataset1:

    “Pay String” is the complete string

    Col1, Col2, Col3, Col4 has the comma separated values from column “Pay String”

    E.G.

    |Pay string|              |COL1|     |COL2|     |COL3|     |COL4

    Bill,Vod,ISH,123          Bill           Vod            ISH         123

     

    Dataset2:

    Data dump which is used for identifying accounts.

    “Bank String” has the complete string

    Bank_col1, Bank_col2, Bank_col3, Bank_col4 has the comma separated values from “Bank String” column.

    |Bank string|                  |Bank_col1|     |Bank_col2|     |Bank_col3|     |Bank_col4

    Bill,Vod,EMA,1234               Bill                      Vod                   EMA                1234

    We want to know how to search for the “COL1” string with the entire “Bank string” with unique matches. Similarly for 'COL2','COL3','COL4' 

    By using Join node we are getting two matching records for eg. value=Bill in above dataset

    X-ref node gives two matches.

    Lookup gives First match.

    We want those records that have unique matches and ignore those having two or more matches.

    Kindly help.



  • 2.  RE: We have 2 Datasets to compare and get only the unique matched records. For records that are not matched or records that are not uniquely matched (multiple matches) then we have to ignore such records.

    Employee
    Posted 08-24-2020 05:35

    Can you clarify - when you are using the X-Ref node, which field is the LeftInputKey and which is the RightInputKey? A screenshot of the node configuration would be useful



  • 3.  RE: We have 2 Datasets to compare and get only the unique matched records. For records that are not matched or records that are not uniquely matched (multiple matches) then we have to ignore such records.

    Posted 08-24-2020 21:01

    While using X-ref node

    left input key = 'Bank string'

    right input key = 'COL1'

    But as it is gives output's that are Exact matches in my dataset it is giving 0 output.

    Hence I want something that will do the Contains operation 

    I want something that will check if 'Bank string' contains 'COL1' value



  • 4.  RE: We have 2 Datasets to compare and get only the unique matched records. For records that are not matched or records that are not uniquely matched (multiple matches) then we have to ignore such records.

    Employee
    Posted 08-25-2020 10:22

    Assuming the data elements have a fixed length you could use the 'Compare substrings' option of the Merge node. 

     

    You could use additional Merge nodes to compare Col2 - Col4 and in each one change the start and end points for the substring in the 'Bank string' field.



  • 5.  RE: We have 2 Datasets to compare and get only the unique matched records. For records that are not matched or records that are not uniquely matched (multiple matches) then we have to ignore such records.

    Employee
    Posted 08-25-2020 12:49

    Note: This was originally posted by an inactive account. Content was preserved by moving under an admin account.

    I've re-read this a couple of times and I don't think I fully understand the ask so I hope I give you something to think about at least. I do see a request to have a partial match within the Merge (or X-Ref) node being mentioned. This is how I have interpreted that:

     

    1. Bank String = Bill,Vod,EMA,1234 .... this will be our right input data.

    This means that it would match to Col1 = Bill or Col2 = Vod. I made the presumption we did not know how many Col1, Col2, Col3 etc. fields were going to be in our left input data.

    2. We can cycle though all of the elements in our list Bank String against all of Col1, Col2, Col3 etc. values.

    3. We can do this with a little python, depending on how big the data set is, it may end up being slow to cycle through all the values.

     

    Attached is my attempt at solving this problem. I used Join node just because the node allows us to have 2 input pins, we're not really "joining" anything.

    Also, I did this within a couple of for loops. If you wanted to just do "does Col1 exist within Bank string", you could removed one of them and replace it with an "if col1 in bankString" type of logic but I think you'll end up with some bad matches, especially the longer your Bank String value gets.

     

    Attached files

    Contains Match - 25 Aug 2020.lna

     



  • 6.  RE: We have 2 Datasets to compare and get only the unique matched records. For records that are not matched or records that are not uniquely matched (multiple matches) then we have to ignore such records.

    Posted 09-02-2020 04:39

    Hi Gerry, How can I count number of matches found

    For eg I have 2 strings

    BILL PAY,ISHWARI,123

    EMA,BILL PAY,123

    So when I match both string on BILL PAY,123 I am getting both strings as outputs,but when I match on Ishwari I should get one value.

    So I think I will be able to do find this if I get count of number of matching values.

     



  • 7.  RE: We have 2 Datasets to compare and get only the unique matched records. For records that are not matched or records that are not uniquely matched (multiple matches) then we have to ignore such records.

    Employee
    Posted 09-02-2020 13:24

    Note: This was originally posted by an inactive account. Content was preserved by moving under an admin account.

    If you create a field called RecordID on one of the data sets you can then use a histogram directly after your Merge or Join node and count up how many times it is there. If it is not unique it will be a count greater than 1. Have a look at this sample data flow.

     

    Attached files

    Count Matches - 2 Sept 2020.lna