Data360 Analyze

  • 1.  Using 1 Dataset and Try To Find In Another Dataset

    Posted 05-21-2025 03:38

    Hi Guys

    I have around 1,700 numbers, each 5-6 digits long, held in a single column of one table (dataset 1). I need to find these values in a column of a separate dataset (dataset 2).

    The issue is that the column in dataset 2, where I am trying to find these 1,700 values, can contain additional alphanumeric data alongside the number.

    Example

    I need some kind of string find process.

    I'm hoping someone could put a graph together and help me out.

    Looking forward to your reply



    ------------------------------
    andrew darnell
    Knowledge Community Shared Account
    ------------------------------

    Attachment(s)

    txt
    Dataset 1.txt   37 B 1 version
    txt
    Dataset 2.txt   120 B 1 version


  • 2.  RE: Using 1 Dataset and Try To Find In Another Dataset

    Posted 05-21-2025 04:12
    Edited by Toby Harkin 05-21-2025 04:16

    I would first add a new column to the second dataset that contains just the integer, using a Transform node and the code below.

    Configure fields:

    # Map all input fields

    out1 += in1

    # Add a new output field to hold the extracted integer(s)

    out1.ExtractedIntegers = str

    Process Records:

    import re

    # Copy all input fields
    out1 += in1

    # Extract 5-6 digit integers from the input string field (here named 'junk')
    if in1['junk'] is Null:
        # No input value, so there is nothing to extract
        matches = []
    else:
        matches = re.findall(r'\b\d{5,6}\b', in1['junk'])

    # Join all found integers into a single comma-separated string (empty if none were found)
    out1.ExtractedIntegers = ','.join(matches)

    Then, if you do expect multiple results per record, you can use an Expand From List node (Deprecated) to split the comma-separated values out into separate records.
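
    One thing worth checking before relying on the pattern: `\b` only matches between a word character and a non-word character, so a number glued directly to letters (for example `ABC123456X`) would likely be missed. The sketch below is plain Python run outside Data360 Analyze, and the sample values are made up, since the attachment contents are not shown in the thread. It compares the `\b` pattern with a digit-only lookaround variant, which is an alternative I am suggesting rather than something from the code above.

```python
import re

# Hypothetical sample values standing in for the 'junk' column of dataset 2.
rows = ["REF 12345 closed", "ABC123456X", "no digits here", None, "1234567"]

# Pattern from the post: needs a word boundary on both sides of the digits.
WORD_BOUNDARY = re.compile(r"\b\d{5,6}\b")

# Assumed alternative: only requires that the run is not touching other digits.
DIGIT_BOUNDARY = re.compile(r"(?<!\d)\d{5,6}(?!\d)")


def extract(pattern, value):
    """Return all matches as a comma-separated string, or '' for missing input."""
    if value is None:
        return ""
    return ",".join(pattern.findall(value))


word_results = [extract(WORD_BOUNDARY, value) for value in rows]
digit_results = [extract(DIGIT_BOUNDARY, value) for value in rows]

for value, word, digit in zip(rows, word_results, digit_results):
    print(repr(value), "->", repr(word), "|", repr(digit))
```

    Both patterns skip the 7-digit value, but only the lookaround variant picks up the number embedded between letters. Which behaviour is right depends on what the real dataset 2 values look like.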



    ------------------------------
    Toby Harkin
    TELSTRA LIMITED
    Sydney NSW
    ------------------------------



  • 3.  RE: Using 1 Dataset and Try To Find In Another Dataset

    Posted 05-23-2025 06:41
    Edited by Peter Sykes 05-26-2025 06:08

    I think the easiest thing to do is to create a Cartesian join and then do the matching in a Transform node. Regex will work, but it does not seem to be needed based on your example.


    * Health warning: this is a brute-force approach and probably will not work for large volumes. You mention 1,700 values; whether this is sustainable depends on the size of the other set. If it's 1700 x 1700, another approach is probably needed.
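
    To make the trade-off concrete, here is a rough sketch in plain Python (not Data360 Analyze node code) with made-up sample values. `brute_force` mimics the Cartesian join plus a simple "contains" test, while `extract_and_lookup` is one possible alternative that pulls the digit runs out of each row once and checks them against a set. The function names and the lookaround pattern are my own assumptions for illustration.

```python
import re

# Hypothetical stand-ins for the two datasets.
keys = ["12345", "678901"]                              # dataset 1
rows = ["REF 12345 closed", "X678901Y", "99999 only"]   # dataset 2


def brute_force(keys, rows):
    """Cartesian-join style: test every key against every row with 'in'."""
    return [(key, row) for row in rows for key in keys if key in row]


def extract_and_lookup(keys, rows):
    """Extract 5-6 digit runs from each row once, then look them up in a set."""
    key_set = set(keys)
    pattern = re.compile(r"(?<!\d)\d{5,6}(?!\d)")
    return [(m, row) for row in rows for m in pattern.findall(row) if m in key_set]


# Number of key/row comparisons the brute-force version performs.
comparisons = len(keys) * len(rows)

print(brute_force(keys, rows))
print(extract_and_lookup(keys, rows))
print(comparisons, "comparisons here;", 1700 * 1700, "for 1700 x 1700")
```

    Note that the two are not strictly equivalent: a plain "contains" test would also match `12345` inside a longer number such as `9123456`, whereas the extraction variant would not.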



    ------------------------------
    Peter Sykes
    Data Governance & Architecture
    Vontobel Holding AG
    Zurich
    ------------------------------