Data360 Analyze

 View Only
  • 1.  concatenating N amount of columns and values comma separated

    Posted 11 days ago

    Hello, assume we have the input data as per below:

    Image 1

    Assume i want the input data represented as an output as below:

    Image 2

    Would this be possible to do for N amount of columns and N amount of data values, making it a generic solution without hardcoding each column name?

    It would be nice to have this as a generic solution so i dont have to hardcode the column names i want to concantenate.

    If a solution could be made, that is the same for 5,10,20 columns that gives me the output like in Image 2.

    I can do this in a calculate fields node, with that said. I have to manually define each column i want to concantenate.



    ------------------------------
    Henrik B
    E.ON Sverige
    ------------------------------


  • 2.  RE: concatenating N amount of columns and values comma separated

    Employee
    Posted 10 days ago

    I have been doing this a lot lately.  I use the Lookup node where the second input somehow specifies the rules for how the data in the first input should be combined.

    So in this lookup, the fields in the second input are combined into  an output field.  the name of that output field is specified by the variable named outputFieldName which in this code is hardcoded.  You could of course calculate the name from the fields in the second input like you seem to have done, but dynamic field names might be problematic for other nodes to deal with like you described.

    In this example, the "Get Metadata", "Pivot Data to Names", and "Head" nodes are just executing some logic that identifies the columns that need to be combined dynamically.  You would replace these with whatever you actually need.

    In the Lookup node, I change ConfigureFields to Advanced so that I can take better control of the output field definitions and save the names of the fields that will be combined.

    ConfigureFields:

    outputFieldName = "CombinedData"
    outputs[0][outputFieldName] = unicode
    
    myFieldList = []
    for name in inputs[0]:
        if name in inputs[1]:
            myFieldList.append(name)
        else:
            outputs[0] += inputs[0][name]

    The ProcessRecords code will do the work of combining the data together using the field list that was calculated in the ConfigureFields code.  Make sure to be prepared for Nulls in the input data.  And this uses the unicode so that any of the datatypes get converted to a corresponding string representation.

    ProcessRecords:

    outputs[0] += inputs[0]
    outputs[0] += inputs[1]
    
    data = []
    for name in myFieldList:
        value = inputs[0][name]
        if value is Null:
            data.append("")
        else:
            data.append(unicode(value))
    outputs[0][outputFieldName] = ",".join(data)

    The output looks like this:



    ------------------------------
    Ernest Jones
    Precisely Software Inc.
    PEARL RIVER NY
    ------------------------------



  • 3.  RE: concatenating N amount of columns and values comma separated

    Employee
    Posted 10 days ago

    While this solution is not as flexible as that provided by Ernest you can perform the concatenation in a Transform node. I have provided the logic to name the output field to be the concatenation of the input field names but I do agree with Ernest that this may cause some issues with certain nodes or external systems. 

    #### ConfigureFields Script
     
    #out1 += in1
     
    ## Create a list of the input field names
    fldArray = []
    for field in fields:
        fldArray.append(field)
     
    ## Get the number of input fields
    numFields = len(fldArray)
     
    ## Set the delimiter character
    delim = ","
     
    ## Concatenate the fieldnames
    all_fields = ','.join(fldArray)
     
    ## Output a field for the results with the concatenated fieldnames as its nane
    out1[all_fields] = unicode
     
    #### End ConfigureFields Script
     
     
     
    #### ProcessRecords Script
     
    #out1 += in1
     
    ## Set the output value to an empty string or the first value
    if fields[fldArray[0]] is Null:
        val = ""
    else:
        val = str(fields[fldArray[0]])
     
    ## If there are more fields the concatenate their values
    if numFields > 1:
        i = 1
        while i < numFields:
            if fields[fldArray[i]] is Null:
                val = val + delim 
            else:
                val = val + delim + str(fields[fldArray[i]])
            i += 1
     
    ## Output the value
    out1[all_fields] = val
     
    #### End ProcessRecords Script


    ------------------------------
    Adrian Williams
    Precisely Software Inc.
    ------------------------------