Data360 Analyze

 View Only
  • 1.  Load excel sheets using parameters

    Employee
    Posted 08-02-2021 01:22

    Hi Team,

    I am trying to load excel files. Excel files are as such that data is sometimes in different sheet and data start row also might differ from file to file.

    We have excel sheet index and data start row coming from another File. I am trying to do following: Create a raw file using transform node to dynamically create a workbookSpec property file (XML) and then load it in Excel node. However, I am getting following error while trying to do the same: "Error initializing the Excel Reader: Exception attempting to validate specification Validated against XSD: ExcelFileOutputSpec.xsd At Line 1, Column 1. Contained Exception: Content is not allowed in prolog."

    Can you suggest an approach to dynamically load files with differing sheet index and data start rows.

     



  • 2.  RE: Load excel sheets using parameters

    Employee
    Posted 08-03-2021 04:13

    The WorkbookSpec property of the Excel File node must be a literal value, you cannot source it from a file or an input field.

    However, as the property value can be sourced from a Run property, you could use a the Execute Data Flow node to acquire the file in a 'child' data flow and pass the value for the populated WorkbookSpec property into the child data flow as a run property.

    The main data flow would be similar to this example:

    The 'Driver Data' Create Data node represents your logic that derives the data that needs to be passed into the child data flow by the Execute Data Flow node. It comprises the name of the Excel file to be imported, the sheet index in the workbook, the header row number, the data start row number and the name of the temporary file to be used to store the imported data. e.g.

    The Execute Data Flow node is configured with the Resource Path for the child data flow and to pass through all input fields to it's output

     

    The child data flow performs the acquisition of the data from the specified Excel File. An example data flow is shown below:

    A Generate Data node is used to obtain the name of the file to be acquired from the 'SourceFile' Run property.

     

    The Excel File node is configured to source the input file name from the 'FileName' input field:

     

    The node's 'WorkbookSpec' property is configured to use textual substitutions from Run property values to specify the sheet index, the header row and data start row values. In this example it is also configured to generate output fields for the filename and the record number in the file.

     

    The Output BRD File node is configured to save the contents of the Excel sheet in the specified temporary file (again with the value derived from the Run properties).

     

    In the Main data flow, the results from the successful executions of the Execute Data Flow node are loaded from the temporary files using a BRD File node, and the data are concatenated to form the complete data set.

    When all of the data from the temporary files has been read, the run dependency from the BRD File node enables the Transform node to run. The Transform node is configured to delete the temporary files that were created by the successful executions of the child data flow.

     

    The example main and child data flows are attached. Some properties/values will need to be changed to match your environment, e.g. the 'DataFlow' property of the Execute Data Flow node and the Driver Data.

     

     

    Attached files

    Load_Excel_Main_DF - 3 Aug 2021.lna
    Acquire_Excel_File - 3 Aug 2021.lna