Data360 Analyze

 View Only
  • 1.  Writing to Excel Template

    Posted 03-25-2022 09:58

    Hi

    We are using the Output Excel Node using the TemplateInputFile Parameter to write our Excel output to a template. The template has 57 columns some of which are "grouped" together. There is no other data in the file. 

    To write a few million records across a couple of these files is taking 2 hours. Any suggestions on how to improve performance? It should take a few mins. I saw in the help it mentions about not making the template contain too much data as the node could run out of memory but there is no data in the file except for column headings. Is there a limit on memory available to any one node?

    Thanks

    Scott



  • 2.  RE: Writing to Excel Template

    Employee
    Posted 03-31-2022 02:22

    I did some tests to look at the relative performance of writing the same data set with and without the use of a template. The number of columns was similar to your use case (55) but the number of records was reduced for the tests to 10K as the execution time for 1M records was also over 2 hours for me when using a template file.

    The use case where a template file was not configured was used to set a baseline when writing 10K records.

    When a template file was configured for the node, the execution time was approximately 12 times longer compared with the no template use case. The node's Optional properties were set at their defaults.

    In the next use case the FormatOutput property was set to use the cell format from the template file. In this case the node execution was approximately 2.7 times longer than the no template use case.

    In a further use case the PreserveCellStyles property was set to False. In this case the cell values are output in a format to match the input data type but the existing cell styles are not maintained. The execution time for this use case was comparable with the no template use case.

    So looking at these results, setting the node to not preserve the cell styles from the template file produced the best performance. However, depending on your template this may or may not be practical. If you can construct your template file to have a 'Raw_Data' worksheet you could then populate this worksheet with the data from Analyze without having to be concerned with maintaining any cell styles. Other worksheets in the workbook could then reference the values in the Raw_Data worksheet to produce the final formatted report.

    You may want to investigate the options yourself (using a moderate data set) to see what works best for you. 

    Note, the Excel specifications say the maximum number of records in a .xlsx is just over 1M.