Data360 Analyze

  • 1.  Batch insert query optimization

    Posted 03-22-2024 06:39

    Hello, I want to use the DbExecute node rather than the DB Store node to do a batch insert of rows.

    So assume the following scenario:

    In this case I only have two rows, which I want to turn into one single INSERT statement, as per below:

    This method is very costly when inserting a large number of rows, especially the part in the Transform node where I concatenate all incoming rows into one single INSERT statement.
    One workaround is to save the data as an Excel file and then read it with an Input Raw node, but it would be nice to avoid this and have a simpler solution.
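
    To illustrate the costly part, here is approximately what that concatenation amounts to in plain Python (the table and column names are just placeholders):

        # Rough sketch of the costly approach: growing one SQL string by
        # repeated concatenation, once per incoming record.
        # Table and column names are placeholders.
        rows = [("A", 1), ("B", 2)]               # stand-in for the two input rows

        sql = "INSERT INTO my_table (name, qty) VALUES "
        for i, (name, qty) in enumerate(rows):
            if i > 0:
                sql += ", "
            sql += "('%s', %d)" % (name, qty)     # repeated += re-copies the growing string
        sql += ";"
        print(sql)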

    Is there a way to make this work if I want to combine a large number of input rows into one single INSERT statement, as per above?

    Best regards

    Henrik Borg



    ------------------------------
    Henrik B
    E.ON Sverige
    ------------------------------


  • 2.  RE: Batch insert query optimization

    Employee
    Posted 03-25-2024 10:06

    Hi Henrik,

    I am not convinced that the JDBC Store node isn't the better option, but I'll answer anyway. I have not tested whether this approach to building the SQL INSERT statement is faster.

    I often run across the need to build a single string value from a concatenated list of values across multiple records, as you are describing, and I've started structuring it like this:
    In ConfigureFields I define an empty Python list object:

    In the ProcessRecords property I append() all the values to this list, and then at the end I have Python create the string from the values in the list.  Python has a very nice syntax for doing this using the join() method.
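
    In plain Python terms the pattern looks roughly like this (the table and field names are just placeholders, and real code would also need to escape any quotes inside the values):

        # Sketch of the append-then-join pattern: collect one VALUES entry
        # per record, then build the statement once at the end.
        # Table and column names are placeholders.
        rows = [("A", 1), ("B", 2)]                # stand-in for the incoming records

        values = []                                # the empty list from ConfigureFields
        for name, qty in rows:                     # ProcessRecords: one append per record
            values.append("('%s', %d)" % (name, qty))

        # A single join() builds the whole VALUES clause in one pass instead of
        # re-copying a growing string on every record.
        sql = "INSERT INTO my_table (name, qty) VALUES " + ", ".join(values) + ";"
        print(sql)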

    Back to your question, I am wondering whether there is some limit on the number of entries it is reasonable to put together in a single statement. Maybe breaking it up into 1,000-record INSERT statements would be easier for the database to manage, along these lines:
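
    A rough sketch of that batching, reusing the values list from the sketch above (the batch size and names are again placeholders):

        # Split the collected VALUES entries into batches of 1,000 and emit
        # one INSERT statement per batch.
        BATCH_SIZE = 1000

        statements = []
        for i in range(0, len(values), BATCH_SIZE):
            chunk = values[i:i + BATCH_SIZE]
            statements.append(
                "INSERT INTO my_table (name, qty) VALUES " + ", ".join(chunk) + ";"
            )
        # Each entry in statements now covers at most 1,000 rows.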

    And the reason I suspect the JDBC Store node could still be the better option is that I wonder whether you've tried the optional properties that can make it run faster. Maybe the Commit Frequency needs to be increased to 10,000 or 100,000.



    ------------------------------
    Ernest Jones
    Precisely Software Inc.
    PEARL RIVER NY
    ------------------------------



  • 3.  RE: Batch insert query optimization

    Posted 28 days ago

    Just to give my experience with loading to tables: I have found the JDBC Store node to be generally quicker. With Snowflake, I had an extremely slow-loading table that took hours to load fewer than 100k rows, while other tables would load far more than that in minutes, so I couldn't explain why this one table was so slow. I changed the Commit Frequency to 100,000 and that particular table then loaded within minutes. It doesn't entirely make sense to me, as I've played around with the Commit Frequency value on other tables and in some cases it can make the Store node slower.

    My takeaway was to experiment with the Commit Frequency value to find which setting is best for your case, as I could not find a set rule on what value is actually best, for Snowflake at any rate.