Data360 Analyze

  • 1.  JDBC write performance to Azure Synapse Analytics (MPP database)

    Posted 09-06-2021 00:38

    We are having performance issues with the JDBC Store node. We are writing to an Azure Synapse Analytics database (a relational big data store) from a VM with 32 GB RAM and 4 CPU vCores. Network-wise, all data goes via the Microsoft network backbone (which usually is not a bottleneck).

    - Reading in the table (220K records, 1 attribute) takes about 4 seconds

    - Writing that same data to an existing table in Synapse takes roughly 3 hours. We performed several trials with configurations of the LoadMethod, BatchMode and CommitFrequency properties inside the node. The only thing that yielded a noticeable improvement was adding a "BulkInsert" property directly inside the connection string, which reduced the writing time to about 5 minutes (which still feels relatively long).

    - Writing the same data to a BRD file is again in the order of seconds.

    Could you give us some pointers on how to tackle the performance?

    Thanks in advance!




  • 2.  RE: JDBC write performance to Azure Synapse Analytics (MPP database)

    Employee
    Posted 09-07-2021 09:33

    Inserting data into an Azure Synapse database from an external source can be a relatively slow process.

    Manually adding the EnableBulkLoad=true option in the connection string enables the bulk insert functionality, which bypasses the data parsing that is normally done when data is inserted into the database. This is what produced the performance gain you experienced in your tests.
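    For reference, the option is appended to the JDBC connection URL. A minimal sketch, assuming the bundled DataDirect SQL Server driver (the server and database names are placeholders):

    ```
    jdbc:datadirect:sqlserver://myserver.sql.azuresynapse.net:1433;DatabaseName=MyDb;EnableBulkLoad=true
    ```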

    You can also leverage the bulk insert functionality with the JDBC Execute node by using parameterized inserts, e.g. INSERT INTO MYTABLE (Cust_ID, Cust_First_Name, Cust_Last_Name) VALUES (?, ?, ?)

    In this case the SqlQueryFieldBindings property would specify the parameterized input fields to be inserted into the table.

     

    To optimize the loading speed you should minimize the number of load jobs running concurrently.

    You may be able to increase the loading speed by loading your data into a staging table and, as a second step, moving the data from the staging table into the data warehouse table. The staging table should be defined to use round-robin distribution rather than hash distribution.
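    A sketch of the staging-table approach in Synapse SQL, reusing the column names from the parameterized insert above (the table names are illustrative):

    ```sql
    -- Round-robin heap staging table: fast to load because rows are
    -- spread evenly across distributions without hashing a key column.
    CREATE TABLE dbo.Stage_Customers
    (
        Cust_ID         INT,
        Cust_First_Name NVARCHAR(100),
        Cust_Last_Name  NVARCHAR(100)
    )
    WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);

    -- After bulk-loading into the staging table, move the data into the
    -- (typically hash-distributed) warehouse table in a single statement:
    INSERT INTO dbo.Customers (Cust_ID, Cust_First_Name, Cust_Last_Name)
    SELECT Cust_ID, Cust_First_Name, Cust_Last_Name
    FROM dbo.Stage_Customers;
    ```

    The second step runs entirely inside Synapse, so only the load into the round-robin staging table is paid for over JDBC.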

    If you do not need to encrypt your data you could consider disabling ColumnEncryption, which can improve the overall performance of the data load (and queries). Anecdotal comments also indicate that the type of encryption used can impact load performance. Inserts may also be slowed down by database triggers.

    As an alternative to loading large data sets directly, some online discussions suggest overall performance may be improved by first loading the data into Azure Blob Storage and then copying it into Azure Synapse Analytics.
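    In a Synapse dedicated SQL pool that copy step can be done with the COPY INTO statement. A hedged sketch, where the storage account, container, file name and SAS token are placeholders:

    ```sql
    -- Load a CSV file that was previously uploaded to Azure Blob Storage.
    COPY INTO dbo.Customers (Cust_ID, Cust_First_Name, Cust_Last_Name)
    FROM 'https://myaccount.blob.core.windows.net/staging/customers.csv'
    WITH (
        FILE_TYPE = 'CSV',
        FIRSTROW  = 2,  -- skip the header row
        CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
    );
    ```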