Lookup node & performance

1. Lookup node & performance

Henrik B
Posted 09-05-2023 02:13

What would be the consequences of frequently running Lookup nodes in Data360 on millions of rows with a large number of columns?

The documentation states that "The Lookup node is recommended for use with a small data set on the right input as the entire right data set is loaded into memory"

My question is: can this cause sluggish performance for Data360 and the machine it's installed on?

As a note, I have over 100 GB of physical memory installed.

------------------------------
Henrik B
E.ON Sverige
------------------------------
2. RE: Lookup node & performance

Employee

Adrian Williams
Posted 09-05-2023 07:07
Edited by Adrian Williams 09-06-2023 09:11

The documentation for the Lookup node states:

"The Lookup node works in a similar way to the Join and Merge nodes and is an optimization of the inner and left joins. The Lookup node is recommended for use with a small data set on the right input as the entire right data set is loaded into memory. An advantage of using the Lookup node is that the input data sets do not need to be sorted prior to using the node. If you want to join two large data sets, you should use the Merge or Join node."

The choice of which correlation node to use depends on the specifics of your scenario. The Lookup node is intended for situations where the number of records on the Lookup (Right) input pin is relatively small compared with the number of records on the Data (Left) input pin. All of the records on the Lookup pin are loaded into memory. Even if the machine has sufficient memory available, you still need to ensure that the node has been given enough Java Heap Space to hold the maximum number of records in the Lookup data set. You may need to specify the value of the JvmMaxHeapSize property (which is hidden by default) if you encounter Java Heap Space errors when running the node.
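As a rough way to reason about how much heap the Lookup (Right) input might need, here is a back-of-the-envelope sketch in Python. It is not a Data360 Analyze API: the function names, the 32 bytes per value and the 2x overhead factor are all assumptions for illustration, and the real footprint depends on your data types and on how the node stores records internally.

```python
def estimate_lookup_heap_bytes(rows, columns, avg_bytes_per_value=32, overhead_factor=2.0):
    """Very rough guess at the memory needed to hold the whole right input.

    avg_bytes_per_value and overhead_factor are illustrative assumptions,
    not figures taken from the product documentation.
    """
    return int(rows * columns * avg_bytes_per_value * overhead_factor)


def format_gib(n_bytes):
    """Render a byte count in GiB with one decimal place."""
    return f"{n_bytes / 2**30:.1f} GiB"


# Hypothetical right input: 5 million rows by 40 columns.
estimate = estimate_lookup_heap_bytes(rows=5_000_000, columns=40)
print(format_gib(estimate))
```

If an estimate like this comes out close to the physical memory on the machine, that is a hint that the Join or Merge node may be the safer choice.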

Because the Lookup node is an optimization of the inner and left joins used by the Join and Merge nodes, it will usually perform better than those nodes in situations where it can be used. All Right records are held in memory, so it does not have to repeatedly buffer a portion of the Right data set while processing the Left (Data) records. It also does not have to pre-sort the data sets, which further improves performance.
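To illustrate the general idea, here is a minimal Python sketch of an in-memory lookup join. It is a generic hash-join pattern, not the node's actual implementation; the function name `lookup_join` and its parameters are made up for this example, and the handling of duplicate keys on the right input is an assumption.

```python
from collections import defaultdict


def lookup_join(left_rows, right_rows, key, how="inner"):
    """Join unsorted inputs by holding the whole right input in memory.

    Generic sketch only; `how` accepts "inner" or "left".
    """
    # Build phase: every right record is stored in memory, keyed by the
    # join field. This is why the right input should be the small one.
    index = defaultdict(list)
    for right in right_rows:
        index[right[key]].append(right)

    # Probe phase: the left input is streamed once, in any order.
    for left in left_rows:
        matches = index.get(left[key])
        if matches:
            for right in matches:
                # Left values win if both sides share a field name.
                yield {**right, **left}
        elif how == "left":
            # Left join keeps unmatched left records as they are.
            yield dict(left)


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
right = [{"id": 1, "b": "p"}, {"id": 3, "b": "q"}]
inner_result = list(lookup_join(left, right, "id"))
left_result = list(lookup_join(left, right, "id", how="left"))
```

Neither input is sorted at any point, and the memory cost is driven entirely by the size of the right input.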

As with all correlation nodes, having a narrower set of fields in the data sets being joined may improve overall efficiency, since the data for unused fields does not have to be marshalled into and out of the node during processing. If the expected fraction of matching records is low (i.e. the join effectively filters out the majority of input records) and the data set is wide, it may be better to uniquely identify the records in the data set and remove unnecessary fields prior to the join operation, perform the join, and then re-join the discarded fields back into the result set using the unique id.
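The narrow, join, then re-join approach can be sketched in plain Python as follows. This only illustrates the data flow; in Data360 Analyze each step would be a node in the graph. The function name, the `_row_id` field and the sample data are hypothetical, and the sketch assumes the lookup keys are unique.

```python
def narrow_then_rejoin(wide_rows, lookup_rows, key):
    """Join on a narrow projection, then restore the wide fields.

    Assumes each key appears at most once in lookup_rows.
    """
    # Step 1: give every wide record a unique id.
    tagged = [{"_row_id": i, **row} for i, row in enumerate(wide_rows)]

    # Step 2: keep only the id and the join key for the join itself.
    narrow = [{"_row_id": r["_row_id"], key: r[key]} for r in tagged]

    # Step 3: join the narrow records against the lookup data.
    lookup_index = {r[key]: r for r in lookup_rows}
    matched = [
        {**n, **lookup_index[n[key]]} for n in narrow if n[key] in lookup_index
    ]

    # Step 4: re-attach the discarded fields, only for surviving records.
    wide_by_id = {r["_row_id"]: r for r in tagged}
    return [{**wide_by_id[m["_row_id"]], **m} for m in matched]


wide = [{"k": "a", "f1": 1, "f2": 2}, {"k": "b", "f1": 3, "f2": 4}]
lookup = [{"k": "b", "label": "B"}]
result = narrow_then_rejoin(wide, lookup, "k")
```

The benefit only shows up when most records are filtered out by the join, because the wide fields are then carried through the join for a small fraction of the input.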

------------------------------
Adrian Williams
Precisely Software Inc.
------------------------------

3. RE: Lookup node & performance

Employee

Ernest Jones
Posted 09-06-2023 08:52

My experience has been that the Lookup node aborts if the right input is too big. About 10 years ago I would run into trouble when approaching a million records on the right input, but half a million records was OK.

I did not analyze that with respect to the amount of memory available to the node. Now, with much larger memory sizes, the limit before the abort may be higher. You might increase the memory on the node, as Adrian said. Of course, don't increase it so much that it causes issues for the operating system itself.

------------------------------
Ernest Jones
Precisely Software Inc.
PEARL RIVER NY
------------------------------


Data360 Analyze


Copyright ©2024 Precisely. All rights reserved worldwide.