Data360 Analyze

  • 1.  Efficient method of splitting datasets

    Posted 09-21-2023 05:34

    We have some large datasets that we want to send to DQ+.  The Publish to DQ+ node fails with a memory error because of their size, so we plan to split the data into many smaller datasets and load those instead.
    I've used the hash node to do this (the idea is sketched below), which seems to work quite well - however, our performance testers are concerned that it uses up to 75% CPU.
    Can anyone suggest a more CPU-efficient method of splitting a dataset into multiple subsets?
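    For what it's worth, the hash-based split boils down to something like the following (a generic Python sketch of the idea, not the hash node's actual implementation; the key field and subset count are assumptions):

        import hashlib

        NUM_SUBSETS = 10  # assumed number of smaller datasets

        def subset_for(key):
            # Stable hash of the record's key field, reduced modulo the
            # subset count, so the same key always lands in the same subset.
            digest = hashlib.md5(str(key).encode()).hexdigest()
            return int(digest, 16) % NUM_SUBSETS

        # e.g. subset_for("customer-42") -> a value in 0..9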

    Many thanks



    ------------------------------
    Gail Sinclair
    Hargreaves Lansdown PLC
    ------------------------------


  • 2.  RE: Efficient method of splitting datasets

    Employee
    Posted 09-22-2023 09:25

    I don't know your performance requirements, nor the performance characteristics of the Analyze nodes. I can think of other ways to split up the data, but they would probably run more slowly.

    You could add outputs to a Transform node and use the Python modulo operator on the value of node.execcount and the number of outputs to determine which output pin each record is routed to (a sketch of the idea follows).
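    As a rough illustration of that modulo routing (plain Python, not the Transform node's actual per-record script; the four-output count and the record source are assumptions):

        NUM_OUTPUTS = 4  # assumed number of output pins added to the node

        def output_pin_for(record_index):
            # Record 0 -> pin 0, record 1 -> pin 1, ...,
            # record 4 -> pin 0 again, giving a round-robin split.
            return record_index % NUM_OUTPUTS

        records = range(10)  # stand-in for the incoming dataset
        buckets = [[] for _ in range(NUM_OUTPUTS)]
        for i, rec in enumerate(records):
            buckets[output_pin_for(i)].append(rec)
        # buckets[0] == [0, 4, 8], buckets[1] == [1, 5, 9], etc.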

    Maybe instead you want to limit the number of records that are processed at a time; for that you need the Looping node.  You could make it extract a chunk of, say, 100,000 records for upload, then the next 100,000 on the next iteration, and so on (sketched below).
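    A minimal sketch of that chunking pattern (generic Python; upload_chunk is a hypothetical placeholder for whatever performs the Publish to DQ+ step):

        CHUNK_SIZE = 100_000

        def chunks(records, size=CHUNK_SIZE):
            # Yield successive slices of `size` records until the
            # dataset is exhausted.
            for start in range(0, len(records), size):
                yield records[start:start + size]

        def upload_all(records):
            for iteration, chunk in enumerate(chunks(records), start=1):
                upload_chunk(chunk)  # hypothetical: one upload per loop iteration
                print(f"iteration {iteration}: uploaded {len(chunk)} records")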



    ------------------------------
    Ernest Jones
    Precisely Software Inc.
    PEARL RIVER NY
    ------------------------------



  • 3.  RE: Efficient method of splitting datasets

    Posted 09-22-2023 09:40

    Thanks, Ernest

    The Analyze engineering team have advised that this CPU spike is not a concern, as Analyze will use as much CPU as is available if nothing else is running.  If other processes are running, the node will use less but take longer.

    I will keep your suggestions in mind though.



    ------------------------------
    Gail Sinclair
    Hargreaves Lansdown PLC
    ------------------------------