If you have legacy data flows that use deprecated (C++ based) nodes such as AggEx (Deprecated), Sort (Deprecated), Filter (Deprecated) and Join (Deprecated), which rely on legacy LAE BRAINScript, consider refactoring them to use the more recent Java-based versions of these nodes (Aggregate, Sort, Transform, Join), as these can run 'in-container'. Nodes that run 'in-container' share a common Java process, so they have a lower start-up overhead than nodes that run in their own process space and can also consume fewer system resources. See the Help documentation:
https://help.precisely.com/r/t/1010707228/2023-12-31/Data360-Analyze/analyze-server/Latest/en-US/Data360-Analyze-Server-Help/In-container-node-execution
You may also want to investigate whether introducing Run Dependencies between nodes within your data flows could reduce the number of independent node chains running in parallel within the data flow. See the Help documentation:
https://help.precisely.com/r/t/1010707260/2024-10-14/Data360-Analyze/analyze-server/Latest/en-US/Data360-Analyze-Server-Help/Create-run-dependencies
Increasing the number of CPU cores may also improve performance, as more threads can run on the system before overall efficiency starts to decrease due to excessive context switching as the scheduler allocates resources between the running threads. For optimum performance, no more than four nodes should be running on the system per provisioned CPU core.
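As a rough sizing check, the four-nodes-per-core guideline above can be expressed as a simple calculation (the function name and the 16-core example are illustrative, not part of the product):

```python
def max_concurrent_nodes(cpu_cores: int, nodes_per_core: int = 4) -> int:
    """Rough upper bound on simultaneously running nodes before
    context-switching overhead starts to erode throughput."""
    return cpu_cores * nodes_per_core

# For example, a 16-core server would comfortably support up to
# around 64 concurrently running nodes under this guideline.
print(max_concurrent_nodes(16))  # 64
```

Comparing this bound against your observed peak node counts gives a quick indication of whether the host is under-provisioned.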
I would caution against increasing the system-wide default Java heap space allocated to all Java nodes. This increases overall resource requirements (each node is allocated a bigger heap whether or not it needs it) and can also lead to undesirable side effects such as an increased incidence of Java heap out-of-memory errors: the JVM needs to reserve a contiguous block of address space when a node starts, and the larger the requested block, the higher the likelihood that a suitable block cannot be allocated and the node will fail.
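To illustrate why the aggregate cost matters, here is a back-of-the-envelope calculation (the heap sizes and node counts are purely illustrative numbers, not product defaults):

```python
def total_heap_gb(concurrent_nodes: int, heap_per_node_mb: int) -> float:
    """Aggregate heap reserved when every Java node is given the same
    maximum heap size."""
    return concurrent_nodes * heap_per_node_mb / 1024

# With 90 nodes running concurrently, raising the per-node heap from
# 512 MB to 2 GB quadruples the aggregate reservation:
print(total_heap_gb(90, 512))   # 45.0 (GB)
print(total_heap_gb(90, 2048))  # 180.0 (GB)
```

Since most nodes never touch the extra headroom, that additional reservation is largely wasted, which is why per-node overrides are preferable.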
Note that the Java heap space setting above is different from the space allocated to the web application (Tomcat).
If a specific node requires more Java heap space to run, increase the allocated resources just for that node.
See the Help documentation:
https://help.precisely.com/r/t/1010707229/2025-02-12/Data360-Analyze/analyze-server/Latest/en-US/Data360-Analyze-Server-Help/Java-heap-space
'Pushing down' work to external systems (e.g. RDBMSs) can reduce the load on the server hosting Data360 Analyze and may be a good way to increase overall efficiency, as the RDBMS can optimize its queries (though this can simply move the bottleneck elsewhere). You should also review your data flows to minimize the number of fields carried through the data flow as 'payload' rather than as fields actively being used. Where the data set is very wide, it can be more productive to allocate a unique reference to each record, remove the unwanted fields from the data set being processed, and then re-join the 'payload' fields onto the result records as a final step.
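In Analyze you would typically implement that last pattern with a node that adds a sequence field, a node that drops the payload columns, and a final Join. Outside the product, the same pattern looks like this pandas sketch (the column names are made up for illustration):

```python
import pandas as pd

# Illustrative wide data set: only 'amount' is actively used;
# 'payload_a' and 'payload_b' are just carried along.
df = pd.DataFrame({
    "amount":    [10, 20, 30],
    "payload_a": ["x", "y", "z"],
    "payload_b": ["p", "q", "r"],
})

# 1. Allocate a unique reference to each record.
df["rec_id"] = range(len(df))

# 2. Split off the payload fields and keep only the working fields.
payload = df[["rec_id", "payload_a", "payload_b"]]
work = df[["rec_id", "amount"]]

# 3. Do the heavy processing on the narrow data set
#    (a trivial transform stands in for the real work here).
work = work.assign(amount=work["amount"] * 2)

# 4. Re-join the payload onto the results as a final step.
result = work.merge(payload, on="rec_id")
```

The heavy processing in step 3 then only has to move the narrow working columns, which matters most when the intermediate steps sort or aggregate the data.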
------------------------------
Adrian Williams
Precisely Software Inc.
------------------------------
Original Message:
Sent: 03-28-2025 18:40
From: Toby Harkin
Subject: Node Limit on Server
Thanks for responding Adrian, although that is a bit disappointing. I already track node runs throughout the day and we have periods of 90+ nodes running at the same time. Changing the total node count in a data flow won't really solve anything, as it will either just consume more in a moment or spread our schedules out even more. I'll keep looking for other alternatives to give me the control that I need.
Cheers.
------------------------------
Toby Harkin
TELSTRA CORPORATION LIMITED
Sydney NSW
Original Message:
Sent: 03-18-2025 13:00
From: Adrian Williams
Subject: Node Limit on Server
You can limit the number of nodes within a data flow that run at the same time, but there is no overall limit on the number of nodes at the server level.
The thread limit configuration information is here:
https://help.precisely.com/r/t/1010707214/2023-12-31/Data360-Analyze/analyze-server/Latest/en-US/Data360-Analyze-Server-Help/Thread-limit-configuration
In the case of nodes that run In-Container, there is a separate control, see the documentation on In-Container node execution:
https://help.precisely.com/r/t/1010707228/2023-12-31/Data360-Analyze/analyze-server/Latest/en-US/Data360-Analyze-Server-Help/In-container-node-execution
You will need to investigate the number of simultaneous scheduled runs being processed at times of peak load, and potentially adjust the timings of the scheduled runs to spread the aggregate load.
------------------------------
Adrian Williams
Precisely Software Inc.
Original Message:
Sent: 03-18-2025 00:52
From: Toby Harkin
Subject: Node Limit on Server
Is there a way for the admin of a Data360 server to limit the number of nodes that are allowed to be active at any time? In Lavastorm we were able to limit the number of parallel nodes at the data flow level, and I have a memory of being able to limit them on the server overall by changing a properties file.
Is this possible? My server has some periods during the day where there are very high counts of nodes running at the same time and it is causing significant CPU spikes.
Cheers.
------------------------------
Toby Harkin
TELSTRA CORPORATION LIMITED
Sydney NSW
------------------------------