Data360 Analyze

  • 1.  Node Limit on Server

    Posted 03-18-2025 00:53

    Is there a way for the admin of a Data360 server to limit the number of nodes that are allowed to be active at any time? In Lavastorm we were able to limit the number of parallel nodes at a dataflow level and I have a memory of being able to limit them on the server overall by changing a prop file.

    Is this possible? My server has some periods during the day where there are very high counts of nodes running at the same time and it is causing significant CPU spikes.

    Cheers.



    ------------------------------
    Toby Harkin
    TELSTRA CORPORATION LIMITED
    Sydney NSW
    ------------------------------


  • 2.  RE: Node Limit on Server

    Employee
    Posted 03-18-2025 13:01

    You can limit the number of nodes in a data flow that can run at the same time, but there is no overall limit on the number of nodes at the server level.

    The thread limit configuration information is here:

    https://help.precisely.com/r/t/1010707214/2023-12-31/Data360-Analyze/analyze-server/Latest/en-US/Data360-Analyze-Server-Help/Thread-limit-configuration

    In the case of nodes that run In-Container, there is a separate control; see the documentation on In-Container node execution:

    https://help.precisely.com/r/t/1010707228/2023-12-31/Data360-Analyze/analyze-server/Latest/en-US/Data360-Analyze-Server-Help/In-container-node-execution

    You will need to investigate how many scheduled runs are being processed simultaneously at times of peak load, and potentially stagger the scheduled runs to spread the aggregate load.



    ------------------------------
    Adrian Williams
    Precisely Software Inc.
    ------------------------------



  • 3.  RE: Node Limit on Server

    Posted 03-28-2025 18:41

    Thanks for responding, Adrian, although that's a bit disappointing. I already track node runs throughout the day and we have periods of 90+ nodes running at the same time. Changing the node limit within a data flow won't really solve anything, as the server will either just consume more in a moment or we'll have to spread our schedules out even further. I'll keep looking for other alternatives to give me the control that I need.

    Cheers.



    ------------------------------
    Toby Harkin
    TELSTRA CORPORATION LIMITED
    Sydney NSW
    ------------------------------



  • 4.  RE: Node Limit on Server

    Posted 04-01-2025 07:23

    Hi, could it be an option to increase "JvmMaxHeapSize", which defaults to 2048 MB, in "<Data360Analyze site configuration directory>/conf/cust.prop"? That setting affects Java-based nodes.

    As I see it, you can utilize more of the server's RAM by configuring this, and that could alleviate the strain on the CPU. I have tested this and the server seems to run more smoothly.

    I arrived at the heap size by trial and error; in my case there was a lot of RAM that wasn't being utilized before I changed it.
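    For reference, a change like this is a one-line edit in the site's cust.prop. The property name, default and path below are as quoted above; the value is an example to tune for your own server, not a recommendation, and changes to cust.prop generally only take effect after the service is restarted.

    ```
    # <Data360 Analyze site configuration directory>/conf/cust.prop
    # Default heap for Java-based nodes is 2048 MB; raising it lets each
    # node use more of the server's RAM (example value only)
    JvmMaxHeapSize=4096
    ```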



    Best regards



    ------------------------------
    Henrik B
    E.ON Sverige
    ------------------------------



  • 5.  RE: Node Limit on Server

    Posted 04-01-2025 07:33

    Also, operations like bigger joins, sorts and aggregations could be done in your source database to relieve the Data360 server of that computation.
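    As an illustration of the push-down idea (a generic sketch using SQLite as a stand-in source database, not anything Data360-specific): instead of pulling every detail row into the data flow and aggregating there, let the database run the GROUP BY and return only the summary rows.

    ```python
    import sqlite3

    # In-memory stand-in for the source database
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("north", 5.0), ("south", 7.5)],
    )

    # Pushed down: the database aggregates and ships back 2 summary rows
    # instead of all 3 detail rows, so the Analyze server does less work.
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    print(rows)  # [('north', 15.0), ('south', 7.5)]
    ```
    
    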

    Best regards



    ------------------------------
    Henrik B
    E.ON Sverige
    ------------------------------



  • 6.  RE: Node Limit on Server

    Employee
    Posted 04-01-2025 10:20
    Edited by Adrian Williams 04-02-2025 04:38

    If you have legacy data flows that leverage deprecated (C++ based) nodes such as AggEx (Deprecated), Sort (Deprecated), Filter (Deprecated), Join (Deprecated), etc., which use legacy LAE BRAINScript, you may want to consider refactoring them to use the more recent (Java-based) versions of these nodes (Aggregate, Sort, Transform, Join), as these can run 'in-container'. Nodes that run 'in-container' share a common Java process and benefit from a lower start-up overhead compared with nodes that run in their own process space. In addition to starting faster, they can also consume fewer system resources. See the Help documentation:

    https://help.precisely.com/r/t/1010707228/2023-12-31/Data360-Analyze/analyze-server/Latest/en-US/Data360-Analyze-Server-Help/In-container-node-execution

    You may also want to investigate whether introducing Run Dependencies between nodes within your data flows may reduce the number of independent node chains that are running in parallel within the data flow. See the Help documentation

    https://help.precisely.com/r/t/1010707260/2024-10-14/Data360-Analyze/analyze-server/Latest/en-US/Data360-Analyze-Server-Help/Create-run-dependencies

    Increasing the number of CPU cores may also improve performance as more threads can be run on the system before the overall efficiency of the system starts to decrease due to excessive context switching, as the scheduler attempts to allocate resources between the different running threads. For optimum performance, a maximum of four nodes should be running on the system per provisioned CPU core.
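    Applying that rule of thumb to the peak load mentioned earlier in the thread gives a rough sizing sketch (an illustration of the guideline above, not a Precisely sizing formula):

    ```python
    import math

    MAX_NODES_PER_CORE = 4  # guideline from the documentation above

    def cores_needed(peak_concurrent_nodes: int) -> int:
        """Minimum provisioned CPU cores to stay within the guideline."""
        return math.ceil(peak_concurrent_nodes / MAX_NODES_PER_CORE)

    # The thread mentions peaks of 90+ concurrent nodes:
    print(cores_needed(90))  # 23
    ```
    
    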

    I would caution against increasing the system-wide default Java Heap Space that is allocated to all Java nodes. This can cause the overall resource requirements to increase (since each node is being allocated a bigger heap space regardless of whether it needs it). This can also lead to undesirable side-effects such as increased incidence of Java Heap Out of Memory errors. The JVM needs to allocate a contiguous block of memory when the node starts - if the requested memory block is bigger, there is a higher likelihood that a suitable block cannot be allocated and the node will fail.

    Note that the above Java heap space setting is different from the space allocated to the web application (Tomcat).

    If a specific node requires more Java heap space to run, increase the allocated resources just for that node.

    See the Help documentation:

    https://help.precisely.com/r/t/1010707229/2025-02-12/Data360-Analyze/analyze-server/Latest/en-US/Data360-Analyze-Server-Help/Java-heap-space

    'Pushing down' work to external systems (i.e. RDBMSs) can reduce the load on the server hosting Data360 Analyze and may be a good way to increase overall efficiency, as the RDBMS can optimize its queries (but this can just move the bottleneck elsewhere). You should also review your data flows to minimize the number of fields that are being carried through the data flow (as 'payload' rather than as fields actively being used within the data flow). In some situations where the data set is very wide, it may be more productive to allocate a unique reference to each data record, remove unwanted fields from the data set being processed and then re-join the 'payload' fields onto the result records as a final step.
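    The split-and-rejoin pattern described above can be sketched in plain Python (a generic illustration of the idea; inside Analyze you would implement it with Transform/Join nodes, and the field names here are made up):

    ```python
    # Wide input records: only 'qty' is actively used; the rest is payload.
    records = [
        {"qty": 2, "notes": "rush order", "address": "1 Main St"},
        {"qty": 5, "notes": "",           "address": "9 High St"},
    ]

    # 1. Tag each record with a unique reference and park the payload fields.
    payload = {}
    narrow = []
    for ref, rec in enumerate(records):
        payload[ref] = {k: v for k, v in rec.items() if k != "qty"}
        narrow.append({"ref": ref, "qty": rec["qty"]})

    # 2. Process the narrow data set (here: a trivial derived field).
    results = [{**r, "qty_doubled": r["qty"] * 2} for r in narrow]

    # 3. Final step: re-join the payload fields onto the result records.
    final = [{**r, **payload[r["ref"]]} for r in results]
    print(final[0]["qty_doubled"], final[0]["address"])  # 4 1 Main St
    ```
    
    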



    ------------------------------
    Adrian Williams
    Precisely Software Inc.
    ------------------------------