Data360 Analyze

  • 1.  schedule same data flow in parallel with different runtime properties

    Posted 06-14-2021 07:02

    Hi all

    Please can you confirm that it is possible to schedule multiple instances of the same dataflow to run at the same time? Each individual schedule would have some differing runtime parameters (e.g. specifically pick up data sets from different file locations). 

    Any known restrictions/issues with doing this?

    thanks

    Scott

    https://support.infogix.com/hc/en-us/articles/360051016734-Data360-Analyze-How-to-create-and-use-run-properties-



  • 2.  RE: schedule same data flow in parallel with different runtime properties

    Employee
    Posted 06-15-2021 02:14

    Yes, you can create multiple schedules that each run an instance of the same data flow at the same time.

    Each scheduled run retains its own state information and will be listed on the 'Runs' page.

    You should ensure your data flow does not create files with a static filename, otherwise the different runs could overwrite each other's data or fail to obtain a write lock on the file.
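    For example, one way to avoid a static filename is to build it from a run property plus a timestamp; a minimal Python sketch, assuming a Python-based node and an illustrative run property called RunLabel (the field name and base path are also made up):

    import os
    from datetime import datetime

    # Illustrative only: combine a per-schedule run property with a timestamp so
    # simultaneous runs do not write to the same file.
    run_label = '{{^RunLabel^}}'   # assumed run property, set differently on each schedule
    stamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    out1.output_file = os.path.join('/data/output', run_label + '_' + stamp + '.csv')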

    You should consider the overall impact on system resources of running multiple data flows simultaneously, especially if multiple nodes are running in parallel within each data flow (this is a general issue and also applies to running different data flows at the same time).

    When accessing external systems, you should consider whether the systems permit concurrent accesses by the same user profile. 



  • 3.  RE: schedule same data flow in parallel with different runtime properties

    Posted 02-23-2022 08:49

    Hi Adrian

    Is there a runtime D360 variable I can use that is unique to a particular scheduled instance of the same graph? I thought of the "Run Time" parameter, but that wouldn't be unique if two instances of the same graph were scheduled at the same time.

    Kind regards

    Scott



  • 4.  RE: schedule same data flow in parallel with different runtime properties

    Posted 02-23-2022 09:18

    Hi Scott,

    Have you explored the "Execute DataFlow" node and considered passing the variables and an ID into the flows from there? You could use a BRD file as a basic counter of each run and then use that count to identify which flow is which (rough sketch below).

    The Execute DataFlow node also provides a really nice set of outputs:

    1) DataFlowLocator: a link to the actual run data for the individual flow, e.g. [object:!tenant:defaultTenant~workspace:715a0625-dc0c-454b-b03e-9a3c198ca857~directory:002218ad-42ab-4eae-afe6-1bf039551fba~graph:080ad60b-6b04-48d5-8467-8246bd3837a3]
    2) RunStatus
    3) Duration

    That way you can control the IDs and even make them up yourself.
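    If you went the counter route, here is a very rough sketch of the idea in Python, using a plain text file instead of a BRD file purely to illustrate (the path and field name are made up):

    import os

    # Illustrative counter: read the last run number, increment it, and write it
    # back so each run of the parent flow can hand a distinct ID to the child flow.
    counter_path = '/data/run_counter.txt'   # made-up location
    run_number = 0
    if os.path.exists(counter_path):
        with open(counter_path) as f:
            run_number = int(f.read().strip() or 0)
    run_number += 1
    with open(counter_path, 'w') as f:
        f.write(str(run_number))

    out1.run_id = run_number   # pass this into the child flow via Execute DataFlow

    Bear in mind that two parent runs hitting the counter at exactly the same moment could read the same number, so the file would need some locking to be completely safe.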



  • 5.  RE: schedule same data flow in parallel with different runtime properties

    Posted 02-23-2022 09:30

    Hi John

    Thanks for the prompt reply. Yes, I have used the "Execute DataFlow" node for other purposes and it's very nice.

    The issue I am trying to solve here is that I want to create a unique temporary folder for storing some temp files per scheduled instance of a graph, but that unique folder has to be created at compile time rather than runtime, so I need a unique system variable for that. Are there any built-in internal variables I can use, like an "instance id"? {{^InstanceId^}}

    thanks

    Scott



  • 6.  RE: schedule same data flow in parallel with different runtime properties

    Posted 02-23-2022 10:05

    Maybe combine the Execute DataFlow node with some very basic Python like:

    import uuid
    # write a random UUID string to the 'xxx' output field of the node
    out1.xxx = str(uuid.uuid4())

    and xxx will be a random UUID (aka GUID) character string, which can be merged into a directory path with another line of Python and passed to the data flow via the Execute DataFlow node.

    The UUID can be calculated before run time in the Python node, so it's ready, unique and waiting for your flow to start.
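    For example, extending the snippet above to build the directory path (the base path and the temp_dir field name are made up):

    import os
    import uuid

    # Turn the random UUID into a per-run temp directory path; '/data/temp' and
    # 'temp_dir' are illustrative names, not built-in ones.
    run_id = str(uuid.uuid4())
    out1.xxx = run_id
    out1.temp_dir = os.path.join('/data/temp', run_id)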

    Would that work?

    see (https://en.wikipedia.org/wiki/Universally_unique_identifier)



  • 7.  RE: schedule same data flow in parallel with different runtime properties

    Posted 02-23-2022 10:13

    Thanks John. Yeah, that would work; it's just that I need a way of doing it without using the "Execute DataFlow" node. I guess there must be some internal ID for a run instance. For example, when I look at the run data for a scheduled instance that is running and/or completed, the name I click on to open the instance and see the run results is the graph name with a number appended to the end, which looks like an internal ID for the instance. Is that available in the graph as a variable, like {{^InstanceId^}}?



  • 8.  RE: schedule same data flow in parallel with different runtime properties

    Posted 02-23-2022 10:20

    I'm not sure if you can reference them in the scheduler.

    Why have you got to do it without the Execute DataFlow node? Couldn't you just do the very basics in there to get a UUID directory path, call the same data flow with one variable, and point your scheduled run at that wrapper flow rather than the original?