Data360 Analyze

  • 1.  Schedule specific temp files cleaning rule

    Employee
    Posted 08-11-2021 04:30

    Hi,

    For a specific schedule, is it possible to specify a different period to keep temp data, instead of using the general one defined in the global settings?

    The use case is a specific workflow that uses far more data than all the others: you may want to clean its temp data every day, while the rest of the schedules are cleaned every 5 days, for instance.

    Thank you

    Sebastien



  • 2.  RE: Schedule specific temp files cleaning rule

    Employee
    Posted 08-11-2021 06:29

    There is no built-in mechanism to automatically delete the temporary data associated with the execution of a particular data flow.

    If you are running the data flow manually (i.e. not under a schedule), you can delete the run data for a data flow from the properties panel in the Analyze Directory view.

     

    The temporary data that is generated when a data flow is executed is stored in a sub-directory of the system's '<dataDir>/executions' directory, e.g. data-7731/executions.

    The execution data for a data flow is stored in the data-7731/executions/<username>/<data flow name> directory. 

    You could consider building a data flow that uses a Directory List node to recursively retrieve the list of execution data files associated with the data flow, by setting the Directory List node's DirectoryName property to

     {{%ls.brain.tempDir%}}/<username>/<Data+flow_name>

    [Note the use of the '+' character to represent a space character in the name].
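
    As a rough illustration of the directory layout described above, the following standalone Python sketch builds the path for one data flow. The function name `execution_dir` and the example username are hypothetical, and only the space-to-'+' substitution comes from the note above; other special characters in a data flow name may be encoded differently.

```python
def execution_dir(temp_dir, username, flow_name):
    """Return the assumed execution-data directory for one data flow."""
    # Per the note above, a space in the data flow name is written as '+'.
    # How other special characters are encoded is not covered here.
    encoded_name = flow_name.replace(" ", "+")
    return "/".join([temp_dir.rstrip("/"), username, encoded_name])


# Hypothetical username and data flow name, for illustration only.
print(execution_dir("data-7731/executions", "user1", "My data flow"))
```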

    A Transform node with a suitable Python script could then filter the list to the files older than your desired retention period and delete them, e.g.

     

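    #### ConfigureFields Script

    import datetime
    import os

    #Declare an output field that records the outcome of each delete attempt
    out1.result = unicode

    #Configure all fields from input 'in1' to be mapped
    #to the corresponding fields on the output 'out1'
    out1 += in1

    #### ProcessRecords Script

    retentionDays = 30 # retention period in days; adjust as required
    currentDate = datetime.datetime.strptime("{{^CurrentDate^}}","%Y-%m-%d") # today's date (at midnight) as a datetime
    cutoff = currentDate - datetime.timedelta(days=retentionDays) # files last modified on or before this are deleted

    if in1.Modified <= cutoff:
        out1 += in1
        try:
            os.remove(in1.FileName)
            out1.result = u"deleted"
        except OSError:
            #The file may be locked or already gone; record it and carry on
            out1.result = u"not deleted"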

     

    Note: Do not remove any of the contents of the 'cache' directory in the 'executions' directory.
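
    The same retention logic can also be sketched as a standalone Python 3 script that runs outside Data360 Analyze (for example from an operating system scheduler). This is a minimal sketch under stated assumptions, not a product feature: the function name `delete_old_files` and its parameters are hypothetical, and it conservatively skips every directory named 'cache' at any depth, which is stricter than the note above requires.

```python
import os
import time


def delete_old_files(root, max_age_days, protected=("cache",), now=None):
    """Delete files under root last modified more than max_age_days ago.

    Returns the list of paths that were deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # retention period in seconds
    deleted = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune protected directories in place so os.walk never enters them.
        dirnames[:] = [d for d in dirnames if d not in protected]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) <= cutoff:
                    os.remove(path)
                    deleted.append(path)
            except OSError:
                # The file vanished or is locked (e.g. by a running data flow).
                continue
    return deleted
```

    Pointing `root` at the directory of a single data flow, rather than at the whole 'executions' directory, keeps the shorter retention period limited to that one data flow.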



  • 3.  RE: Schedule specific temp files cleaning rule

    Employee
    Posted 12-01-2021 08:41

    Thank you Adrian Williams.

    It is working and can be used, thanks! 

    Sebastien