The Transform node is designed to loop already, so a single Transform node with os.remove() added will delete every row-separated file path that you feed into it, one per input row.
Knowing that the Transform node runs once for each row of data that you feed into it, your example above will run, but it doesn't do what you intend. For each file you want to delete (each record input to your Transform node), you are building a fresh list of every file in your temp folder and looping through that list with your delete, so instead of deleting one file, each pass through the loop empties the whole folder. For example, if you had 5 files to delete in your dataflow, you are telling Data360 to list the files in temp and delete all of them, 5 times over.
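To make the difference concrete, here is a minimal plain-Python sketch of the per-record pattern, assuming the node's built-in loop is modelled as a `for` over input rows (the `FileName` field and the row dicts are stand-ins for Data360's `in1` record, not its real API):

```python
import os
import tempfile

# Hypothetical stand-in for the Transform node's built-in loop:
# Data360 runs the ProcessRecords script once per input row, so the
# script body only needs to delete the single file named by that row.
def process_records(rows):
    for row in rows:                 # Data360 does this loop for you
        os.remove(row["FileName"])   # delete just this record's file

# Demo: three throwaway files, one input row each.
tmp = tempfile.mkdtemp()
paths = [os.path.join(tmp, name) for name in ("a.txt", "b.txt", "c.txt")]
for p in paths:
    open(p, "w").close()

process_records([{"FileName": p} for p in paths])
```

Each record removes exactly one file; there is no inner listing of the folder, so nothing is deleted more than once.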
------------------------------
Alex Day
Knowledge Community Shared Account
------------------------------
Original Message:
Sent: 01-31-2023 04:10
From: Henrik B
Subject: auto deleting file by
I believe the above script will only remove 1 file; I don't think it iterates.
I am doing the same in a graph: reading from a directory, keeping the latest file based on created date (new files are created each day), and removing the rest each day.
For filtering the files you want to keep or remove, you can compare today() with the created date from the Directory List node.
Then use a Transform node to make two outputs:
out1 = files you keep (for example, if the difference between created date and today is less than some threshold)
else:
out2 = files you want to remove, which later go into the dataflow described below
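Outside the node, that keep/remove split is just an age comparison. A minimal sketch, assuming plain file paths; inside Data360 you would compare the Directory List node's created-date field instead of calling `os.path.getctime()`, and the 7-day threshold is hypothetical:

```python
import os
import tempfile
import time

KEEP_DAYS = 7  # hypothetical threshold

def split_by_age(paths, now=None):
    """Split paths into (keep, remove) by file age in days."""
    now = now or time.time()
    keep, remove = [], []
    for p in paths:
        age_days = (now - os.path.getctime(p)) / 86400
        (keep if age_days < KEEP_DAYS else remove).append(p)
    return keep, remove

# Demo: a freshly created file is kept today, but would be removed
# if we evaluate the same split as of 30 days in the future.
tmp = tempfile.mkdtemp()
fresh = os.path.join(tmp, "today.csv")
open(fresh, "w").close()

keep, remove = split_by_age([fresh])
keep2, remove2 = split_by_age([fresh], now=time.time() + 30 * 86400)
```

The two result pairs correspond to out1 (keep) and out2 (remove) in the two-output Transform described above.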
after filtering/conditions is done for what to look for:
(i am only sending files i want to remove in this flow)
1st Transform node:
Configure Fields:
out1 += in1
Process Records:
import shutil
import os
out1 += in1
shutil.move(in1.Filename, in1.Temp)
# moves this record's file (one you want to delete) into the temporary
# directory named "Temp"; in1.Temp is a field holding the Temp-folder path
2nd Transform node:
Process Records:
import os
import glob
files = glob.glob(r'filepath\Temp\*')  # raw string so the backslashes are not treated as escapes
for f in files:
    os.remove(f)
# this loops over the "Temp" directory and empties it
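Outside Data360, the two-node pattern above boils down to: stage the doomed files in a holding folder, then empty that folder in one sweep. A minimal plain-Python sketch (the folder names and file name are hypothetical, created on the fly for the demo):

```python
import glob
import os
import shutil
import tempfile

# Stand-ins for the source folder and the staging "Temp" folder.
src = tempfile.mkdtemp()
temp_dir = tempfile.mkdtemp()

doomed = os.path.join(src, "old_report.csv")
open(doomed, "w").close()

# 1st node's job: move the file marked for deletion into Temp.
shutil.move(doomed, os.path.join(temp_dir, "old_report.csv"))

# 2nd node's job: empty the Temp folder.
for f in glob.glob(os.path.join(temp_dir, "*")):
    os.remove(f)
```

Staging first means a failed run leaves the files recoverable in Temp rather than half-deleted in place.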
------------------------------
Henrik B
E.ON Sverige
Original Message:
Sent: 01-27-2023 04:59
From: grzegorz mazur
Subject: auto deleting file by
Hi Alex,
Thank you for your reply.
Meanwhile, yesterday I had a conversation with Akshita; thank you both for your help.
I have tried the Directory List node + the path to the folder, but the outcome is: directory cannot be found.
The directory cannot be found even though I copy/paste the path as below, so there is no typo.
I am wondering out loud whether I should pass the path on the server side?
Best,
Grzesiek
Original Message:
Sent: 01-26-2023 14:46
From: Alex Day
Subject: auto deleting file by
Read the directory using Directory List node
Using the date columns from the output of the Directory List node (created or modified date), filter and split out the files that are older than X
Then you can link that to a Transform node, and in your Transform you can use this script to loop through and delete the files:
Configure Fields:
import os
Process Records:
os.remove(in1.FileName)
It should loop through each file and delete it one by one
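Outside the node, the same recipe (list the directory, filter by age, delete) looks roughly like this in plain Python; the 30-day cutoff is a hypothetical stand-in for "older than X", and modified time is used here where the Directory List node would give you a date field directly:

```python
import os
import tempfile
import time

CUTOFF_DAYS = 30  # hypothetical "older than X"

def delete_old_files(folder, cutoff_days=CUTOFF_DAYS, now=None):
    """List a folder, delete files older than the cutoff, return what was removed."""
    now = now or time.time()
    deleted = []
    for name in os.listdir(folder):           # Directory List node's job
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        age_days = (now - os.path.getmtime(path)) / 86400
        if age_days > cutoff_days:            # filter/split step's job
            os.remove(path)                   # Transform node's job
            deleted.append(path)
    return deleted

# Demo: one file backdated 60 days, so it falls past the cutoff.
folder = tempfile.mkdtemp()
old = os.path.join(folder, "old.log")
open(old, "w").close()
os.utime(old, (time.time() - 60 * 86400,) * 2)  # backdate mtime 60 days

removed = delete_old_files(folder)
```

Scheduled daily, a flow like this keeps the folder from ever accumulating 57k files again.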
------------------------------
Alex Day
Knowledge Community Shared Account
Original Message:
Sent: 01-24-2023 03:21
From: grzegorz mazur
Subject: auto deleting file by
Hi.
I would appreciate any advice on deleting files from an overloaded path. To be honest, we have a path with over 57k files and the number is still growing.
Fetching the list of files from the UI takes around 10-15 minutes, and deleting files manually in that folder is uncomfortable and time-consuming; the app freezes and responds very slowly once we open that path. For example, deleting 1k files is more or less impossible. Our goal is to replace the UI approach with a new dataflow that deletes files automatically via the scheduler where a file is older than X. Is there any way around this, other than deleting from the folder in the UI?
Thank you in advance for your advice.
Best,
Grzesiek