Data360 Analyze


Run Data Validation rules in D3S Analyse

Scott PERRY, 07-08-2021 03:30

  • 1.  Run Data Validation rules in D3S Analyse

    Posted 04-28-2021 09:44

    Hi,

    I have a requirement to run some standard DQ validation rules in a generic way on any input file based off of a config file per input file. Please can you advise the best way to do that in D3S Analyse?

     

    Example Use Case below:

    Inputs:

    1. InputData.csv  - File containing data to run data validation rules on. Example data:

    TradeId, LEI, TradeDate, Notional, BuySellFlag

    001, ,22-01-2021, 100, B

    002,ABC,29-04-2021, , X

     

2. TestCasesForInputData.csv

    TestNumber, TestFieldName, TestType, TestTypeValue

    TEST_01,Notional,NotNull, n/a

    TEST_02,LEI,NotNull, n/a

    TEST_03,TradeDate,DateGreaterThan, 28-04-2021

TEST_04,BuySellFlag,CheckInEnum,"B, S"

     

Then, for each row in the input file, the code will process the row and apply all relevant tests from TestCasesForInputData.csv to the relevant fields, e.g. set TestFieldNameValue to the LEI value and run TEST_02, which would set error = Y.

     

If TestType = NotNull
then
    if TestFieldNameValue.isNull() or TestFieldNameValue.strlen() == 0
    then
        error = "Y"
    else
        error = "N"

If TestType = DateGreaterThan
then
    # convert to date first?
    if TestFieldNameValue > TestTypeValue
    then
        error = "Y"
    else
        error = "N"

If TestType = CheckInEnum
then
    if TestFieldNameValue NOT IN TestTypeValue
    then
        error = "Y"
    else
        error = "N"

# for all tests, emit the errors
emit TestNumber, TestFieldName, TestType, TestTypeValue, TestFieldNameValue
where error == "Y"
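The pseudocode above can be sketched in plain Python (the same language the Analyze Transform node scripts use). This is a standalone illustration rather than node code: the field and test names mirror the sample files, the dd-mm-yyyy date format is assumed from the sample data, and DateGreaterThan follows the pseudocode's direction (flagging dates after the threshold).

```python
from datetime import datetime

def run_tests(rows, tests):
    """Apply every configured test to each row; return the failures."""
    failures = []
    for row in rows:
        for test_num, field, test_type, type_value in tests:
            value = (row.get(field) or "").strip()
            if test_type == "NotNull":
                failed = value == ""
            elif test_type == "DateGreaterThan":
                # assumes dd-mm-yyyy strings, as in the sample data;
                # per the pseudocode, dates after the threshold are flagged
                failed = (value != "" and
                          datetime.strptime(value, "%d-%m-%Y")
                          > datetime.strptime(type_value.strip(), "%d-%m-%Y"))
            elif test_type == "CheckInEnum":
                allowed = {v.strip() for v in type_value.split(",")}
                failed = value not in allowed
            else:
                continue  # unknown test types are skipped here
            if failed:
                failures.append((test_num, field, test_type, type_value, value))
    return failures
```

Running this against the two sample rows and four sample tests emits one failure for row 001 (TEST_02, empty LEI) and three for row 002 (TEST_01, TEST_03, TEST_04).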

     

    thanks!



  • 2.  RE: Run Data Validation rules in D3S Analyse

    Employee
    Posted 04-29-2021 09:47

    Here is one way that mirrors your example:

The lower Create Data node represents the imported data from your InputData.csv file and the upper Create Data node is equivalent to the imported data from the TestCasesForInputData.csv file.

A new field is added to both data sets to indicate the type of record ('Test' or 'Data'), and then the records are concatenated, with the 'Test' records at the top of the combined data set.

A Transform node then extracts the details from the 'Test' records and puts them in a list.

The set of tests is then performed on each 'Data' record. If a test fails, the node outputs a record with the details of the failed test and the original input data.
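As a rough standalone sketch of that pattern (not the attached data flow itself; the record and field names here are illustrative, and only the NotNull test is shown): the 'Test' records arrive first and are collected, then each 'Data' record is checked against the collected tests.

```python
def process_stream(records):
    """Tagged records arrive with the 'Test' rows first: collect the test
    definitions, then validate each 'Data' row against them.
    Only the NotNull test is sketched here."""
    tests = []
    failures = []
    for rec in records:
        if rec["RecordType"] == "Test":
            tests.append((rec["TestNumber"], rec["TestFieldName"],
                          rec["TestType"], rec["TestTypeValue"]))
        else:
            for num, field, ttype, tval in tests:
                if ttype == "NotNull" and not (rec.get(field) or "").strip():
                    # failed test: emit the test details plus the input data
                    failures.append({**rec, "TestNumber": num,
                                     "TestType": ttype})
    return failures
```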

    See the attached example data flow (requires Analyze v.3.6.x).

     

     

    Attached files

    DQ_Validation - 29 Apr 2021.lna

     



  • 3.  RE: Run Data Validation rules in D3S Analyse

    Posted 04-30-2021 09:59

Hi Adrian, this is excellent, thank you! Much appreciated.

I have a related query. It seems a very similar use case to the above, but as I am very new to D3S I would appreciate your help again.

Before applying the TestCasesForInputData.csv file I first need to filter my input file using a similar config-file approach, e.g. a config file called FilterConfig.csv containing rules for when to include records from InputData.csv. For example, the rules below would keep the two records from the InputData.csv example file above.

    FilterNumber, FilterFieldName, FilterType, FilterTypeValue

    Filter_01,TradeId,NotNull, n/a

Filter_02,BuySellFlag,CheckInEnum,"B, S, X"

    Filter_03,TradeDate,DateGreaterThan, 20-01-2021

    Please could you provide a similar example D3S dataflow?

    Much appreciated

    Scott



  • 4.  RE: Run Data Validation rules in D3S Analyse

    Employee
    Posted 05-01-2021 01:59

    I'm not sure I entirely understand the details of what you are trying to achieve, but it can probably be implemented using a Join node that matches the required fields in the Filter data with the corresponding fields in the Test data:

    The Join node is configured with the default JoinMode (Inner) and the required MatchKeys:

The node's FieldList property is configured to exclude from the output all fields in the Left data set. In this case the fields were excluded en bloc (click on the green '+' button to set the include/exclude mode), but you could have individually un-checked each field.

In this example the *TypeValue fields were used as matches. However, as the values in your Test data and Filter data do not match, these tests would not have been output by the Join node. In the attached data flow I tweaked the filter data so the tests would be output, but you may decide to only use the *Number and *FieldName fields in your Join MatchKeys criteria. The remainder of the data flow is as before.

Please note that the example is only intended as a proof of concept and there is no validation of the preconditions for the tests being performed, e.g. whether the data type of the selected field is appropriate for the test, etc. The values for the test criteria and the input data (derived from the .csv files) are all assumed to be strings, and the logic that performs the tests makes that assumption.
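In plain Python, the Join-with-left-fields-excluded pattern behaves like a semi-join. A minimal sketch, assuming simple dict records and (unlike the node's MatchKeys pairing) identical key names on both sides; this is illustrative, not the Join node's implementation:

```python
def semi_join(left, right, keys):
    """Keep each right-hand record whose key values match at least one
    left-hand record, dropping all left-hand fields from the output
    (broadly what an Inner join with an exclude-all Left FieldList does)."""
    left_keys = {tuple(rec[k] for k in keys) for rec in left}
    return [rec for rec in right if tuple(rec[k] for k in keys) in left_keys]
```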

     

    Attached files

    DQ_Validation_w_Filter - 1 May 2021.lna

     



  • 5.  RE: Run Data Validation rules in D3S Analyse

    Posted 05-25-2021 08:58

    Hi Adrian

    Thanks for this filter example. 

Going back to the test cases, I am trying to add a "Script" test case where the test config file can contain a script. So in your "Perform DQ Validation Tests" node I added the following code, which works:

elif test[2] == 'Script':
    if eval("""in1[test[1]] is not Null and len(in1[test[1]]) > 0"""):
        pass
    else:
        out1 += in1
        out1.TestNumber = test[0]
        out1.TestFieldName = test[1]
        out1.TestType = test[2]
        out1.TestTypeValue = test[3]
        node.write(0, out1)

but when I replace it with the following code it does not work. There is no error; it just does not evaluate.

elif test[2] == 'Script':
    if eval(test[3]):
        pass
    else:
        out1 += in1
        out1.TestNumber = test[0]
        out1.TestFieldName = test[1]
        out1.TestType = test[2]
        out1.TestTypeValue = test[3]
        node.write(0, out1)

     

test[3] is the TestTypeValue from your test cases input data "TestCasesForInputData" (which I switched to a CSV delimited input file to allow for escaping of double quotes). For some reason it does not evaluate the string from TestTypeValue, i.e. test[3].

TestNumber  TestFieldName  TestType  TestTypeValue
TEST_05     LEI            Script    """in1[test[1]] is not Null and len(in1[test[1]]) > 0"""

    Please can you advise how to get this to work?

    much appreciated!

    Scott



  • 6.  RE: Run Data Validation rules in D3S Analyse

    Employee
    Posted 05-26-2021 05:09

After a lot of head scratching it seems to be a quirk of the eval() function when presented with a string that is itself a triple-quoted string literal - i.e. text surrounded by three sets of double quotes: eval() just returns the (always truthy) inner text instead of evaluating it as an expression. If the string value for the script only has a single set of double quotes in the Create Data node (i.e. there are no quote characters in the string when the value is viewed in the data viewer) then it works!
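A minimal plain-Python illustration of the quirk: when the stored value still contains the triple quotes, eval() sees a string literal and the test can never fail; with the quotes stripped, the expression is actually evaluated.

```python
# The stored TestTypeValue still contained the surrounding triple quotes,
# so eval() saw a string literal, not an expression:
stored = '"""x is not None and len(x) > 0"""'
result = eval(stored)  # just the inner text - a non-empty, truthy string
assert result == "x is not None and len(x) > 0"

# With the quotes removed from the stored value, eval() runs the expression:
x = ""
assert eval("x is not None and len(x) > 0") is False
```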

     

     

    Attached files

    DQ_Validation_w_Script - 26 May 2021.lna

     



  • 7.  RE: Run Data Validation rules in D3S Analyse

    Posted 05-26-2021 06:38

    Thanks Adrian you're a legend! I've been down lots of rabbit holes trying to fix this so I'm glad it turned out to be doable. Thanks again! :-) 



  • 8.  RE: Run Data Validation rules in D3S Analyse

    Posted 05-26-2021 10:11

    Hi Adrian

Apols, one more hopefully small question from a D3S newbie. I am trying to assign the output field name for my field called 'LEI'. out1.LEI works, but how do I pull the field name from my test input data so it's dynamic based on the test being executed? I tried out1.test[1] but that doesn't work. Please can you advise?

out1 += in1
out1.LEI = eval(test[5])      # This works
out1.test[1] = eval(test[5])  # This doesn't work

    Thanks in advance

    Scott



  • 9.  RE: Run Data Validation rules in D3S Analyse

    Employee
    Posted 05-27-2021 02:16

    Unfortunately that will not be possible. You can dynamically name a field using a property value (this could be a property on the node itself, a parent Composite node, a Data Flow property or a Run property), or based on something derived from the node's input metadata. However, you cannot define the name of an output field based on a value derived from something in an input data record. This is because the output metadata is determined when the Transform node's ConfigureFields script is run. This script runs when the node starts execution and this happens before the first record is read - so no values from the input record are available at this time.

Here is an example of using a property value:

    A custom property has been defined on the node ('OutputFieldName'). The ConfigureFields script uses a textual property substitution to obtain the value set on the OutputFieldName property and assigns the value to the 'newFieldName' variable. If no value has been set on the property then the default value is used instead ("_DefaultName"). The out1[<string>] method is used to specify the name of the field and the value of the newFieldName variable will be used as the name of the output field. The newFieldName variable's value is also used in the ProcessRecords script when assigning a value to the required output field.

    The following example shows how an aspect of the input metadata can be used to dynamically set an output field name:

The ConfigureFields script (starting at line 10) checks whether there is a field called 'color' in the input metadata. If it is present, the 'dynamicName' variable is set to the specified value and used to create the new output field. If there is no 'color' field in the input metadata, a different value is assigned to the dynamicName variable and again this is used to create the new output field. The dynamicName variable is still available when the ProcessRecords script runs, so it can be used when assigning a value to the output field when processing records (as described in the previous example).

    Please see here for further information on the Transform node:

    https://support.infogix.com/hc/en-us/articles/360018962334-Using-the-Python-based-Transform-Node

    And the Python scripting > API and script bindings topic in your Help documentation, or the online documentation for the latest version of the product, here:

    https://d3sa-preview.infogixsaas.com/docs/dist/help/Content/f-script/python/python-api.htm

     

You can use the Execute Data Flow node to pass values calculated in one data flow into a 'child' data flow. The values that are passed in to the child data flow are part of the state of the child data flow and are treated in a similar way to other Run properties. See the 'Running with data driven properties' section of the Execute Data Flow node's Help:

    https://d3sa-preview.infogixsaas.com/docs/dist/help/Default.htm#e-node-help/Interfaces_and_Adapters/execute-data-flow.htm



  • 10.  RE: Run Data Validation rules in D3S Analyse

    Posted 05-27-2021 03:04

    Hi Adrian

Thanks for the reply. Thankfully my output field name list will always exist beforehand, so the fields will always be on the input and therefore the output file, and I don't need to create a new field name dynamically (apologies, I should have been clearer on that). I had a play around just now and seem to have a solution using the exec function, as follows. Please let me know if there is a better way of doing it?

out1.LEI = '123ABC'      # This works fine (obviously)

out1.test[1] = '123ABC'  # As we saw, this doesn't work even though test[1] == "LEI"

Using the exec function:

tmp = "out1." + test[1] + " = " + "'123ABC'"

exec(tmp)  # this seems to work fine
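For what it's worth, a plain-Python illustration of why out1.test[1] cannot work, and of the quoting needed when building the exec() string (OutputRecord here is a stand-in, not the node's actual record class):

```python
class OutputRecord:
    """Stand-in for the node's output record object."""
    pass

out1 = OutputRecord()
test = [None, "LEI"]  # test[1] holds the target field name

# out1.test[1] is parsed as (out1.test)[1]: attribute access on the literal
# name 'test', which does not exist on the record. Applying the *value* of
# test[1] as the attribute name needs setattr:
setattr(out1, test[1], "123ABC")
assert out1.LEI == "123ABC"

# The exec() route also works, but the right-hand side of the built string
# must itself be valid Python source - string values need their own quotes:
exec("out1." + test[1] + " = '123ABC'")
assert out1.LEI == "123ABC"
```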

    Kind regards

    Scott



  • 11.  RE: Run Data Validation rules in D3S Analyse

    Posted 05-28-2021 01:18

    Hi Adrian

Just expanding the types of tests I will be able to execute from my test configuration file, to now include an if/else script. I can get the if statement to work directly in an exec statement in the code, but not an if/else statement.

    For example

    This works:

    exec("if in1[test[1]] is not Null and len(in1[test[1]]) > 0:Tmp = 'BUYBUY' ") 

    This doesn't work:

    exec("if in1[test[1]] is not Null and len(in1[test[1]]) > 0:Tmp = 'BUYBUY' else: Tmp = 'SELLSELL' ")

    Please can you help me out with how to do that?

    Thanks in advance

    Scott 



  • 12.  RE: Run Data Validation rules in D3S Analyse

    Employee
    Posted 05-28-2021 02:28

You may want to try restructuring the if else statement as a ternary operator assignment:

    https://www.webucator.com/how-to/how-do-ternary-operator-assignment-python.cfm
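A minimal plain-Python sketch of why the ternary form works where the one-line if/else does not (the names value and Tmp mirror the earlier snippets; the namespace dict is just for this standalone example):

```python
# An inline `else` cannot follow a one-line `if` suite, which is why the
# second exec() string in the question raises a SyntaxError. A conditional
# (ternary) expression keeps the whole statement on one line:
ns = {"value": "B"}
exec("Tmp = 'BUYBUY' if value is not None and len(value) > 0 else 'SELLSELL'", ns)
assert ns["Tmp"] == "BUYBUY"

ns["value"] = ""
exec("Tmp = 'BUYBUY' if value is not None and len(value) > 0 else 'SELLSELL'", ns)
assert ns["Tmp"] == "SELLSELL"
```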

     



  • 13.  RE: Run Data Validation rules in D3S Analyse

    Posted 05-28-2021 02:36

    Sounds good thanks, I'll give it a try. 



  • 14.  RE: Run Data Validation rules in D3S Analyse

    Posted 05-28-2021 08:43

Hey Adrian, just to let you know that the ternary operator assignment worked. Thanks again!



  • 15.  RE: Run Data Validation rules in D3S Analyse

    Posted 06-02-2021 04:22

    Hi Adrian

    Hope you had a nice long weekend!

Another query for you... after the "Perform DQ Validation Tests" node I need to split the output into multiple outputs at runtime, split by TestFieldName. Any suggestions? The split would need to be dynamic at runtime, i.e. not hardcoding the test field names in a split node.

I should also add that once the split is done by TestFieldName, I need to be able to filter each split output using a config file (which I can see is possible by feeding into the Change Metadata node).

    thanks in advance!

    Scott 

PS: I just found this reference from some years ago where it seems a solution was made, but I can't open the example file, perhaps as it's an old .brg file from 6 years ago.

    https://support.infogix.com/hc/en-us/community/posts/360051778573-Dynamic-Outputs-Creating-multiple-dynamic-output-files-from-one-data-source-

     



  • 16.  RE: Run Data Validation rules in D3S Analyse

    Employee
    Posted 06-02-2021 07:56

There is no means to have a truly dynamic split because you need to manually create the output pins of the node. If you know the number of unique tests (i.e. test field name values) then you can create a Transform node with N + 1 outputs and steer records to one of the outputs based on the value of the TestFieldName in the input records:

If the value does not match, the record is steered to the final output and the number of such records counted in an Aggregate node:

If there were unrecognised records this could be used to generate an error:
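The N + 1-output steering can be sketched in plain Python (illustrative only; in the node each list would be a separate output pin):

```python
def steer(records, known_fields):
    """Route each record to a list keyed by its TestFieldName; anything
    unmatched lands on the final 'unrecognised' output."""
    outputs = {name: [] for name in known_fields}
    unrecognised = []
    for rec in records:
        # dict.get falls back to the unrecognised list for unknown names
        outputs.get(rec["TestFieldName"], unrecognised).append(rec)
    return outputs, unrecognised
```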

     

     

     

    Attached files

    DQ_Validation_Results_Split - 2 Jun 2021.lna

     



  • 17.  RE: Run Data Validation rules in D3S Analyse

    Posted 06-02-2021 08:45

Thanks. I also found this previous post:

    https://support.infogix.com/hc/en-us/community/posts/360051778573-Dynamic-Outputs-Creating-multiple-dynamic-output-files-from-one-data-source-

There is a .brg file at the link that has a "Node 2 (Output To Multiple Excel Files By Group)" that sounds like it does what I need, but I can't open the .brg file as I don't have Lavastorm. Was "output to multiple excel files by group" a Lavastorm node that is no longer available?



  • 18.  RE: Run Data Validation rules in D3S Analyse

    Posted 06-02-2021 09:29

    Hi Adrian

Just to let you know I was able to import the .brg file, but first had to open the file in a text editor and change BRGVersion:5 to BRGVersion:6. Once I updated the "RootFileName" path location to my C drive it worked! Just need to see if I can enhance it for my use case now.

    Scott



  • 19.  RE: Run Data Validation rules in D3S Analyse

    Employee
    Posted 06-03-2021 08:25

The .brg file is for the legacy Lavastorm LAE product. As you note, only version 6 .brg files can be imported into Analyze. However, version 5 files are not guaranteed to work when imported, so your mileage may vary.

I'm not sure I can recommend its use unless you are familiar with the Lavastorm product, as the nodes use the legacy scripting language.

    It wasn't clear from your earlier messages, but if your use case is to segment a data set so it can be written to multiple files you can achieve this in a simpler manner - provided the output file format is a csv/delimited file. The Output CSV/Delimited node can be configured to obtain the output file name from an input field by using the 'From field' source option on the Filename property:

    See the attached example.
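A plain-Python sketch of the 'From field' behaviour, assuming an illustrative OutFile field holds the target file name. A real run writes to disk rather than in-memory buffers, and every file receives the same set of fields:

```python
import csv
import io

def split_to_csv(records, filename_field, fieldnames):
    """Write each record to a CSV buffer chosen by the value of
    filename_field (mirrors the Output CSV/Delimited node's
    'From field' Filename option)."""
    buffers = {}
    writers = {}
    for rec in records:
        name = rec[filename_field]
        if name not in writers:
            buffers[name] = io.StringIO()
            writers[name] = csv.DictWriter(buffers[name], fieldnames=fieldnames)
            writers[name].writeheader()
        writers[name].writerow({k: rec[k] for k in fieldnames})
    return {name: buf.getvalue() for name, buf in buffers.items()}
```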

     

    Attached files

    Split_Data_Into_Multiple_CSV_Files - 3 Jun 2021.lna

     



  • 20.  RE: Run Data Validation rules in D3S Analyse

    Posted 07-08-2021 02:10

    Hi Adrian

Just exploring the suggestion you made in the graph above for splitting data into multiple CSV files. I can see this works well in your example, but I have an additional requirement to also specify particular fields in each output file, which I am able to do in the do-while loop of the other graph using a Set Metadata node before creating the output file.

Is there any way to use the filename (from field) approach but also drive the output fields per file based on data in the input file?

    thanks

    Scott 

     



  • 21.  RE: Run Data Validation rules in D3S Analyse

    Employee
    Posted 07-08-2021 03:28

    Hi Scott, 

No, you cannot subset the fields that are output in the csv files on a per-file basis; all of the fields will be output to each of the csv files.



  • 22.  RE: Run Data Validation rules in D3S Analyse

    Posted 07-08-2021 03:30

    okay thanks