Data360 Analyze

  • 1.  Random sample of 10

    Employee
    Posted 11-28-2019 02:27

    Hi,

    I want to select a random sample of 10 records.
    Depending on the input data, the node might not always receive as many as 10 records.
    Any ideas for a workaround?



  • 2.  RE: Random sample of 10

    Employee
    Posted 11-28-2019 03:44

    If there are fewer than 10 records on the input, what do you want to output: all of the records? What is expected to happen if there are zero records on the input?

     



  • 3.  RE: Random sample of 10

    Employee
    Posted 11-28-2019 03:49

    I want to output all of the input records, unless there are more than 10, in which case I want to output a random sample of 10.
    In short:

    0 -> 0

    7 -> 7

    1888 -> 10



  • 4.  RE: Random sample of 10

    Employee
    Posted 11-28-2019 05:17

    Here is one solution. Use an Aggregate node to count the total number of records, which is output as the 'record_Count' field. Input the count data to the first input pin of a Cat node, and input your data set to the Cat node's second input pin. The Cat node is configured to generate the union of the fields.
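    As a rough picture of what the Aggregate and Cat steps produce, here is a minimal sketch in plain Python (not Data360 Analyze node code). The helper name `cat_union`, the `id` field, and the use of `None` for fields a record lacks are assumptions made for illustration only.

```python
def cat_union(count_rows, data_rows):
    """Concatenate two record lists, keeping the union of their fields."""
    all_rows = count_rows + data_rows
    # Collect field names in first-seen order across both inputs.
    fields = []
    for row in all_rows:
        for name in row:
            if name not in fields:
                fields.append(name)
    # Assumption: a field that a record lacks is filled with None.
    return [{name: row.get(name) for name in fields} for row in all_rows]


data = [{"id": 1}, {"id": 2}, {"id": 3}]   # hypothetical data set
count = [{"record_Count": len(data)}]      # stands in for the Aggregate output
combined = cat_union(count, data)          # count record first, then the data
```

    The point of the ordering is that the count arrives as the first record, so whatever processes the combined stream knows the total before it sees any data record.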

     

    The combined data set is then input to a Transform node, which uses the Python sample() function. The number of records is read from the first record and is used to create a list of the indexes of all the data records (starting at 2, because the count record is at index 1). The sample() function then generates a list of 10 unique indexes to be output. If there are more than 10 data records, only the records with matching indexes are output. If there are 10 or fewer data records, all of them are output.
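    The sampling logic described above can be sketched in standalone Python. This is not the code from the attached data flow, and it does not use the Transform node's own API. The function name `sample_when_sufficient`, the list-of-dicts representation, and the handling of an empty input are assumptions.

```python
import random

SAMPLE_SIZE = 10


def sample_when_sufficient(combined, sample_size=SAMPLE_SIZE, rng=random):
    """Return at most sample_size data records from a count-prefixed list.

    combined[0] is assumed to carry the total in 'record_Count';
    the data records follow it, at 1-based indexes 2..total+1.
    """
    if not combined:
        # Assumption: no count record at all means there is nothing to output.
        return []
    total = combined[0]["record_Count"]
    data_indexes = range(2, total + 2)
    if total > sample_size:
        # random.sample() draws unique items, so no index is picked twice.
        keep = set(rng.sample(data_indexes, sample_size))
    else:
        # 10 or fewer data records: keep them all.
        keep = set(data_indexes)
    return [row for index, row in enumerate(combined, start=1) if index in keep]
```

    Because the output is built by walking the records in order, the sampled records keep their original relative order. With this sketch, inputs of 0, 7, and 1888 data records yield 0, 7, and 10 output records respectively.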

    The corresponding example data flow is attached (requires Analyze v.3.5.1).

    The Head node is just used in the example as a means to control the number of data records, to allow the testing of the operation of the remaining nodes.

     

    Attached files

    Sample_When_Sufficient_Records - 28 Nov 2019.lna

     



  • 5.  RE: Random sample of 10

    Employee
    Posted 12-06-2019 06:37

    Thank you, the proposed solution works as desired.
    The code in the Transform node was especially helpful.