
How to randomly select records in Data Services

I have a flat file containing X records and want to randomly select Y of them into a separate flat file. Is there a simple way to do this in Data Services? Note that this should be a true random selection, not every nth record where n = X / Y.



1 Answer

  • Oct 28, 2017 at 10:18 AM

    Use two Query transforms:

    1. In the first, add a column RANDOM_NUMBER of type real or double and map it to rand(), the built-in function that generates a random number between 0 and 1.
    2. In the second, add a where-clause, e.g. RANDOM_NUMBER < 0.2, which selects roughly 20% of your records.
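To make the logic of the two steps concrete outside Data Services, here is a minimal Python sketch. It is only an illustration: the function name `sample_by_threshold` and the sample data are made up, and Python's `random.random()` stands in for the DS rand() call.

```python
import random

def sample_by_threshold(records, fraction, rng=None):
    """Keep each record independently with probability `fraction`."""
    rng = rng or random.Random()
    selected = []
    for record in records:
        random_number = rng.random()  # step 1: uniform value in [0, 1)
        if random_number < fraction:  # step 2: the where-clause
            selected.append(record)
    return selected

# Hypothetical input: 10,000 rows of a flat file
records = [f"row-{i}" for i in range(10_000)]
subset = sample_by_threshold(records, 0.2, random.Random(42))
print(len(subset))  # close to 2,000, but not guaranteed to be exactly 2,000
```

Because every record is accepted or rejected on its own, the size of the result varies from run to run around 20% of the input.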

    • You're right and you're not.

      Uniform random values are by nature spread evenly across a population. The bigger the input, the closer the selected share comes to the threshold you filter on.
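The effect of input size can be checked with a small experiment. The Python sketch below is an illustration, not Data Services code; `selected_fraction` is a hypothetical helper that counts how many uniform random values fall below the threshold.

```python
import random

def selected_fraction(n_records, threshold, rng):
    """Return the share of n_records whose random value is below threshold."""
    hits = sum(1 for _ in range(n_records) if rng.random() < threshold)
    return hits / n_records

rng = random.Random(7)
for n in (100, 10_000, 1_000_000):
    # The printed share tends to move closer to 0.2 as n grows
    print(n, selected_fraction(n, 0.2, rng))
```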

      Unfortunately, the DS rand() function does not seem to generate truly random numbers. I wasn't aware of that issue. Use rand_ext() (without a seed) instead; that will do the trick.
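If exactly Y records are needed rather than an approximate share, one common variation (not what the answer above describes, and not tested in Data Services) is to sort by the random column and keep the first Y rows. A minimal Python sketch of that idea, with the hypothetical name `sample_exactly`:

```python
import random

def sample_exactly(records, y, rng=None):
    """Return y records chosen at random by sorting on a random key."""
    rng = rng or random.Random()
    # Attach a uniform random key to every record
    keyed = [(rng.random(), record) for record in records]
    # Order by the key only, then keep the first y records
    keyed.sort(key=lambda pair: pair[0])
    return [record for _, record in keyed[:y]]

# Hypothetical input: 1,000 rows, of which exactly 50 are wanted
rows = [f"row-{i}" for i in range(1_000)]
picked = sample_exactly(rows, 50, random.Random(3))
print(len(picked))  # 50
```

The trade-off is the extra sort over the whole input, which the threshold filter avoids.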