
How to randomly select records in Data Services

I have a flat file containing X records. I want to randomly select Y of these records and write them to a separate flat file. Is there a simple way to do this in Data Services? Note that this should be a true random selection, not every nth record where n = X / Y.


1 Answer

  • Posted on Oct 28, 2017 at 10:18 AM

    Use two Query transforms:

    1. In the first, add a column RANDOM_NUMBER of type real or double and map it to rand(), a built-in function that generates a random number between 0 and 1.
    2. In the second, add a where-clause, e.g. RANDOM_NUMBER < 0.2, which selects roughly 20% of your records.
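
The two steps above can be sketched outside Data Services as follows. This is only an illustration in Python, not Data Services syntax: the function name sample_by_threshold and its rng parameter are hypothetical, and rng.random() stands in for the random-number column.

```python
import random

def sample_by_threshold(records, fraction, rng=None):
    """Keep each record independently with probability `fraction`.

    Hypothetical helper mirroring the two Query transforms; it is not
    part of Data Services.
    """
    if rng is None:
        rng = random.Random()
    selected = []
    for record in records:
        random_number = rng.random()   # step 1: uniform value in [0, 1)
        if random_number < fraction:   # step 2: the where-clause
            selected.append(record)
    return selected

records = [f"row-{i}" for i in range(10_000)]
sample = sample_by_threshold(records, 0.2, random.Random(42))
print(len(sample))  # close to 2000, but usually not exactly 2000
```

Because each record is kept or dropped independently, the output preserves the input order and its size varies from run to run around the requested fraction.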

    • You're right and you're not.

      Random values are by nature distributed evenly throughout a population, so the share of records selected only approximates the threshold. The bigger the input, the closer the selected share gets to it.

      Unfortunately, the DS rand() function does not seem to generate real random numbers. I wasn't aware of that issue. Use rand_ext (without a seed) instead. That will do the trick.
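
To put rough numbers on the remark about input size, here is a small Python sketch. It assumes every record gets an independent, uniformly distributed random value, in which case the number of selected records follows a binomial distribution. The function name selected_count_spread is hypothetical.

```python
import math

def selected_count_spread(n, fraction):
    """Expected number of selected records and its standard deviation,
    assuming independent uniform draws (binomial model)."""
    expected = n * fraction
    std_dev = math.sqrt(n * fraction * (1 - fraction))
    return expected, std_dev

for n in (100, 10_000, 1_000_000):
    expected, std_dev = selected_count_spread(n, 0.2)
    # The relative spread shrinks as the input grows.
    print(n, expected, round(std_dev, 1), f"{std_dev / expected:.2%}")
```

Under that assumption, a threshold of 0.2 on 10,000 records gives about 2,000 records with a typical deviation of about 40, so the result is close to the target count but not guaranteed to equal it.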
