cancel
Showing results for 
Search instead for 
Did you mean: 

Best approach for transformation

former_member263733
Participant
0 Kudos

We have a single flat file.

900 distinct groups with multiple members.

The data is written by user ID.

ID, Group

ID, Group

....

We need to transform this into a format like this:

Group, "UserID1, UserID2...UserIDn"

Basically, we need to concatenate the user id's into a single comma separated string quoted at the beginning and end of the string

We were thinking about some sort of pivot but each group could have a variable number of ID's.

Furthermore we are also wondering about the limits of varchar,  6000 user ID's with a comma, each 5 to 12 characters would be a pretty long string.

We would be willing to break up the list and have multiple records for a group each with a subset of user ID's (say two records for the group with 500 user ID's instead of one record with 1000 ID's)

There are 900 distinct groups, and can have anywhere from 1 to 6000+ members.

Our thinking now might be that we create a query transform of all records Group Name, User ID (sorted by Group Name).


Then using a script we might invoke sql to"walk" the table and concate each ID into a string until we encounter a change in group, then write out that string for the group ID - and continue this process until the end of the table. 

This seems a little awkward and we are wondering if there is a better approach or some built-in BODS features that might make this easier

Any ideas or suggestions would be appreciated,

Thanks,

Jeff Henke

Accepted Solutions (0)

Answers (2)

Answers (2)

Former Member
0 Kudos
former_member187605
Active Contributor
0 Kudos

This is possible with a DS custom function, if your data would reside in a database. But once you have the data is in a database, there is a much more effective approach by using a database-level function. That will avoid thousands of roundtrips from DS to database.

Build a data flow to copy your file into a database. Make sure you add a column with a sequence number, use the built-in gen_row_num_by_group function to do so.

Write the database function with 2 input parameters (group and sequence number) to concatenate (up to) 500 userid's, and import it into DS.

Build a 2nd data flow, use the table as a source. Add a Query transform that reads records #1, #501, #1001... from the table (use where-clause mod(seq_number,500) = 1) and calls the function. Write the output to a file or another databse table.