Feb 03, 2020 at 06:31 PM

CPI - Filter duplicate rows from payload

Last edit Feb 03, 2020 at 06:34 PM


I'm building an iflow in Cloud Platform Integration (CPI) that queries the SuccessFactors OData API for information. The requirements include having a location "city" associated with a "pay grade". However, our organization isn't set up this way - pay grades aren't unique to a city, so a pay grade can exist in multiple cities and each city can have multiple pay grades.

To get around this issue, I'm first querying for employees, including their location city and pay grade. This way I have a set of data that has a city associated with a pay grade. My issue is that this results in duplicates, since employees with the same pay grade often work in the same city.

I am looking for a solution to this issue: how can I filter out duplicate rows in the payload?

I'm mapping the data to one level so that I can convert from XML to CSV. I can either leave the userID out of this mapping, so that there is no key and the duplicate rows are true duplicates, or I can include the userID, so that it is the key for each record and only the rest of the record is duplicated. If I went the second way and included the userID, I would need another mapping after the duplicates are removed so that the userIDs are dropped from the final output. The XML structure is attached showing both cases (PFTarget_Without_UserID.xsd and PFTarget_With_UserID.xsd).
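To make the "without userID" option concrete, here is a minimal sketch of what removing true duplicates could look like once the payload is already flat CSV. It is plain Java for illustration only (in CPI this kind of logic would typically sit in a script step); the class name, method name, and the `city,payGrade` column layout are assumptions, not part of the attached schemas.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper: assumes each CSV row is one string and that
// duplicate rows are byte-for-byte identical (no userID column).
final class CsvRowDeduper {

    /** Returns the rows with exact duplicates removed, keeping first-seen order. */
    static List<String> dropExactDuplicates(List<String> rows) {
        // LinkedHashSet discards repeats but preserves insertion order,
        // so a header line (if present) stays first.
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }
}
```

With made-up rows `city,payGrade`, `Austin,G5`, `Austin,G5`, `Boston,G5`, the second `Austin,G5` row would be dropped and the order of the remaining rows kept.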

I've attached 2 sample payloads - one with the current output, one with the desired output. Please note that the first record (userId 123) has the same information as the second record (userId 456). In the desired output, the second record (userId 456) has been removed from the payload. This is what I'm attempting to accomplish. If I went the route of not including the userId, the first and second records would be true duplicates with no key (userId) to differentiate them, and one would need to be removed. I'm mentioning both options in case one of them (with userId or without) makes the desired output easier to achieve.
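For the "with userId" option, the behaviour described above (keep the userId 123 record, drop the userId 456 record that carries the same data) could be sketched as follows. This is an illustration under assumptions, not a CPI-specific API: it assumes flat CSV rows where userId is the first column, the remaining columns contain no quoted commas, and the names `KeyedRowDeduper` and `keepFirstPerPayload` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: assumes rows look like "userId,city,payGrade"
// with userId as the first column.
final class KeyedRowDeduper {

    /** Keeps the first row for each distinct combination of non-key columns. */
    static List<String> keepFirstPerPayload(List<String> rows) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            int comma = row.indexOf(',');
            // Everything after the first comma is compared; the userId is ignored.
            String rest = comma < 0 ? row : row.substring(comma + 1);
            if (seen.add(rest)) { // add() returns false if already present
                kept.add(row);
            }
        }
        return kept;
    }
}
```

With made-up rows `123,Austin,G5`, `456,Austin,G5`, `789,Boston,G5`, the rows for 123 and 789 would be kept. The userId column would still need to be dropped afterwards, which is the extra mapping step mentioned above.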