Former Member

Finding Duplicate Records in the Realtime job

I have a requirement where the input is an XML file containing data entered on a web page. The fields are Vendor ID, Name, Street, House number, Region, City, and Postal code. I need to check whether a particular vendor is already present in the target database and get a score for the level of matching. I have used an XML Message as both source and target. I am trying to use the Match transform but am not sure how to use it in this scenario.

Please suggest how to go ahead with this.


3 Answers

  • Jan 13, 2015 at 12:09 PM

    You can define a break group using the vendor number and assign match scores to the rest of the columns in the Match process. Keep in mind that the contribution (weight) of each field to the match score determines the accuracy of the de-duplication. If the vendor ID in the input is a legacy vendor ID rather than the target vendor ID, you can match on the address fields instead.

    One example is below in the screenshot

    It is important that all the data relevant to the matching (both target and source) is combined into a single dataset before it goes through the Match process; otherwise, the output will not be accurate.
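
The idea described above (one combined dataset, a break group to limit comparisons, and weighted per-field scores) can be sketched in plain Python. This is only an illustration of the concept, not the Data Services Match transform or its API. The field names, the weights, the sample vendors, and the choice of postal code as the break key are all assumptions made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical weights: how much each field contributes to the total score.
WEIGHTS = {"name": 0.4, "street": 0.3, "house_number": 0.1, "city": 0.2}


def similarity(a, b):
    """Return a 0..1 similarity ratio, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a, rec_b):
    """Weighted sum of the per-field similarities, scaled to 0..100."""
    total = sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())
    return round(100 * total, 1)


def find_matches(incoming, target_rows, break_key="postal_code"):
    # Combine the incoming record and the target rows into a single dataset.
    dataset = [dict(incoming, origin="source")]
    dataset += [dict(row, origin="target") for row in target_rows]

    # Break groups: only records sharing the break key are compared.
    groups = defaultdict(list)
    for rec in dataset:
        groups[rec[break_key]].append(rec)

    candidates = [r for r in groups[incoming[break_key]] if r["origin"] == "target"]
    scored = [(r["vendor_id"], match_score(incoming, r)) for r in candidates]
    # Best match first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up sample data for illustration only.
incoming = {"vendor_id": "NEW", "name": "Acme Industries", "street": "Main Street",
            "house_number": "12", "city": "Berlin", "postal_code": "10115"}
targets = [
    {"vendor_id": "V001", "name": "ACME Industries", "street": "Main Street",
     "house_number": "12", "city": "Berlin", "postal_code": "10115"},
    {"vendor_id": "V002", "name": "Borealis Trading", "street": "Harbour Road",
     "house_number": "7", "city": "Berlin", "postal_code": "10115"},
    {"vendor_id": "V003", "name": "Acme Industries", "street": "Main Street",
     "house_number": "12", "city": "Hamburg", "postal_code": "20095"},
]

results = find_matches(incoming, targets)
print(results)
```

In this sketch V001 ranks first because every weighted field agrees once case is ignored, while V003 is never compared because it falls into a different break group. That also shows the trade-off of break groups: a key that is too strict can hide genuine duplicates.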

    kind regards

    Raghu


  • Former Member
    Jan 13, 2015 at 01:01 PM

    There's a video here for the batch job. You can use the same approach for the real-time job as well.


  • Former Member
    Jan 13, 2015 at 01:35 PM

    Hi,

    Please refer to the screenshot below, which shows the extracted XML source and target structures, named "Request" and "Response".

    If I use the Address_MatchBatch transform as shown below, please let me know what needs to be mapped in the Input, Options, and Output sections of the transform.

    Am I using the right transform for this scenario?

    If not, please suggest an alternative, as I am new to Data Quality.

    Thanks,

    Parineeta
