Former Member

Which comes first: Enrichment or De-duplication?

Hi Friends,

Which do we need to handle first for data quality: enrichment or de-duplication?

Thank you

Shankar


5 Answers

  • Former Member
    Jul 09, 2008 at 07:36 AM

    Hi Shankar,

    First we have to enrich the data.

    Only after enriching the data can we run de-duplication with precision. Otherwise the de-duplication runs on low-quality data and won't be as effective.

    Hope this clarifies,

    + An


  • Former Member
    Jul 09, 2008 at 10:56 AM

    Hi Shankar,

    Which process we should take first depends on various factors, such as the number of records, the charge for the web services used to enrich the data (D&B etc.), and the server capabilities.

    It is right to enrich the data first, but if the data set is too large, enriching it will be very costly, and maintaining it will also be a tedious task. So I think you should de-duplicate the data first and then go for the enrichment job.
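
To put rough numbers on the cost argument, here is a minimal sketch. It assumes the enrichment service charges a flat fee per request; the record count, duplicate rate and fee are made-up figures for illustration, not real D&B pricing.

```python
def enrichment_cost(total_records: int, duplicate_rate: float, fee_per_request: float) -> dict:
    """Compare the paid-enrichment bill for the two orderings.

    Assumes one enrichment request per record and a flat fee per request.
    """
    if not 0.0 <= duplicate_rate < 1.0:
        raise ValueError("duplicate_rate must be in [0, 1)")
    # Records left over if duplicates are removed before enrichment.
    unique_records = round(total_records * (1 - duplicate_rate))
    return {
        "enrich_first": total_records * fee_per_request,
        "dedupe_first": unique_records * fee_per_request,
    }


# Hypothetical figures: 1,000,000 records, 25% duplicates, 0.5 per request.
costs = enrichment_cost(total_records=1_000_000, duplicate_rate=0.25, fee_per_request=0.5)
print(costs)
```

The saving grows with the duplicate rate. The trade-off is that duplicates which only the enriched attributes would reveal are not caught by a de-duplication run that happens before enrichment.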

    Brad


  • Former Member
    Jul 09, 2008 at 11:56 AM

    Hi all,

    Enrichment is a misleading word, because sometimes it also includes address validation, while others use it to mean "adding information that was not there originally".

    Hence, if three different DQ functions are considered for CDI data (address validation, enrichment and de-duplication), address validation should be performed first, because it improves the quality of both de-duplication and enrichment (D&B, for example). Enrichment comes next, because it also improves de-duplication quality, and de-duplication comes last.

    If the costs of enrichment are to be considered, you should verify whether you pay per request or per successful response. If the charge is per request, the data should be as accurate as possible beforehand, so that you pay for fewer requests that fail to enrich bad data.
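
The difference between the two billing models can be sketched as follows. The fee and the match rates before and after address validation are hypothetical values chosen only to show the effect, and `enrichment_bill` is an illustrative helper, not part of any vendor API.

```python
def enrichment_bill(requests: int, success_rate: float, fee: float, billing: str) -> float:
    """Return the total charge under a given billing model.

    billing: "per_request" charges every lookup,
             "per_success" charges only lookups that returned a match.
    """
    successes = round(requests * success_rate)
    if billing == "per_request":
        return requests * fee
    if billing == "per_success":
        return successes * fee
    raise ValueError(f"unknown billing model: {billing}")


def wasted_spend(requests: int, success_rate: float, fee: float) -> float:
    """Money paid for failed lookups when billing is per request."""
    failed = requests - round(requests * success_rate)
    return failed * fee


# Hypothetical: 10,000 lookups at 0.5 each; the match rate is assumed to
# rise from 60% to 90% once addresses have been validated.
waste_raw = wasted_spend(10_000, 0.6, 0.5)
waste_validated = wasted_spend(10_000, 0.9, 0.5)
print(waste_raw, waste_validated)
```

Under per-success billing the failed lookups cost nothing, so cleaning the data first mainly helps match quality rather than the bill.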

    Edna


  • Former Member
    Dec 10, 2012 at 12:57 PM

    The steps should be as follows:

    1) Eliminate obvious duplicates by performing an initial screening of the data within each system

    2) Enrich the data (using a service like D&B)

    3) Identify duplicates across the systems

    4) Eliminate duplicates
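
The four steps above could be sketched roughly like this. Everything here is illustrative: `FAKE_REGISTRY` is a made-up lookup table standing in for a paid enrichment service such as D&B, and the record layout, matching rule and survivor rule are assumptions, not a real MDM implementation.

```python
from collections import defaultdict

# Hypothetical stand-in for an enrichment service: normalized name -> made-up ID.
FAKE_REGISTRY = {
    "acme corp": "ID-001",
    "acme corporation": "ID-001",
    "globex": "ID-002",
}


def normalize(name: str) -> str:
    """Lower-case the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def drop_obvious_duplicates(records):
    """Step 1: within one system, keep the first record per normalized name."""
    seen, kept = set(), []
    for rec in records:
        key = normalize(rec["name"])
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def enrich(records):
    """Step 2: attach a registry ID where the stub lookup finds one."""
    return [{**rec, "registry_id": FAKE_REGISTRY.get(normalize(rec["name"]))} for rec in records]


def group_cross_system_duplicates(records):
    """Step 3: group records from all systems that share a registry ID."""
    groups = defaultdict(list)
    for rec in records:
        # Unmatched records get a key of their own so they are never merged.
        key = rec["registry_id"] or ("unmatched", rec["system"], normalize(rec["name"]))
        groups[key].append(rec)
    return groups


def eliminate_duplicates(groups):
    """Step 4: keep one survivor per group (here simply the first record)."""
    return [members[0] for members in groups.values()]


crm = [
    {"system": "CRM", "name": "Acme Corp"},
    {"system": "CRM", "name": "ACME  corp"},
    {"system": "CRM", "name": "Globex"},
]
erp = [
    {"system": "ERP", "name": "Acme Corporation"},
    {"system": "ERP", "name": "Initech"},
]

screened = drop_obvious_duplicates(crm) + drop_obvious_duplicates(erp)
enriched = enrich(screened)
golden = eliminate_duplicates(group_cross_system_duplicates(enriched))
print([rec["name"] for rec in golden])
```

In this toy data the initial screening removes one record before any enrichment request is made, and "Acme Corp" and "Acme Corporation" are only recognized as the same company after enrichment gives them a shared ID.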


  • Dec 13, 2012 at 08:17 AM

    Hi Shankar,

    Adding to the useful inputs above, I think that before any de-duplication one needs a data study that tells you the shape of the data: the level of duplication, the overall fill rate, the fill rate of the important attributes needed in the de-duplication procedure, and so on. Such a DQ report will come in handy when planning the way ahead.
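
A very small version of such a report might look like the sketch below. The field names, the sample records and the choice of name plus city as the match key are assumptions made for illustration only.

```python
def fill_rate(records, field):
    """Share of records in which `field` holds a non-empty value."""
    if not records:
        return 0.0
    filled = sum(1 for rec in records if str(rec.get(field) or "").strip())
    return filled / len(records)


def duplicate_rate(records, key_fields):
    """Share of records that repeat the match key of another record."""
    if not records:
        return 0.0
    keys = {
        tuple(str(rec.get(f) or "").strip().lower() for f in key_fields)
        for rec in records
    }
    return 1 - len(keys) / len(records)


# Made-up sample records.
sample = [
    {"name": "Acme Corp", "city": "Berlin", "phone": ""},
    {"name": "acme corp", "city": "Berlin", "phone": "123"},
    {"name": "Globex", "city": None, "phone": ""},
    {"name": "Initech", "city": "Austin", "phone": "456"},
]

report = {
    "fill_rate_city": fill_rate(sample, "city"),
    "fill_rate_phone": fill_rate(sample, "phone"),
    "duplicate_rate": duplicate_rate(sample, ["name", "city"]),
}
print(report)
```

A low fill rate on an attribute used for matching is a hint that enrichment (or another source for that attribute) is needed before de-duplication can rely on it.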

    If enrichment can be done in-house, it should precede the de-duplication process so that it gives the best results. If enrichment is a paid service, one can go for phased de-duplication after enrichment.

    Thanks,

    Ravi
