on 12-13-2017 1:20 PM
Hi Guys,
I have two duplicate records like lenovo company, lenovo . how to find this duplicate records????
Hello Bharathkumar,
If you can manage correct data or expected data and its variations in a seperate lookup table for data cleansing, then you can use the method I'll demonstrate here.
But please note that this approach is very straightforward and doesnot contain fuzzy logic
I assume you have two tables; one for transactional data and the other is for data cleansing
create table DataTable (
id int,
company nvarchar(100)
)
create table DataMaster (
company nvarchar(100),
variation nvarchar(100)
)
And I populate some sample data for testing the approach
insert into DataMaster select 'Lenovo','Lenovo Company' from dummy;
insert into DataTable select 1, 'IBM' from dummy;
insert into DataTable select 1, 'Lenovo' from dummy;
insert into DataTable select 1, 'Lenovo Company' from dummy;
As you can guess now, I'll execute an UPDATE statement that will match company field from transactional data from its variations, and use the correct form of the value
update d
set
company = m.company
from DataTable as d
inner join DataMaster as m
on d.company = m.variation;
I hope this helps you for the solution.
Unfortunately if your company data is not stored alone on a table column, if it is part of a longer string value, then it is more difficult to cleanse such data.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
91 | |
10 | |
10 | |
9 | |
9 | |
7 | |
6 | |
5 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.