25 June 2012

Deleting Duplicate Rows in Sql server


SQL Server tables should never contain duplicate rows, if there is a key constraints. Table with no key constraints allow the duplicates. It happens in many ways such as import data from other resources.

Nothing to worry, write a query and make your database without duplicates.

First we need a table and some data to explain the process. Use the following script to create a table and insert a record.

CREATE TABLE Dup_Students (ID INT, FirstName varchar(25), Department Char(2))
CREATE TABLE Dup_Students (ID INT, FirstName varchar(25), Department Char(2)) INSERT INTO Dup_Students VALUES(1, 'Jack', 'IT')INSERT INTO Dup_Students VALUES(2, 'Alice', 'ME')INSERT INTO Dup_Students VALUES(3, 'James', 'EE')INSERT INTO Dup_Students VALUES(4, 'Nickle', 'CE')INSERT INTO Dup_Students VALUES(5, 'George', 'IT')--INSERT INTO Dup_Students VALUES(1, 'Jack', 'IT')INSERT INTO Dup_Students VALUES(2, 'Alice', 'ME')INSERT INTO Dup_Students VALUES(3, 'James', 'EE')INSERT INTO Dup_Students VALUES(4, 'Nickle', 'CE')
--INSERT INTO Dup_Students VALUES(1, 'Jack', 'IT')INSERT INTO Dup_Students VALUES(2, 'Alice', 'ME')
Now you can view the number of duplicate records in each row.
SELECT ID, FirstName, Department, Count(*) as DuplicateCount From Dup_Studentsgroup by ID,FirstName, Department

Before deleting you can view the records with row number for the duplicates. This result set is going to act as a temp table for the delete process.
SELECT ID, FirstName, Department, 
ROW_NUMBER() OVER(PARTITION BY ID, FirstName ORDER BY ID) AS DuplicateRowCount
FROM Dup_Students
We are going to delete all the duplicates using CTE (Common table expression) and ROW_NUMBER(), which is a new in SQL server 2005.
WITH Dup_Students_CTE (ID, FirstName, Department, DuplicateRowCount)AS(SELECT ID, FirstName, Department, 
ROW_NUMBER() OVER(PARTITION BY ID, FirstName ORDER BY ID) AS DuplicateRowCount
FROM Dup_Students)DELETE FROM Dup_Students_CTEWHERE DuplicateRowCount > 1GO