Finding Duplicate Records Using GROUP BY in SQL Server

Original Post: http://www.tech-recipes.com/rx/49959/finding-duplicate-records-using-group-by-in-sql-server/

There are various times when we need to find duplicate records in SQL Server. It is possible to find duplicates using DISTINCT, ROW NUMBER as well as the GROUP BY approach.

Duplicate records can create problems sometimes when displaying reports or performing a Multiple Insert update. Finding duplicate records in a database needs further investigation. In some cases, duplicate records are positive, but it all depends on the data and the database design as well.

For example, if a customer has ordered the same product twice on the same date with the the same shipping and billing address, then this may result in a duplicate record.

Let us create a table Customer with First Name, Last Name, and Mobile Number fields.

CREATE TABLE CUSTOMER
(
FirstName VARCHAR(50),
LastName  VARCHAR(50),
MobileNo  VARCHAR(15)
);

INSERT INTO CUSTOMER VALUES ('Niraj','Yadav',989898);

INSERT INTO CUSTOMER VALUES ('Chetan','Gadodia',959595);
INSERT INTO CUSTOMER VALUES ('Chetan','Gadodia',959595);

INSERT INTO CUSTOMER VALUES ('Atul','Kokam',42424242);
INSERT INTO CUSTOMER VALUES ('Atul','Kokam',42424242);

INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);
INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);
INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);

INSERT INTO CUSTOMER VALUES ('Jinendra','Jain',121212);
INSERT INTO CUSTOMER VALUES ('Jinendra','Jain',121212);

SELECT * FROM CUSTOMER;

Finding Duplicates Records Using Group by in SQL Server (2)

Using the DISTINCT approach, we can quickly get unique rows in a table.

SELECT DISTINCT FirstName, LastName, MobileNo FROM CUSTOMER;

Finding Duplicates Records Using Group by in SQL Server (3)

However, this does not show how many times a row has been duplicated. Using the GROUP BY approach, we can find this.

Finding Duplicates Using GROUP BY

Adding grouping and a counting field to our display of FirstName, LastName and MobileNo combination shows how many times each customer’s name appears.

SELECT		FirstName, LastName, MobileNo, COUNT(1) as CNT
FROM		CUSTOMER
GROUP BY	FirstName, LastName, MobileNo;

Finding Duplicates Records Using Group by in SQL Server (4)

GROUP BY will show just one record for each combination of FirstName, LastName and MobileNo.

The count CNT shows how many times the row has been duplicated.
CNT = 1 indicates that row appears only once.

Let us filter out using the Having clause to exclude rows that appear only once.

SELECT		FirstName, LastName, MobileNo, COUNT(1) as CNT
FROM		CUSTOMER
GROUP BY	FirstName, LastName, MobileNo
HAVING		COUNT(1) > 1;

Finding Duplicates Records Using Group by in SQL Server (1)

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s