How To Remove Duplicates In SQL

In this article, we explain the full process of finding and removing duplicate rows from a table in SQL Server. Ideally, we follow best practices while designing and developing objects in SQL Server.

For example, a table should have an identity column, a primary key, constraints to ensure data integrity, and a clustered index.

However, even when following these best practices to a T, we may still run into duplicate rows, and we may need to remove them before the data is inserted into the actual production tables.


How Do These Tables Work?

Okay, so your SQL table contains duplicate rows, and you want to remove them yourself, right? We all face this on numerous occasions, and it can feel like an absolute nightmare to clean up. For a start, it is best to use the relevant keys and constraints to prevent duplicates from occurring in the first place.

We need to follow specific guidelines to clean up these duplicate rows. The easiest way to try them out is to create a simple table and insert a few duplicate records into it.

A common first instinct is to reach for the DISTINCT keyword. The first approach uses a SELECT DISTINCT statement to retrieve only the unique rows from the table. Note that DISTINCT filters the query's result set; it does not remove anything from the table itself.

Conceptually, each row the query returns is compared with the other rows, and any row whose selected values all match a row already in the result is left out, so every distinct combination of values appears exactly once.

The second approach uses a DELETE statement to remove the duplicate rows from the table itself. In this case, an IN clause specifies the list of rows, usually identified by their key, that should be removed.
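The two approaches can be compared side by side in the minimal sketch below. It assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, and the Employee table and its rows are hypothetical. The SELECT DISTINCT query only returns unique rows, while the DELETE removes every row whose id appears in the IN list, built here by a self-join that finds rows with an identical twin that has a smaller id.

```python
import sqlite3

# Hypothetical sample table and data, standing in for a real SQL Server table.
SAMPLE_ROWS = [
    ("Raj", "Gupta", "India"),
    ("Raj", "Gupta", "India"),
    ("Mohan", "Kumar", "USA"),
    ("James", "Barry", "UK"),
    ("James", "Barry", "UK"),
    ("James", "Barry", "UK"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (id INTEGER PRIMARY KEY, "
    "first_name TEXT, last_name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO Employee (first_name, last_name, country) VALUES (?, ?, ?)",
    SAMPLE_ROWS,
)

# Approach 1: SELECT DISTINCT filters the result set; the table is unchanged.
distinct_rows = conn.execute(
    "SELECT DISTINCT first_name, last_name, country FROM Employee "
    "ORDER BY first_name"
).fetchall()
rows_before = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]

# Approach 2: DELETE ... WHERE id IN (...). The subquery lists every row that
# has an identical twin with a smaller id, so the lowest id per group survives.
conn.execute(
    """
    DELETE FROM Employee
    WHERE id IN (
        SELECT a.id
        FROM Employee AS a
        JOIN Employee AS b
          ON a.first_name = b.first_name
         AND a.last_name = b.last_name
         AND a.country = b.country
         AND a.id > b.id
    )
    """
)
conn.commit()

remaining = conn.execute("SELECT id FROM Employee ORDER BY id").fetchall()
```

Because `=` never matches NULL, rows whose compared columns contain NULLs would not be treated as duplicates by this self-join.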

Method One

A single copy of each set of duplicate records is moved from the original table into a holding table. All of the duplicates are then deleted from the original table, the rows in the holding table are moved back to the original table, and finally the holding table is dropped.

This method is simple, but it has drawbacks. You must have enough free space in the database to create the holding table, and because the data is moved out and then back in, it causes extra work.

If your table has an identity column, you must run SET IDENTITY_INSERT with ON for that table before restoring the data to the original table, and set it back to OFF afterwards.
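The four steps of method one can be sketched as follows. This is a minimal illustration under assumptions: it runs in an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, and the Employee table, its rows, and the Employee_holding table name are all hypothetical. It keeps the lowest id of each duplicated group so the original key values can be restored.

```python
import sqlite3

# Hypothetical sample table and data, standing in for a real SQL Server table.
SAMPLE_ROWS = [
    ("Raj", "Gupta", "India"),
    ("Raj", "Gupta", "India"),
    ("Mohan", "Kumar", "USA"),
    ("James", "Barry", "UK"),
    ("James", "Barry", "UK"),
    ("James", "Barry", "UK"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (id INTEGER PRIMARY KEY, "
    "first_name TEXT, last_name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO Employee (first_name, last_name, country) VALUES (?, ?, ?)",
    SAMPLE_ROWS,
)

# Step 1: copy one representative (the lowest id) of each duplicated group
# into a hypothetical holding table.
conn.execute(
    """
    CREATE TABLE Employee_holding AS
    SELECT MIN(id) AS id, first_name, last_name, country
    FROM Employee
    GROUP BY first_name, last_name, country
    HAVING COUNT(*) > 1
    """
)
# Step 2: delete every copy of those rows from the original table.
conn.execute(
    """
    DELETE FROM Employee
    WHERE EXISTS (
        SELECT 1 FROM Employee_holding AS h
        WHERE h.first_name = Employee.first_name
          AND h.last_name = Employee.last_name
          AND h.country = Employee.country
    )
    """
)
# Step 3: move the saved rows back with their original ids. SQL Server would
# need SET IDENTITY_INSERT Employee ON around this INSERT; SQLite accepts
# explicit values for an INTEGER PRIMARY KEY column.
conn.execute(
    "INSERT INTO Employee (id, first_name, last_name, country) "
    "SELECT id, first_name, last_name, country FROM Employee_holding"
)
# Step 4: drop the holding table.
conn.execute("DROP TABLE Employee_holding")
conn.commit()

remaining = conn.execute(
    "SELECT id, first_name FROM Employee ORDER BY id"
).fetchall()
holding_tables = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name = 'Employee_holding'"
).fetchone()[0]
```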

Method Two

Method two deletes all duplicate rows in place. The rows are partitioned by the columns that define a duplicate, and every row numbered higher than 1 within its partition is deleted. As written, the script doesn't sort the partitioned data by any meaningful condition, so which copy survives is arbitrary. If your criteria for deleting duplicate records require choosing which records to keep based on a sort order, you can use the `ORDER BY` expression inside the OVER() clause to do this.

Method two is simple and effective because it doesn't need temporary copies or joins. It also doesn't require additional indexes or complicated queries.

This method works well in most situations, but the ROW_NUMBER() function was introduced in SQL Server 2005, so if your version of SQL Server doesn't support it, you should use another method.
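As a minimal sketch of method two, the statement below numbers the rows in each group with ROW_NUMBER() and deletes everything numbered above 1. It assumes an in-memory SQLite database driven by Python's sqlite3 module in place of SQL Server, with a hypothetical Employee table. The `ORDER BY id DESC` shows how the sort order picks the survivor: here the newest copy (highest id) is kept.

```python
import sqlite3

# Hypothetical sample table and data, standing in for a real SQL Server table.
SAMPLE_ROWS = [
    ("Raj", "Gupta", "India"),
    ("Raj", "Gupta", "India"),
    ("Mohan", "Kumar", "USA"),
    ("James", "Barry", "UK"),
    ("James", "Barry", "UK"),
    ("James", "Barry", "UK"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (id INTEGER PRIMARY KEY, "
    "first_name TEXT, last_name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO Employee (first_name, last_name, country) VALUES (?, ?, ?)",
    SAMPLE_ROWS,
)

# Number the rows inside each group of identical values. ORDER BY id DESC
# makes row number 1 the newest copy, so that is the one that survives.
conn.execute(
    """
    DELETE FROM Employee
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY first_name, last_name, country
                       ORDER BY id DESC
                   ) AS rn
            FROM Employee
        ) AS numbered
        WHERE rn > 1
    )
    """
)
conn.commit()

remaining = conn.execute("SELECT id FROM Employee ORDER BY id").fetchall()
```

Switching to `ORDER BY id` (ascending) would keep the oldest copy of each group instead.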

Delete Duplicate Rows Using Other Methods

Group By Method 

In this method, we use the GROUP BY clause to group the rows by the columns that define a duplicate, and the COUNT() function to check how often each combination of values occurs. Any group with a count greater than 1 is a duplicate.

To return only those duplicated combinations, add a HAVING COUNT(*) > 1 condition to the query.
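A minimal sketch of the identification query, assuming an in-memory SQLite database through Python's sqlite3 module as a stand-in for SQL Server and a hypothetical Employee table. It only reports duplicates; nothing is deleted.

```python
import sqlite3

# Hypothetical sample table and data, standing in for a real SQL Server table.
SAMPLE_ROWS = [
    ("Raj", "Gupta", "India"),
    ("Raj", "Gupta", "India"),
    ("Mohan", "Kumar", "USA"),
    ("James", "Barry", "UK"),
    ("James", "Barry", "UK"),
    ("James", "Barry", "UK"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (id INTEGER PRIMARY KEY, "
    "first_name TEXT, last_name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO Employee (first_name, last_name, country) VALUES (?, ?, ?)",
    SAMPLE_ROWS,
)

# Group by the columns that define a duplicate and keep only groups that
# occur more than once.
duplicates = conn.execute(
    """
    SELECT first_name, last_name, country, COUNT(*) AS occurrences
    FROM Employee
    GROUP BY first_name, last_name, country
    HAVING COUNT(*) > 1
    ORDER BY first_name
    """
).fetchall()

# Each group keeps one row, so the extra copies are occurrences - 1.
extra_copies = sum(occurrences - 1 for *_, occurrences in duplicates)
```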

Common Table Expressions Method

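In this method, a common table expression (CTE) adds a row number to each row, partitioned by the columns that define a duplicate. The result set shows the first name, last name, country, and row number. Note that there are two rows with the same values for First Name and Last Name but a different RowNumber. This happens because ROW_NUMBER() numbers the rows within each partition sequentially, so the first copy gets 1 and each further copy gets 2, 3, and so on.

So, to get rid of the duplicates, delete every row whose RowNumber is greater than 1. The ORDER BY inside the OVER() clause only decides which copy is numbered 1 and is therefore kept.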
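Before deleting anything, the CTE can be used to preview the numbering. The sketch below assumes an in-memory SQLite database via Python's sqlite3 module instead of SQL Server, and the Employee table and its rows are hypothetical; it returns the same kind of result set described above, with a RowNumber column per partition.

```python
import sqlite3

# Hypothetical sample table and data, standing in for a real SQL Server table.
SAMPLE_ROWS = [
    ("Raj", "Gupta", "India"),
    ("Raj", "Gupta", "India"),
    ("Mohan", "Kumar", "USA"),
    ("James", "Barry", "UK"),
    ("James", "Barry", "UK"),
    ("James", "Barry", "UK"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (id INTEGER PRIMARY KEY, "
    "first_name TEXT, last_name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO Employee (first_name, last_name, country) VALUES (?, ?, ?)",
    SAMPLE_ROWS,
)

# The CTE numbers each copy within its group; the outer SELECT just lists it.
preview = conn.execute(
    """
    WITH numbered AS (
        SELECT first_name, last_name, country,
               ROW_NUMBER() OVER (
                   PARTITION BY first_name, last_name, country
                   ORDER BY id
               ) AS RowNumber
        FROM Employee
    )
    SELECT first_name, last_name, country, RowNumber
    FROM numbered
    ORDER BY first_name, RowNumber
    """
).fetchall()

# Rows numbered above 1 are the ones a cleanup would delete.
to_delete = [row for row in preview if row[3] > 1]
```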

Using the Rank Function

We can also use a ranking function to eliminate duplicate records. ROW_NUMBER() gives each record a unique number within its partition, even when the records themselves are duplicates. In this example, we use the ROW_NUMBER() function with the PARTITION BY clause.

PARTITION BY groups the rows by the specified columns, and the function assigns a ranking within each of those partitions.
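To see why ROW_NUMBER() rather than RANK() is the ranking function used for this, the sketch below computes both over the same partitions. It assumes an in-memory SQLite database through Python's sqlite3 module as a stand-in for SQL Server, with a hypothetical Employee table. Because exact duplicates tie on every ordering column, RANK() gives them all rank 1, so a "delete where rank > 1" rule would remove nothing.

```python
import sqlite3

# Hypothetical sample table and data, standing in for a real SQL Server table.
SAMPLE_ROWS = [
    ("Raj", "Gupta", "India"),
    ("Raj", "Gupta", "India"),
    ("Mohan", "Kumar", "USA"),
    ("James", "Barry", "UK"),
    ("James", "Barry", "UK"),
    ("James", "Barry", "UK"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (id INTEGER PRIMARY KEY, "
    "first_name TEXT, last_name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO Employee (first_name, last_name, country) VALUES (?, ?, ?)",
    SAMPLE_ROWS,
)

# ORDER BY first_name is constant within each partition, so every row in a
# group ties: RANK() repeats 1, while ROW_NUMBER() still counts 1, 2, 3.
ranked = conn.execute(
    """
    SELECT id,
           ROW_NUMBER() OVER (
               PARTITION BY first_name, last_name, country
               ORDER BY first_name
           ) AS row_num,
           RANK() OVER (
               PARTITION BY first_name, last_name, country
               ORDER BY first_name
           ) AS rnk
    FROM Employee
    ORDER BY id
    """
).fetchall()

# Rows that a "delete where number > 1" rule would remove under each function.
row_number_would_delete = [row_id for row_id, row_num, _ in ranked if row_num > 1]
rank_would_delete = [row_id for row_id, _, rnk in ranked if rnk > 1]
```

Which tied row receives which ROW_NUMBER() value is not guaranteed, but exactly one row per group always gets 1.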

Using an SSIS Package

SQL Server Integration Services (SSIS) helps you reduce manual effort and optimize data-processing tasks. You can also use an SSIS package to eliminate duplicate rows from the database.

Using the Sort Operator

In an SSIS data flow, we can add a Sort operator to sort the values coming from the source table. When we preview the data, we can see that there are duplicate values in the source table. A Sort operator that is set to remove rows with duplicate sort values eliminates these duplicates.

To configure the Sort operator, double-click it, select the columns that contain duplicate values, and check the option to remove rows with duplicate sort values. After the package has run, we can check the output: the data is sorted by the key field and the duplicates are gone. The output can then be stored in two different tables based on the value of the key column.

I Have Tried Using The DISTINCT Keyword, But It Didn’t Work

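DISTINCT only changes what a query returns, so it cannot remove rows from the table. Use the GROUP BY and HAVING clauses instead. First, check whether there are any duplicates, and which combinations of values are duplicated, with COUNT():

SELECT col1, col2, COUNT(*) AS occurrences FROM tablename GROUP BY col1, col2 HAVING COUNT(*) > 1;

Then keep the row with the lowest id in each group and delete the rest:

DELETE FROM tablename WHERE id NOT IN (SELECT MIN(id) FROM tablename GROUP BY col1, col2);

Don't leave out the GROUP BY in the subquery: without it, MIN(id) returns a single value for the whole table, and the statement deletes every row except one.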

Delete Duplicate Rows From A Table Example

There are many duplicates in the table, and we want to remove them. We use a common table expression to do this. A common table expression (CTE) is a named, temporary result set that exists only for the single statement that follows it; here it numbers the rows in each group of duplicates with ROW_NUMBER() so that every copy numbered above 1 can be deleted.

Once the DELETE has run, the duplicate rows are gone. If you query the table again, you will find no duplicate rows.
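A runnable sketch of this example follows, assuming Python's sqlite3 module and an in-memory SQLite database in place of SQL Server; the Employee table and its rows are hypothetical. In SQL Server you can delete through the CTE directly (`DELETE FROM numbered WHERE rn > 1`). SQLite does not allow deleting through a CTE, so the sketch deletes from the base table by id instead. The final GROUP BY query repeats the check that no duplicates remain.

```python
import sqlite3

# Hypothetical sample table and data, standing in for a real SQL Server table.
SAMPLE_ROWS = [
    ("Raj", "Gupta", "India"),
    ("Raj", "Gupta", "India"),
    ("Mohan", "Kumar", "USA"),
    ("James", "Barry", "UK"),
    ("James", "Barry", "UK"),
    ("James", "Barry", "UK"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (id INTEGER PRIMARY KEY, "
    "first_name TEXT, last_name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO Employee (first_name, last_name, country) VALUES (?, ?, ?)",
    SAMPLE_ROWS,
)

# The CTE numbers each copy; every copy after the first (lowest id) is deleted.
conn.execute(
    """
    WITH numbered AS (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY first_name, last_name, country
                   ORDER BY id
               ) AS rn
        FROM Employee
    )
    DELETE FROM Employee
    WHERE id IN (SELECT id FROM numbered WHERE rn > 1)
    """
)
conn.commit()

# Query the table again: no group of identical values should remain.
leftover_groups = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT 1 FROM Employee
        GROUP BY first_name, last_name, country
        HAVING COUNT(*) > 1
    )
    """
).fetchone()[0]
remaining = conn.execute("SELECT id FROM Employee ORDER BY id").fetchall()
```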

Final Thoughts

We hope that this article gives you some clarity on removing duplicates in SQL. The process is actually rather simple once you understand it, so don't be afraid to try out all of the methods above to find the right one for you and the problem at hand.

Albert Niall