by Damiaan Peeters
21. September 2011 09:29
Somehow I got duplicate relations in my Umbraco Relations table (about 120 k). That’s impossible to clean them up manually, so I decided to remove the duplicates using an SQL statement directly on the database.
Apparently deleting duplicates is not so hard on newer SQL servers when using Common Table Expressions. CTE’s are a kind of temporary result sets that can be used within a single statement. The difference with a temporary table is that it is not stored as an object.
WITH Dublicates_CTE(parentId, childId, Id)
AS
(
SELECT parentId, childId , Min(Id) Id
from umbracoRelation
group by parentId, childId
having COUNT(*) >1
)
DELETE FROM umbracoRelation
WHERE Id IN (
SELECT umbracoRelation.Id
FROM umbracoRelation
INNER JOIN Dublicates_CTE
ON umbracoRelation.parentId= Dublicates_CTE.parentId
AND umbracoRelation.childId= Dublicates_CTE.childId
AND umbracoRelation.id <> Dublicates_CTE.Id
)
I found the idea of removing duplicates on this blogpost.