We have a requirement in which the input file contains 40,000,000 (forty million) records. We have to load them into the database after applying a few business-logic transformations. The major concern is removing duplicate rows. Here are the possible approaches:
1. Use a Sort transformation with its "Remove rows with duplicate sort values" option enabled. Likely to take quite long, since Sort is a fully blocking transformation: it must buffer all 40 million rows before it can emit a single one.
2. Use a Script Component and do the comparison ourselves. My first guess was that this would again be fully blocking, but a synchronous component that tracks the keys it has already seen in a HashSet streams rows through as they arrive; the catch is that every distinct key has to fit in memory (see the first sketch after this list).
3. Dump the data to a staging table in the database and reload it after selecting distinct records, e.g. with SELECT DISTINCT or ROW_NUMBER() partitioned by the business key (see the second sketch after this list). Looks like the best option, since the dedup becomes a single set-based operation that the engine can optimize.
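For option 2, here is a minimal sketch of the HashSet-based Script Component. It assumes the component is configured with a synchronous output and an ExclusionGroup (so undirected rows are dropped), and uses two hypothetical business-key columns, CustomerId and OrderDate; substitute your own key columns.

```csharp
using System;
using System.Collections.Generic;

public class ScriptMain : UserComponent
{
    // Keys seen so far. Forty million short keys can occupy a few GB,
    // so this set must fit in the SSIS process's memory.
    private readonly HashSet<string> seenKeys = new HashSet<string>();

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Build the dedup key from the business-key columns (placeholder names).
        string key = Row.CustomerId + "|" + Row.OrderDate.ToString("yyyyMMdd");

        // HashSet<T>.Add returns false when the key is already present,
        // so only the first occurrence of each key is sent downstream.
        if (seenKeys.Add(key))
        {
            Row.DirectRowToOutput0();
        }
    }
}
```

Because the output is synchronous, rows flow through buffer by buffer instead of waiting for the whole input, which is what keeps this approach non-blocking.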
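For option 3, the reload itself is one set-based statement. The sketch below uses ROW_NUMBER() partitioned by the business key, which, unlike SELECT DISTINCT, also keeps exactly one row when non-key columns differ between duplicates. Table, column, and connection names are placeholders; in a real package the statement would normally sit in an Execute SQL Task rather than ad-hoc code.

```csharp
using System.Data.SqlClient;

class StagingDedup
{
    static void Main()
    {
        // Placeholder connection string: point it at the staging database.
        const string connStr = "Server=.;Database=ETL;Integrated Security=true";

        // Keep exactly one row per (CustomerId, OrderDate) business key.
        const string sql = @"
INSERT INTO dbo.Target (CustomerId, OrderDate, Amount)
SELECT CustomerId, OrderDate, Amount
FROM (
    SELECT CustomerId, OrderDate, Amount,
           ROW_NUMBER() OVER (PARTITION BY CustomerId, OrderDate
                              ORDER BY (SELECT NULL)) AS rn
    FROM dbo.Staging
) AS s
WHERE rn = 1;";

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.CommandTimeout = 0;  // a 40M-row scan can exceed the 30s default
            cmd.ExecuteNonQuery();
        }
    }
}
```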
I'll be back with the stats. Feel free to add your own suggestions, stats, and comments.
Happy SSISing.. :)