01-19-2021 09:40 AM
My pipeline is simple: it executes an Oracle query and then parses the data into a CSV file. The query returns some duplicate records (same ID) because the database contains those duplicates. However, I want to keep only 1 record from each set of duplicate rows. How do I do that?
01-19-2021 10:36 AM
Hi, the Unique snap eliminates duplicate documents in a document stream, such as duplicate rows.
Use the Unique snap before the Mapper and let me know if you still have any issues.
01-19-2021 10:40 AM
My goal is to keep 1 record from each duplicate pair like this
So if I use the Unique snap, will it remove the entire duplicate pair?
01-19-2021 10:45 AM
It removes the duplicates and retains one record from each set of duplicate rows.
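To make that behavior concrete, here is a minimal Python sketch of whole-document de-duplication. It is only an illustration of the idea, not the snap's actual implementation; the function name `unique_documents` and the `ID`/`NAME` fields are made up for the example.

```python
import json

def unique_documents(docs):
    """Yield each document the first time it appears; drop exact repeats."""
    seen = set()
    for doc in docs:
        # Serialize with sorted keys so field order does not affect equality.
        fingerprint = json.dumps(doc, sort_keys=True)
        if fingerprint not in seen:
            seen.add(fingerprint)
            yield doc

# Hypothetical sample rows: the first two are identical in every field.
rows = [
    {"ID": 1, "NAME": "A"},
    {"ID": 1, "NAME": "A"},
    {"ID": 2, "NAME": "B"},
]
deduped = list(unique_documents(rows))
```

One copy of the repeated row survives, so the pair is not removed entirely.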
01-19-2021 10:47 AM
I see that, but the Unique snap only removes duplicates when both rows are exactly the same.
In my case, only the ID column is duplicated. The other columns are not the same.
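What I am after is roughly the following, shown as a small Python sketch rather than anything SnapLogic-specific. The helper name `keep_first_per_key` and the `NAME` column are hypothetical; only `ID` comes from my data. It assumes that keeping the first row seen for each ID is acceptable.

```python
def keep_first_per_key(docs, key="ID"):
    """Keep the first document seen for each key value; drop later ones."""
    seen = set()
    kept = []
    for doc in docs:
        if doc[key] not in seen:
            seen.add(doc[key])
            kept.append(doc)
    return kept

# Hypothetical sample rows: ID 1 appears twice with different NAME values.
rows = [
    {"ID": 1, "NAME": "A"},
    {"ID": 1, "NAME": "B"},
    {"ID": 2, "NAME": "C"},
]
result = keep_first_per_key(rows)
```

Which row of a pair survives depends on the input order, so the rows would need to be sorted first if a particular one should be kept.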