My pipeline is simple: it executes an Oracle query and then writes the data to a CSV file. The query returns some duplicate records (same ID) because the database contains those duplicates. However, I want to keep only one record from each set of duplicate rows. How do I do that?
Hi, the Unique Snap eliminates duplicate documents in a document stream, such as duplicate rows.
Use the Unique Snap before the Mapper and let me know if you still have any issues.
My goal is to keep one record from each duplicate pair.
So if I use the Unique Snap, will it remove the entire duplicate pair?
It removes the duplicates and retains one record from each set of duplicate rows.
I see, but the Unique Snap only removes duplicates when both rows are exactly the same.
In my case, only the ID column is duplicated. The other columns are not the same.
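To illustrate the distinction outside SnapLogic, here is a minimal Python sketch of whole-row deduplication. The rows, the column names (ID, NAME, CITY), and the function name are made up for the example. It only mimics the behavior described above; it is not how the Unique Snap is implemented.

```python
# Hypothetical rows: ID 1 appears twice with different CITY values,
# ID 2 appears twice with every column identical.
rows = [
    {"ID": 1, "NAME": "Alice", "CITY": "Austin"},
    {"ID": 1, "NAME": "Alice", "CITY": "Boston"},
    {"ID": 2, "NAME": "Bob", "CITY": "Denver"},
    {"ID": 2, "NAME": "Bob", "CITY": "Denver"},
]


def drop_exact_duplicates(records):
    """Keep the first occurrence of each fully identical record."""
    seen = set()
    result = []
    for record in records:
        # Compare on every column, not just the ID.
        fingerprint = tuple(sorted(record.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(record)
    return result


deduped = drop_exact_duplicates(rows)
ids = [r["ID"] for r in deduped]
print(ids)  # [1, 1, 2]
```

Both ID 1 rows survive because they differ in CITY, which is the situation I am running into.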
Okay, then you can use the Group By Snap to group by ID and map the required fields. I'm attaching a sample pipeline and the data.
readCsv_2021_01_19.slp (7.6 KB)
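For anyone who cannot open the attached pipeline, the group-by-ID idea can be sketched in plain Python. This is not the content of the .slp file. The sample rows, the column names, and the choice to keep the first record seen for each ID are all assumptions made for illustration.

```python
# Hypothetical query result: each ID appears twice, other columns may differ.
rows = [
    {"ID": 1, "NAME": "Alice", "CITY": "Austin"},
    {"ID": 1, "NAME": "Alice", "CITY": "Boston"},
    {"ID": 2, "NAME": "Bob", "CITY": "Denver"},
    {"ID": 2, "NAME": "Bob", "CITY": "Dallas"},
]


def keep_first_per_id(records, key="ID"):
    """Group records by `key` and keep only the first record of each group."""
    first_seen = {}
    for record in records:
        # setdefault stores the record only if this ID has not been seen yet.
        first_seen.setdefault(record[key], record)
    # Dicts preserve insertion order, so the IDs come out in first-seen order.
    return list(first_seen.values())


unique_rows = keep_first_per_id(rows)
print([r["ID"] for r in unique_rows])  # [1, 2]
```

Which record to keep from each group (first, last, most recent, and so on) depends on your data, so adjust the selection rule as needed.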
@mtran21 Try using the Duplicate Snap with the Id column as the Field.