Forum Discussion

mtran21
New Contributor III
5 years ago

How do you keep only one record of duplicated records?

My pipeline is simple: it executes an Oracle query and then writes the data to a CSV file. The query result contains some duplicate records (same ID) because the database itself has those duplicates. However, I want to keep only one record from each set of duplicate rows. How do I do that?

6 Replies

  • Hi, the Unique Snap eliminates duplicate documents in a document stream, such as duplicate rows.
    Use the Unique Snap before the Mapper and let me know if you still have any issues.

    • mtran21
      New Contributor III

      My goal is to keep one record from each duplicate pair.

      So if I use the Unique Snap, will it remove the entire duplicate pair?

  • It removes the duplicates and retains one record from each set of duplicate rows.

    • mtran21
      New Contributor III

      I see that, but the Unique Snap only removes duplicates when both rows are exactly the same.
      In my case, only the ID column is duplicated. The other columns are not the same.
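
The distinction raised above (whole-row duplicates versus duplicates on the ID column only) can be illustrated outside the pipeline tool. The following is a minimal Python sketch, not a SnapLogic feature or API: the function name `keep_first_per_key`, the `NAME` column, and the sample rows are all hypothetical, and only the `ID` column comes from the thread. It keeps the first row seen for each ID and drops later rows with the same ID, even when their other columns differ.

```python
def keep_first_per_key(rows, key="ID"):
    """Return rows with only the first occurrence of each key value kept.

    Hypothetical helper for illustration; `rows` is a list of dicts,
    such as the records produced by a database query.
    """
    seen = set()   # key values already emitted
    kept = []      # output, in original order
    for row in rows:
        value = row[key]
        if value in seen:
            continue  # a row with this ID was already kept; skip it
        seen.add(value)
        kept.append(row)
    return kept


# Made-up sample data: two rows share ID "1" but differ in NAME,
# so an exact-match comparison would treat them as distinct.
rows = [
    {"ID": "1", "NAME": "alpha"},
    {"ID": "1", "NAME": "beta"},
    {"ID": "2", "NAME": "gamma"},
]

# Exact-match de-duplication keeps all three rows here.
exact_unique = [dict(t) for t in dict.fromkeys(tuple(r.items()) for r in rows)]

# Key-based de-duplication keeps one row per ID.
deduped = keep_first_per_key(rows)
```

Which row survives depends on input order, so if a particular record from each group is wanted (for example the most recent), the rows would need to be sorted accordingly before this step.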