Group By Field behaving mysteriously

I have a Union snap with 4 inputs where the incoming documents contain the same columns. To deduplicate the rows, I am preparing the data by using the Group by Field snap to identify those with a shared column value.

When I validate the pipeline, the grouping doesn’t seem to work correctly: the same value shows up in more than one group. E.g. if I have 15 incoming documents and 2 groups are duplicated, I end up with 17 documents in the output.

Hi @NAl,

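Use the Sort snap on the same field you are grouping by, and place it before the Group By Field snap. As far as I know, Group By Field expects its input to be sorted on the grouping field and starts a new group whenever the value changes, so with unsorted input from the Union the same value can end up in several groups. It is good practice to sort before using Group By, Unique, Join, etc.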
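
Here is a rough illustration of the effect in Python rather than in a pipeline. It is only an analogy: `itertools.groupby` also groups adjacent items only, which I am assuming mirrors what the snap does, and the `id` and `src` fields and the `group_consecutive` helper are made up for the example.

```python
from itertools import groupby
from operator import itemgetter


def group_consecutive(docs, field):
    """Group adjacent documents that share the same value for `field`."""
    return [(key, list(group)) for key, group in groupby(docs, key=itemgetter(field))]


# Hypothetical documents arriving interleaved from several Union inputs.
docs = [
    {"id": "A", "src": 1},
    {"id": "B", "src": 1},
    {"id": "A", "src": 2},
    {"id": "B", "src": 3},
]

# Unsorted input: every change of value starts a new group.
unsorted_groups = group_consecutive(docs, "id")

# Sorted input: equal values are adjacent, so each value forms one group.
sorted_groups = group_consecutive(sorted(docs, key=itemgetter("id")), "id")

print(len(unsorted_groups))  # 4
print(len(sorted_groups))    # 2
```

Without the sort, "A" and "B" each appear in two groups; with it, each value appears once.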

There is also the Deduplicate snap, if that helps.