Identifying when a Control Break occurs in a pipeline

Have you ever needed to identify when a control break occurs in your incoming data?
Having this ability can save you considerable processing resources in cases where your incoming data is sorted on last names and you only need to perform calculations or lookups when a change occurs on last name.

Here is the pipeline i put together to demonstrate how to identify control breaks in your incoming data.

the first snap generates the following input data:

Using a sort snap i then sort the incoming records by lastname

In order to identify when a control break occurs, we need to compare one record to another and to do this we need to join one record to it’s previous record. To accomplish this we will use one of the snap functions accessible in as an expression in a mapper snap.
Here you can see where i am moving snap.in.totalCount out as $SnapInTotalCount.

Next i use a copy snap to make two copies of the incoming data which i will then join back together using a join snap.

I now need to join the current record from one input stream to the previous record from the other input stream and the following trick below is how i accomplish this. I subtract one from the left path’s $SnapInTotalCount and join it to right path’s $SnapInTotalCount.
*note i subtracted 0 to keep the data formats aligned.

i now have record 1 on the same line as record 0, record 2 on the same line as record 1 and so on.
Now all i have to do to set my flag is compare LastName to input1_LastName.

Using a condition snap and the following conditional expressions, i am able to set my control break flag ($CD_FLAG) to either T or N.

Below are the final results and as you can see CB_FLAG is set to ‘Y’ anytime there is a change in LastName.
I can now use CB_FLAG downstream in my pipeline in a router or filter snap to avoid unneeded processing.

Control Break Pipeline Example_2017_02_28.slp (12.4 KB)