Count the number of records fetched/processed from a flat file/upstream systems (Snowflake, Salesforce, Oracle)/File Writer without using a Pipeline Execute

Hi Team,

I’m looking to count records in a couple of scenarios listed below:

(1) Records fetched from a flat file (e.g. Excel, CSV), writing the total count of records into a new column
e.g. File Reader --> Mapper (transformation rules here with new column added to count the total number of records) --> Excel/CSV formatter --> File Writer

I’ve tried using snap.in.totalCount and snap.outputViews inside a Mapper but didn’t get the expected results.

(2) Records fetched from a source system like Snowflake, Salesforce, Oracle, etc., without using a count command in the query itself

I’m thinking of using a Group By or an Aggregate snap to get the counts; would that be the right approach?

(3) Counting the number of records processed after the operation has completed. For instance, I’m writing a flat file (Excel/CSV) but want a new column ingested into that file dynamically that states the total number of docs processed, AND to send an email to the team stating the total number of docs processed.

e.g. File Reader/Salesforce Read --> Mapper --> excel/csv formatter --> File Writer --> Mapper (anticipating this should have some rules) --> Email Sender (sends count ONLY)
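To make scenario (3) concrete, here is a minimal Python sketch of what I’m after in that last leg, assuming the whole document stream is available after the write; the function name `count_summary` and the field `totalProcessed` are just illustrative placeholders, not actual snap settings:

```python
def count_summary(records):
    """Reduce the full document stream to a single summary document
    that carries only the record count, suitable as the sole input
    to an email step."""
    return {"totalProcessed": len(records)}

docs = [{"name": "a"}, {"name": "b"}]
print(count_summary(docs))  # -> {'totalProcessed': 2}
```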

Thanking you in advance for your time and help on this one.

Best Regards,
Darsh

Hey @darshthakkar,

You have a couple of options here:

  1. Group By/Gate, and a Mapper after with the $group.length function
  2. Aggregate snap
  3. snap.in.totalCount combined with a Tail snap

The third one is a bit abstract and I’m not a big fan of it, but I listed it as an option anyway.
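For reference, the logic behind option 1 can be sketched in plain Python, with a list of dicts standing in for the document stream; gather every document into one group, then stamp each one with the group’s length. The name `annotate_with_count` and the output field `totalCount` are illustrative, not SnapLogic identifiers:

```python
def annotate_with_count(records):
    """Mimic a Group By/Gate followed by a Mapper: collect all
    documents, then write the group's length into a new
    totalCount field on each record."""
    total = len(records)  # analogous to $group.length in the Mapper
    return [{**rec, "totalCount": total} for rec in records]

rows = [{"id": 1}, {"id": 2}, {"id": 3}]
print(annotate_with_count(rows))
# -> [{'id': 1, 'totalCount': 3}, {'id': 2, 'totalCount': 3}, {'id': 3, 'totalCount': 3}]
```

Note the whole stream has to be gathered before the count is known, which is exactly why the Group By/Gate sits before the Mapper in the pipeline.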

Hope this helps,
BR


Thanks @bojanvelevski, all of them worked.
My personal favorite is suggestion 1, as 2 adds performance overhead to the pipeline.

Closing this thread now.