Reading first N records from a file?

I have a file with a large number of rows. I want to read and parse the file, but stop reading it once N rows have been parsed. Is this possible? I tried using a Head snap after the parser snap, but the File Reader continues to read the entire file, and the pipeline doesn’t complete until the entire file has been read, even though I’m only interested in the first N rows.

The Exit snap can help, but it will mark the pipeline as failed in the Dashboard.

@PSAmmirata Try the Head snap.

I believe he did.

Simple answer: no.

In theory, snaps like the CSV Parser could be enhanced with a new setting to limit the number of output documents, but there’s little reason to burden snaps with this additional complexity when you can achieve the same result by adding a Head snap.
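For comparison, here is what a "limit the number of output documents" setting looks like in a pull-based parser outside SnapLogic. This is only a rough Python sketch, not how the CSV Parser snap works: `first_n_rows` and `counted_lines` are hypothetical names, and the generated lines stand in for a large file. Because the parser pulls lines on demand, the source is never read past the Nth row.

```python
import csv
from itertools import islice


def counted_lines(total, counter):
    """Hypothetical stand-in for a large file: yields CSV lines and counts reads."""
    yield "id,value\n"  # header line
    for i in range(total):
        counter["read"] += 1  # one more data line pulled from the "file"
        yield f"{i},{i * i}\n"


def first_n_rows(lines, n):
    """Parse CSV lazily and stop pulling input after n data rows."""
    reader = csv.DictReader(lines)
    return list(islice(reader, n))


counter = {"read": 0}
rows = first_n_rows(counted_lines(1_000_000, counter), 3)
print(rows)             # the first three parsed rows
print(counter["read"])  # number of data lines actually read from the source
```

In a push-based pipeline the reader does not know that a downstream stage has lost interest, which is why the same limit is harder to get for free there.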

May I ask why it’s preferable to stop reading the file early?

In my pipeline I’m performing some analysis on those first N records and writing the analysis results to a file. The input file is very large and the output file containing the analysis results doesn’t appear to be closed until the pipeline completes (when the file reader finishes reading the file). The amount of time between the analysis of the first N records being complete and the file reader finishing reading the file is significant; at least significant enough that our user has complained about it.

While not ideal, using the Exit snap helped. I need to ensure that the downstream processing of the N records completes before the Exit snap triggers. I can use the threshold limit in the Exit snap to “delay” the exit a bit.

Ok, thanks for the explanation. That makes sense. Your issue isn’t really that the upstream snaps (the File Reader + CSV Parser, or whatever) keep running. It’s that the downstream snaps (a Formatter + File Writer, perhaps) keep running too: they don’t complete (write the file) as soon as the Head snap has written the last document it will write.

So, yes, there’s actually a simple fix we can make to the Head snap to do just that: close the output view as soon as the desired number of documents has been written. This will cause the downstream snaps to finish writing their output. I just tried it and it works as expected. I think we should be able to get this fix into our forthcoming release planned for Nov 14.
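To illustrate the idea (not the actual Head snap implementation), here is a small Python sketch that models snaps as threads and views as queues. Everything in it is an assumption made for the example: the `END` sentinel stands in for a closed view, and `reader`, `head`, `writer`, and `run_pipeline` are hypothetical names. The reader pauses after the first N documents to mimic a slow, large file, so the sketch can show that the writer finishes while most of the input is still unread.

```python
import queue
import threading

END = object()  # sentinel standing in for "this view is closed"


def reader(out_q, total, pause_at, resume, state):
    """Emit `total` documents; pause at `pause_at` to mimic a slow, large file."""
    for i in range(total):
        if i == pause_at:
            # Wait until the writer has finished (timeout so the sketch cannot hang).
            resume.wait(timeout=5)
        out_q.put(i)
        state["produced"] += 1
    out_q.put(END)


def head(in_q, out_q, n):
    """Forward the first n documents, close the output, keep draining the input."""
    sent = 0
    closed = False
    while True:
        doc = in_q.get()
        if doc is END:
            break
        if sent < n:
            out_q.put(doc)
            sent += 1
            if sent == n:
                out_q.put(END)  # close the output as soon as n documents are out
                closed = True
        # Documents past the first n are consumed and discarded.
    if not closed:
        out_q.put(END)  # input ended before n documents arrived


def writer(in_q, written, done, state):
    """Collect documents until the input closes, then report completion."""
    while True:
        doc = in_q.get()
        if doc is END:
            break
        written.append(doc)
    # Record how far the reader had got when the "file" was finished.
    state["produced_at_close"] = state["produced"]
    done.set()


def run_pipeline(total, n):
    upstream, downstream = queue.Queue(), queue.Queue()
    written, done = [], threading.Event()
    state = {"produced": 0, "produced_at_close": None}
    threads = [
        threading.Thread(target=reader, args=(upstream, total, n, done, state)),
        threading.Thread(target=head, args=(upstream, downstream, n)),
        threading.Thread(target=writer, args=(downstream, written, done, state)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written, state


written, state = run_pipeline(total=1000, n=5)
print(written, state)
```

Note that in this model the head stage still consumes the rest of its input after closing its output, so the pipeline as a whole keeps running until the reader is done.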

@ptaylor - Once this change is implemented, is the Exit snap the best way to programmatically stop the pipeline once the Head’s output view is closed?

Why do you need the pipeline to stop if the output file containing the analysis results has been written and closed?

But, yes, I suppose you could add an Exit snap after the File Writer if you do want the pipeline to exit. But even if you don’t, the File Writer will have completed and written the file that your user needs, even while the Head continues to consume its input and the pipeline continues to run.

If we no longer need the pipeline running, I don’t want to use Snaplex node resources unnecessarily. Also, users question why the pipeline is still running once it has produced the desired file.

I see. Then an Exit snap at the end of the pipeline, plus my fix to the Head snap, will solve this. The Head fix will be in the November release (4.23).
