What I am trying to solve: validating a file as a whole. If even one record in the file is invalid, the entire file should be marked as invalid, no matter where in the file the error occurs. Additional info follows.
I am working on a pipeline that processes a batch of records contained in a CSV-formatted file. I perform the following steps:
- read the file (File Reader)
- CSV Parser to split the data into records
- Mapper
- Filter on the columns, for example checking the value in the first column, among numerous other checks.
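For illustration, the four steps above can be sketched in plain Python (not as pipeline components). This is a minimal sketch under assumptions: `is_valid` and `split_records` are hypothetical names, and the single rule "the first field must be non-empty" stands in for the real Filter checks.

```python
import csv
import io


def is_valid(row):
    # Hypothetical rule standing in for the Filter: the first field must be non-empty.
    return len(row) > 0 and row[0].strip() != ""


def split_records(text):
    """Parse CSV text and partition the rows into (passed, failed) lists."""
    reader = csv.reader(io.StringIO(text))  # "CSV Parser" step
    passed, failed = [], []
    for row in reader:
        # "Mapper" step: trim whitespace in each field (illustrative only)
        mapped = [field.strip() for field in row]
        # "Filter" step: route each record according to the validation rule
        (passed if is_valid(mapped) else failed).append(mapped)
    return passed, failed
```

For example, `split_records("a,1\n,2\nc,3\n")` puts `["a", "1"]` and `["c", "3"]` in `passed` and `["", "2"]` in `failed`, which is the two-set result described next.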
At the end I have two sets: the records that failed the Filter (the errors) and the records that passed validation.
I can write these out to an error file and a non-error file, but I want exactly one file at the end of the pipeline: either an error file or a success file. I am open to other ideas.
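One way to get a single output is to make the decision per file rather than per record: read every record, check whether any failed, and only then write the whole file to one destination. A minimal Python sketch of that idea follows; `is_valid`, `classify_file`, and `write_single_output` are hypothetical names, and the validation rule is a placeholder for the real checks.

```python
import csv
import io


def is_valid(row):
    # Hypothetical placeholder rule: the first field must be non-empty.
    return len(row) > 0 and row[0].strip() != ""


def classify_file(text):
    """Return "error" if any record is invalid, otherwise "success".

    Every record must be seen before deciding, because the bad
    record could be the last one in the file.
    """
    rows = list(csv.reader(io.StringIO(text)))
    return "success" if all(is_valid(r) for r in rows) else "error"


def write_single_output(text, success_path, error_path):
    """Write the entire file to exactly one destination and return its path."""
    target = success_path if classify_file(text) == "success" else error_path
    with open(target, "w", newline="") as f:
        f.write(text)
    return target
```

The design choice is that routing happens after an aggregate over all records (here, `all(...)`), not record by record. In a streaming pipeline the equivalent would be to compute something like a failure count over the whole input first and then branch once on whether that count is zero. Note that with this sketch an empty file is classified as "success".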
Thanks so much for your help and ideas.