Validating a CSV file: either all rows are accepted or no rows are accepted

snapping_turtle
New Contributor

Good Morning:

Looking to solve: validation of a file as a whole. If any one record in the file is invalid, the entire file should be marked as invalid, no matter where in the file the error occurs. Additional info follows.

I am working on a pipeline that processes a batch of records contained in a CSV-formatted file. I perform the following steps:

  1. Read the file (File Reader)
  2. Parse the CSV to break the data up (CSV Parser)
  3. Mapper call
  4. Copy
  5. Filter operation on the columns, such as checking the value in the first column of the first field, along with numerous other checks.

At the end I have two sets of records: those that failed the Filter checks and those that passed the filter/validation.
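
For readers who want to see the split outside the pipeline tool, here is a minimal Python sketch of the same idea. It is not how the pipeline itself does it; `split_rows` and the `first_column_is_numeric` rule are hypothetical names and a made-up check standing in for the real Filter conditions.

```python
import csv
import io


def first_column_is_numeric(row):
    # Hypothetical validation rule (an assumption, not from the post):
    # the first column must be present and contain only digits.
    return bool(row) and row[0].strip().isdigit()


def split_rows(csv_text, is_valid=first_column_is_numeric):
    """Parse CSV text and split rows into (passed, failed) lists.

    Each entry is (line_number, row) so an error can be traced back
    to its position in the file.
    """
    passed, failed = [], []
    reader = csv.reader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=1):
        (passed if is_valid(row) else failed).append((line_no, row))
    return passed, failed
```

If `failed` is non-empty after the full pass, the whole file counts as invalid, wherever the bad record sits.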

I can write these out to an error file and a non-error file. But I want only one file at the end of the pipeline: either an error file or a success file. I am open to other ideas.

Thanks so much for your help and ideas.

J

3 REPLIES

anubhav_nautiya
Contributor

Can't you use a union for your error and regular flows, if the file structure is the same?
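
A rough Python illustration of the union idea, assuming both streams carry rows with the same columns. `union_streams` and the `OK`/`ERROR` status tags are hypothetical; the tag column is an addition for illustration so the merged output still shows which rows failed.

```python
from itertools import chain


def union_streams(passed_rows, failed_rows):
    """Merge the success and error streams into one list of rows.

    A status column is appended to each row. Passed rows come first,
    so the original file order is not preserved.
    """
    tagged_ok = ((row, "OK") for row in passed_rows)
    tagged_err = ((row, "ERROR") for row in failed_rows)
    merged = [list(row) + [status]
              for row, status in chain(tagged_ok, tagged_err)]
    has_errors = any(row[-1] == "ERROR" for row in merged)
    return merged, has_errors
```

The `has_errors` flag is what a later step would use to decide whether the single output counts as an error file or a success file.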

Morning:

I can try that: the union, perhaps with a move at the end. I would have the error output stream and the success output stream. I will give that a try.

Thanks.

snapping_turtle
New Contributor

So close. OK, so I have the error and the success streams, and I am rejoining them. When there is any error, I want the filename to have a .err extension. When there are no errors, I want the filename to have a .csv extension.

If only there were a way to say: if an error file exists, merge the files and set the file extension to .err; otherwise there are no errors, so set the file extension to .csv. That would allow this to work.

I thought about running one pass that creates a status file containing either 0 or 1, and then using that status to define the output file extension.
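
The status-driven extension could look something like the following Python sketch. It is an illustration only, not a pipeline configuration: `write_result`, `out_dir`, and `base_name` are hypothetical, and the error count stands in for the 0/1 status. The rows are written to a temporary file first and only renamed once the extension is known, so exactly one output file ever appears.

```python
import csv
import os
import tempfile


def write_result(rows, error_count, out_dir, base_name):
    """Write all rows to a single file whose extension reflects the status.

    error_count > 0  -> <base_name>.err
    error_count == 0 -> <base_name>.csv
    """
    ext = ".err" if error_count > 0 else ".csv"
    final_path = os.path.join(out_dir, base_name + ext)

    # Write to a temporary file in the same directory, then rename it,
    # so a partially written file never carries the final name.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)
    os.replace(tmp_path, final_path)
    return final_path
```

Calling `write_result(rows, len(failed), "/some/output/dir", "batch")` would produce either `batch.err` or `batch.csv`, never both; the directory and base name here are placeholders.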