cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

CSV Parser Troubling Error (Cannot complete CSV data parsing)

JPaps
New Contributor III

Hi, I am trying to parse a CSV that is auto-generated by a legacy system. The CSV has no headers.

The problem is, the system conditionally adds columns to individual records. So, the CSV sometimes has records with more columns than others. This causes Snaplogicโ€™s CSV Parser to fail (it doesnโ€™t parse the whole file). The parser will only parse up to the row number where the record with the additional columns is. Below is the error message:
image

This sounds far-fetched but is there a way to automate a solution to this problem? Perhaps by being able to conditionally control the columns before the document hits the CSV Parser?

3 REPLIES 3

bojanvelevski
Valued Contributor

Hey @JPaps,

Parsing unconventional CSV-s can be tricky. Check if this sample pipeline works for you:

Parse Unconventional CSV_2021_12_29.slp (5.3 KB)

Regards,
Bojan

JPaps
New Contributor III

Thanks for your response @bojanvelevski.
I am trying to test and understand this sample pipeline. Can you kindly give me a high-level explanation of it? I donโ€™t recognize the code in the mapper snap and its doing a lot as seen in the below pipeline statistic (CPU% and memory).
image
The [Binary to Document] and [Mapper] snaps is a manual replacement for [CSV Parser] snap? Where can I find documentation for the syntax used?

bojanvelevski
Valued Contributor

Hey @JPaps,

Iโ€™m using the Binary to Document snap to get a stringified version of the CSV where rows (records) are separated by /r/n. Thatโ€™s why in the expression, Iโ€™m first splitting the string on โ€˜/r/nโ€™, that way I can get the headers + the records separately. The rest is a logic to fill the missing headers with โ€˜field_โ€™ + random number, which in your case might be a problem due to high number of records. Anyway, the expression is creating objects out of the Headers and the Records. But because sl.zipObject is automatically adding nulls if thereโ€™s no value (but in our case we need replacement for the missing header, not value) i created the object with key/value reversed, meaning:

{"value":"key"}

If thereโ€™s a missing key, than the expression will automatically add null, which is later replaced with โ€˜field_โ€™+random 2 digit number. And after filling the missing value (soon to be key), Iโ€™m reversing the object :

{"key":"value"}

Itโ€™s rough, and for smaller number of records is working, but it needs to be tested out with large number of records like in your case.

Regards,
Bojan