โ12-29-2021 09:07 AM
Hi, I am trying to parse a CSV that is auto-generated by a legacy system. The CSV has no headers.
The problem is, the system conditionally adds columns to individual records. So, the CSV sometimes has records with more columns than others. This causes Snaplogicโs CSV Parser to fail (it doesnโt parse the whole file). The parser will only parse up to the row number where the record with the additional columns is. Below is the error message:
This sounds far-fetched but is there a way to automate a solution to this problem? Perhaps by being able to conditionally control the columns before the document hits the CSV Parser?
โ12-29-2021 10:20 AM
Hey @JPaps,
Parsing unconventional CSV-s can be tricky. Check if this sample pipeline works for you:
Parse Unconventional CSV_2021_12_29.slp (5.3 KB)
Regards,
Bojan
โ12-29-2021 11:44 AM
Thanks for your response @bojanvelevski.
I am trying to test and understand this sample pipeline. Can you kindly give me a high-level explanation of it? I donโt recognize the code in the mapper snap and its doing a lot as seen in the below pipeline statistic (CPU% and memory).
The [Binary to Document] and [Mapper] snaps is a manual replacement for [CSV Parser] snap? Where can I find documentation for the syntax used?
โ12-30-2021 01:17 AM
Hey @JPaps,
Iโm using the Binary to Document snap to get a stringified version of the CSV where rows (records) are separated by /r/n. Thatโs why in the expression, Iโm first splitting the string on โ/r/nโ, that way I can get the headers + the records separately. The rest is a logic to fill the missing headers with โfield_โ + random number, which in your case might be a problem due to high number of records. Anyway, the expression is creating objects out of the Headers and the Records. But because sl.zipObject is automatically adding nulls if thereโs no value (but in our case we need replacement for the missing header, not value) i created the object with key/value reversed, meaning:
{"value":"key"}
If thereโs a missing key, than the expression will automatically add null, which is later replaced with โfield_โ+random 2 digit number. And after filling the missing value (soon to be key), Iโm reversing the object :
{"key":"value"}
Itโs rough, and for smaller number of records is working, but it needs to be tested out with large number of records like in your case.
Regards,
Bojan