CSV parser not parsing when CR is found in column content

Question

Hi
We have a CSV file with around 40 columns. One of them is a notes column whose content contains carriage returns (CR). Because of this, the CSV parser does not break the lines correctly: instead of loading the full record, it splits the record in two and fails.
We tried reading the file as a binary document, replacing the value, converting the document back to binary, and then passing it to the CSV parser, but no luck.
Are there any alternatives for handling these records?
Thanks
Regards
smitha

tlikarish · Answer

Hi Smitha,
This should be possible. Take a look at this example and see if you can adapt it to your needs.
The mapper can be used to conveniently alter binary data before parsing. See the “views” section in the Mapper documentation.
In the attached pipeline, the mapper is using binary views, converting the input data to a string, then replacing the newline character (\n) with a space character ' '.
cr-example.zip (3.1 KB)

bsmithab · Answer

Hi tlikarish,
I have tried the approach you suggested. As you can see in the attached snapshot, the record spans multiple lines, and there is only a CR in one of the columns in between. I changed the expression to $content.toString().replace('\r', ' '), but the records still did not load properly; the record was still not treated as a single record.
Can you let us know if we need to do something else?
Thanks
Regards
smitha
csvparser.zip (25.5 KB)

tlikarish · Answer

Are you seeing an error message? Since you’re using quotes and the column is quoted, the CSV Parser should treat the carriage return as part of the column’s value and not as a row delimiter. Is it possible the quoting is off?
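
To make the quoting rule concrete, here is a minimal sketch of a quote-aware CSV reader in plain JavaScript. It is not the CSV Parser snap's actual implementation; parseCsv and the sample strings are hypothetical, and it assumes that a bare CR outside quotes counts as a row delimiter, which matches the splitting behavior described in the question.

```javascript
// Minimal quote-aware CSV reader (illustrative only, not the CSV Parser snap's code).
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // a doubled quote is an escaped quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // CR and LF inside quotes are kept as data
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      row.push(field);
      field = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++; // treat \r\n as one delimiter
      row.push(field);
      rows.push(row);
      row = [];
      field = '';
    } else {
      field += ch;
    }
  }
  // Flush a final record that has no trailing line break.
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Hypothetical samples: the same record with and without quotes around the notes column.
const quoted = 'id,notes\r\n1,"line one\rline two"\r\n';
const unquoted = 'id,notes\r\n1,line one\rline two\r\n';

console.log(parseCsv(quoted).length);   // header + 1 record
console.log(parseCsv(unquoted).length); // header + 2 broken records
```

With the quoted sample the CR stays inside the notes value; with the unquoted sample the same CR ends the row early, so the record is split.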

tlikarish · Answer

Doh – also messed up with the expression I gave you.
$content.toString().replace('\r', ' ')

Only replaces the first match. You should probably use replaceAll or change the regular expression to /\r/g. This will replace all carriage returns, so if the lines are delimited with \r\n, then you'd have to use something like /\r(?!\n)/g, which would replace all carriage returns not followed by a line feed.
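
As a rough illustration of how the three patterns differ ('\r' as a string, /\r/g, and /\r(?!\n)/g), here is a sketch in plain JavaScript. It assumes the expression language follows JavaScript's replace semantics, as this post implies; the sample content is made up, with rows ending in \r\n and stray \r characters inside a quoted notes column.

```javascript
// Hypothetical sample: rows end in \r\n, and the notes column contains stray \r characters.
const content = 'id,notes\r\n1,"first\rsecond\rthird"\r\n';

// A string pattern replaces only the first match (here, the \r of the header's \r\n).
const firstOnly = content.replace('\r', ' ');

// /\r/g replaces every carriage return, including those in the \r\n row delimiters.
const allCr = content.replace(/\r/g, ' ');

// /\r(?!\n)/g replaces only carriage returns that are not followed by a line feed.
const strayCr = content.replace(/\r(?!\n)/g, ' ');

// JSON.stringify makes the remaining \r and \n characters visible.
console.log(JSON.stringify(firstOnly));
console.log(JSON.stringify(allCr));
console.log(JSON.stringify(strayCr));
```

Only the last form leaves the \r\n row delimiters intact while clearing the carriage returns embedded in the notes value.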

matt · Answer

What if we aren’t using quotes?
