09-04-2019 11:47 PM
Hi
We have a CSV file with around 40 columns. One of them is a notes column whose content contains carriage returns (CR). Because of this, the CSV Parser cannot find the end of the line correctly: instead of loading the full record, it splits the record in two and fails.
We tried converting the binary to a document, replacing the value, converting the document back to binary, and then running the CSV Parser, but had no luck.
Are there any other alternatives for handling these records?
Thanks
Regards
smitha
09-05-2019 10:55 AM
Hi Smitha,
This should be possible. Take a look at this example and see if you can adapt it to your needs.
The mapper can be used to conveniently alter binary data before parsing. See the “views” section in the Mapper documentation.
In the attached pipeline, the mapper uses binary views to convert the input data to a string and then replaces the \r\n with a space character.
cr-example.zip (3.1 KB)
09-05-2019 11:20 PM
Hi tlikarish,
I tried the approach you suggested. As you can see in the attached snapshot, the record spans multiple lines, and there is only a CR inside one of the columns. I changed the expression to $content.toString().replace('\r', ' '), but the records still did not load properly: the record was still not treated as a single record.
Can you let us know if we need to do something else?
Thanks
Regards
smitha
csvparser.zip (25.5 KB)
09-06-2019 07:58 AM
Are you seeing an error message? Since you’re using quotes and the column is quoted, the CSV Parser should treat the carriage return as part of the column’s value and not as a row delimiter. Is it possible the quoting is off?
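To illustrate what "part of the column's value" means, here is a minimal sketch in plain JavaScript. It is not the CSV Parser's actual implementation; splitRecords is a hypothetical helper and the sample rows are made up. It only breaks a record on CR or LF when it is outside double quotes, and it reports whether a quote was left open at the end of the input.

```javascript
// Hypothetical helper for illustration only; not part of any SnapLogic API.
// Splits CSV text into records, treating CR/LF inside double quotes as data.
function splitRecords(text) {
  const records = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '"') {
      // An escaped quote ("") toggles twice, so the state ends up unchanged.
      inQuotes = !inQuotes;
      current += ch;
    } else if (!inQuotes && (ch === '\n' || ch === '\r')) {
      // Outside quotes, treat \r\n as a single row delimiter.
      if (ch === '\r' && text[i + 1] === '\n') i++;
      records.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  if (current !== '') records.push(current);
  // If inQuotes is still true here, a quote was never closed.
  return { records, unbalancedQuotes: inQuotes };
}

// Made-up sample rows: the notes column is quoted in one and unquoted in the other.
const quoted = splitRecords('1,"line one\rline two",x\r\n2,"ok",y\r\n');
const unquoted = splitRecords('1,line one\rline two,x\r\n2,"ok",y\r\n');
console.log(quoted.records.length, unquoted.records.length);
```

With the quoted notes column the embedded CR stays inside the first record; without quotes the same CR splits that record in two, which matches the symptom described above.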
09-06-2019 08:36 AM
Doh – I also messed up the expression I gave you.
$content.toString().replace('\r', ' ')
Only replaces the first match. You should probably use replaceAll or change the regular expression to /\r/g. This will replace all carriage returns, so if the lines are delimited with \r\n, then you’d have to use something like /\r(?!\n)/g, which would remove all carriage returns not followed by a line feed.
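The difference between the three patterns can be checked in plain JavaScript, whose String.replace behaves the way described here. This is a sketch under assumptions: the sample content is made up, and it assumes the expression language handles these patterns the same way JavaScript does.

```javascript
// Made-up sample: rows are delimited with \r\n, and the notes column holds lone \r characters.
const content = 'id,notes\r\n1,"first\rsecond\rthird"\r\n2,"plain"\r\n';

// A string pattern replaces only the first carriage return it finds.
const firstOnly = content.replace('\r', ' ');

// A global regex replaces every carriage return, including those in the \r\n delimiters.
const allCr = content.replace(/\r/g, ' ');

// The negative lookahead skips any \r that is followed by \n, so row delimiters survive.
const loneCr = content.replace(/\r(?!\n)/g, ' ');

console.log(JSON.stringify(firstOnly));
console.log(JSON.stringify(allCr));
console.log(JSON.stringify(loneCr));
```

In this sample the first carriage return belongs to the header row's delimiter, so the string-pattern version never reaches the notes column at all.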