
CSV parser not parsing when CR is found in column content

bsmithab
New Contributor

Hi
We have a CSV file with around 40 columns. One of them is a notes column whose content contains carriage returns (CR). Because of this, the CSV Parser cannot break the lines properly: instead of loading the record's actual content, it splits the record in two and fails.

We tried reading the file as a binary document, replacing the value, converting the document back to binary, and then passing it to the CSV Parser, but had no luck.

Are there any other alternatives for handling these records?

Thanks
Regards
smitha

6 REPLIES

tlikarish
Employee

Hi Smitha,

This should be possible. Take a look at this example and see if you can adapt it to your needs.

The Mapper can be used to conveniently alter binary data before parsing. See the “views” section in the Mapper documentation.

In the attached pipeline, the Mapper uses binary views, converts the input data to a string, and then replaces each \r\n with a space character.

cr-example.zip (3.1 KB)

bsmithab
New Contributor

image
Hi tlikarish,

I tried the approach you suggested. As you can see in the attached snapshot, the record spans multiple lines, and there is only a CR inside one of the columns. I therefore changed the expression to $content.toString().replace('\r', ' '), but the records still did not load properly: the record was still not treated as a single record.
Can you let us know if we need to do something else?

Thanks

Regards
smitha

csvparser.zip (25.5 KB)

Are you seeing an error message? Since you’re using quotes and the column is quoted, the CSV Parser should treat the carriage return as part of the column’s value and not as a row delimiter. Is it possible the quoting is off?
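
To illustrate why the quoting matters, here is a minimal sketch in plain JavaScript of quote-aware record splitting. It is not the CSV Parser's actual implementation; `splitRecords` and the sample data are hypothetical and only show how a line break inside a quoted field can be kept as data while the same break in an unquoted field ends the row.

```javascript
// Hypothetical helper (not part of any product): split CSV text into records,
// treating line breaks inside double-quoted fields as data, not row delimiters.
function splitRecords(text) {
  const records = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '"') {
      // Toggle on every quote; an escaped quote ("") toggles twice and nets out.
      inQuotes = !inQuotes;
      current += ch;
    } else if ((ch === '\r' || ch === '\n') && !inQuotes) {
      // Outside quotes, CR, LF, or CRLF ends the record (CRLF counts once).
      if (ch === '\r' && text[i + 1] === '\n') i++;
      records.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  if (current !== '') records.push(current);
  return records;
}

// Same row, with and without quotes around the notes column.
const quoted = 'id,notes\r\n1,"line one\rline two"\r\n';
const unquoted = 'id,notes\r\n1,line one\rline two\r\n';

console.log(splitRecords(quoted).length);   // header + 1 record
console.log(splitRecords(unquoted).length); // header + 2 broken pieces
```

If the notes column in the real file is missing its opening or closing quote, the behavior would resemble the `unquoted` case.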

Doh, I also messed up the expression I gave you.

$content.toString().replace('\r', ' ')

This only replaces the first match. You should probably use replaceAll or change the pattern to the regular expression /\r/g. That will replace all carriage returns, so if the lines are delimited with \r\n, then you’d have to use something like /\r(?!\n)/g, which would replace all carriage returns not followed by a line feed.
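
The difference between the three patterns can be checked with a short sketch in plain JavaScript. This assumes the expression language follows JavaScript's `String.prototype.replace` semantics here; the sample `content` string is made up for illustration.

```javascript
// Made-up sample: rows end with CRLF, and the quoted notes column holds bare CRs.
const content = 'id,notes\r\n1,"first\rsecond\rthird"\r\n';

// String pattern: only the first CR (here, the one ending the header row) is replaced.
const firstOnly = content.replace('\r', ' ');

// Global regex: every CR is replaced, including those in the CRLF row endings.
const allCr = content.replace(/\r/g, ' ');

// Negative lookahead: only CRs that are NOT followed by LF are replaced,
// so the CRLF row endings stay intact.
const bareCrOnly = content.replace(/\r(?!\n)/g, ' ');

console.log(JSON.stringify(firstOnly));
console.log(JSON.stringify(allCr));
console.log(JSON.stringify(bareCrOnly));
```

Only the last variant leaves the row delimiters untouched while flattening the carriage returns inside the notes column.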