06-28-2021 09:20 PM
Hi team,
Looking for a suggestion/enhancement to achieve the below senario. when i read a csv file (it might have \n\r (windows) & \n(unix) ). Who to fix this …
Today I read the file and when I do the bulk load using redshift bulk load snap and the data gets loaded with the \n\r . How can I escape these characters?
When i look at the properties of the snap , i could see the below. but it’s not working.
Solved! Go to Solution.
07-01-2021 02:38 AM
so … this gave me the solution
$content.contains(‘\r\n’)?
($content.replaceAll(‘\r\n’,‘\n’) && $content.replaceAll(‘\n’,‘’) ) : $content.replaceAll(‘\n’,‘\n’)
06-30-2021 04:39 PM
Here is a community post about stripping line endings; maybe this could be used to normalize embedded EOL characters in your data, such as \r\n -> \n
.
I can research Redshift Bulk Load’s additional options … The COPY command has quite a few options, I see a CSV data format parameter too.
06-30-2021 11:33 PM
Well, thanks for coming back @mbowen … my target is redshift … I will be getting the file from the win / Unix system and blow will be its behaviour
Win
Unix
I am trying to read the csv file irrespective of the system
i am using the blow code for replacing the \r\n – ($content.replaceAll(‘\r\n’,‘\n’) && $content.replaceAll(‘\n’,‘’) )
Problem - when I pass the Unix file it replaces the \n to ‘’ and makes the csv invalid.
how to accommodate them both
07-01-2021 02:38 AM
so … this gave me the solution
$content.contains(‘\r\n’)?
($content.replaceAll(‘\r\n’,‘\n’) && $content.replaceAll(‘\n’,‘’) ) : $content.replaceAll(‘\n’,‘\n’)
07-01-2021 05:57 AM
@makeshwar - be cautious using the Binary to Document snap - it will place the entire input file into memory. This is fine if you are dealing with MB size files, but if you get into multiple GB, it is possible to consume a majority of your JVM memory on the snaplex node.
Is it possible to convert the Windows files to Unix before writing to S3? Or have them dropped to a different location so they can be translated to UNIX carriage control prior to loading into Redshift?