
Redshift bulk load

makeshwar
New Contributor II

Hi team,

Looking for a suggestion/enhancement to achieve the scenario below. When I read a CSV file, it might have \r\n (Windows) or \n (Unix) line endings. How do I fix this?

Today, when I read the file and do the bulk load using the Redshift Bulk Load snap, the data gets loaded with the \r\n intact. How can I escape these characters?

When I look at the properties of the snap, I can see the options below, but they are not working.
[screenshot: Redshift Bulk Load snap properties]

1 ACCEPTED SOLUTION

makeshwar
New Contributor II

So … this gave me the solution:

$content.contains('\r\n') ?
($content.replaceAll('\r\n','\n') && $content.replaceAll('\n','')) : $content.replaceAll('\n','\n')


5 REPLIES

mbowen
Employee

Here is a community post about stripping line endings; maybe this could be used to normalize embedded EOL characters in your data, such as \r\n -> \n.

I can research the Redshift Bulk Load snap's additional options… The COPY command has quite a few options; I see a CSV data format parameter too.
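
For reference, here is a minimal sketch (not necessarily what the snap issues under the hood) of a Redshift COPY using the CSV data format parameter, run from Python via psycopg2; with CSV, quoted fields are parsed per CSV rules, which is how embedded line breaks can survive a load. The table name, S3 path, IAM role, and connection details below are all placeholders.

import os
import psycopg2

# Placeholder connection details for a Redshift cluster.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password=os.environ.get("REDSHIFT_PASSWORD", ""),
)

# COPY with the CSV format parameter: fields wrapped in quotes are
# parsed per CSV rules, so newlines inside quoted fields survive.
copy_sql = """
    COPY my_table
    FROM 's3://my-bucket/incoming/data.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    CSV
    IGNOREHEADER 1;
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)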

makeshwar
New Contributor II

Well, thanks for coming back @mbowen … my target is Redshift … I will be getting the file from a Windows / Unix system, and below is its behaviour:
Windows:
[screenshot]
Unix:
[screenshot]

I am trying to read the CSV file irrespective of the system.
[screenshot]
I am using the code below for replacing the \r\n:
($content.replaceAll('\r\n','\n') && $content.replaceAll('\n',''))

Problem: when I pass the Unix file, it replaces the \n with '' and makes the CSV invalid.

How do I accommodate them both?

makeshwar
New Contributor II

So … this gave me the solution:

$content.contains('\r\n') ?
($content.replaceAll('\r\n','\n') && $content.replaceAll('\n','')) : $content.replaceAll('\n','\n')
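
For anyone porting this expression elsewhere, here is a minimal Python sketch of what it appears to do, with one hedge: in the expression language, && returns its right-hand operand rather than chaining the two replaceAll calls (each returns a new string), so the sketch below chains the replacements explicitly. The function name is hypothetical.

def clean_content(content: str) -> str:
    # Sketch of the accepted expression's apparent intent.
    if "\r\n" in content:
        # Windows input: normalize \r\n to \n, then drop the embedded
        # line breaks so they cannot corrupt the CSV. (In the original,
        # && keeps only the second replaceAll's result, which can leave
        # stray \r characters behind.)
        return content.replace("\r\n", "\n").replace("\n", "")
    # Unix input: return unchanged, matching the no-op else branch above.
    return content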

koryknick
Employee

@makeshwar - be cautious using the Binary to Document snap - it will place the entire input file into memory. This is fine if you are dealing with MB size files, but if you get into multiple GB, it is possible to consume a majority of your JVM memory on the snaplex node.

Is it possible to convert the Windows files to Unix before writing to S3? Or have them dropped to a different location so they can be translated to UNIX carriage control prior to loading into Redshift?
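
If pre-conversion is workable, here is a minimal sketch of a CRLF-to-LF pass over a file before it lands in S3; the paths and function name are placeholders, and it works on raw bytes so the file's text encoding does not matter. Note it reads the whole file into memory, so, per the caution above, it suits MB-sized files rather than multi-GB ones.

def crlf_to_lf(src_path: str, dst_path: str) -> None:
    # Read raw bytes and rewrite Windows line endings as Unix ones.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(data.replace(b"\r\n", b"\n"))

crlf_to_lf("incoming_windows.csv", "normalized_unix.csv")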