Forum Discussion

oggday
New Contributor
7 years ago

Getting errors moving large data to Snowflake

Hello,
We are trying to move a large amount of data from MS SQL to Snowflake using external staging, and we are getting the following error:

Too many S3 upload parts, maximum number of parts is 10000
Resolution:
Increase the buffer size so that the file size divided by the buffer size is smaller than 10000.

Reason:
The file is too large for the buffer size 10000000

The source table contains 1 billion records, and we are using an external AWS S3 bucket as the staging area.

I checked both the Snowflake account and the Snowflake Bulk snap, but cannot find any configuration related to buffer size.
I would appreciate it if anyone could share a solution for this.
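
For reference, the resolution's formula means the buffer must be at least the staged file size divided by the 10,000-part limit. As a worked example with a hypothetical 250 GB staged file:

268,435,456,000 bytes / 10,000 parts ≈ 26.8 MB per part

so the buffer would need to be roughly 27 MB or larger. Conversely, the default 10,000,000-byte buffer caps the staged file at about 100 GB (10,000,000 × 10,000 bytes), which a 1-billion-record extract can easily exceed.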

Thanks!

Replies

  • The way most binary snaps function is to write the binary stream along with a metadata document describing the contents – kind of like the headers in an HTTP request. For example, here is some preview data from a Multi File Reader snap I’ve hooked up:

    The content-location field gives the full path to the file. This can be used in a File Writer snap to write out each file with the same file name in a new location.

    Here’s the expression if you want to copy/paste and fill in your server-specific information:
    "sftp://transmissions-uat.testsite.com/Outbound/Encrypted/" + $['content-location'].split('/').pop()

    I put together an example pipeline that hopefully will be useful for demonstrating how this all fits together.

    multi-file-read-write_2019_01_23.slp (3.7 KB)

    • vcocco
      New Contributor II

      Thank you very much for this example!
      I will try it out on my org and let you know if I have any questions.

      Thanks again,

      Vin

  • vcocco
    New Contributor II

    Hello Again,
    The ‘content-location’ expression in the File Writer worked for me. Using the wildcard character in the File Reader, I was able to narrow down to the specific files I wanted and write them via ‘smb’ to our network folder.
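
    For reference, the smb write followed the same pattern, just with the smb scheme (the host and share here are placeholders):

    "smb://fileserver.example.com/shared/hr/" + $['content-location'].split('/').pop()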

    Earlier, I tried to import your example Pipeline but got this error:

    I originally downloaded the .slp file, but SnapLogic Import didn’t see it on my Mac. I compressed it (turned it into a zip file) and the Importer was able to see it, but gave me the error message.

    Also: I’ve been testing out the Email Sender snap to send an email notifying certain users that a file or files exist on the network folder. I just want to send the email if and only if the File Reader finds files on the sftp site. Could you point me to some examples?
    Thanks again for your assistance,
    Vin

    • tlikarish
      Employee

      To upload the example, go to Designer and click on the import icon. That will enable you to import a single pipeline.

      For the email sender, if you were going to do it in the same pipeline you could use the binary router to duplicate the stream, combine the results into a single document, and then pass that to the Email Sender, like this:

      Might make it simpler to implement this in a separate pipeline though. Then you’d just have the Reader → Binary To Doc → Aggregate → Email Sender.
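
      If you only want the email when at least one file came through, one option is a Filter snap just before the Email Sender. Assuming the Aggregate is configured with a COUNT function writing to a field named count, the filter expression would simply be:

      $count > 0

      With that in place, the Email Sender only receives a document when the Reader actually matched files.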

      Hopefully that helps you get started with the next step.

  • vcocco
    New Contributor II

    Hello again,
    My integration is working: it’s going out to an sftp server and pulling in all unarchived files that start with this string:
    WORKFILE.PSOURCE.HRCNFRM.*

    This results in pulling in the same set of files multiple times.

    The complete file string looks something like this:

    BERKWORK.PSOURCE.HRCNFRM.2983249.20190108110621668

    The file string has the date: 20190108

    I’d like to pull in only the files whose date string, above, matches the current date in the form YYYYMMDD.
    Could you provide a coding example?
    Thanks again for all your help.

    Vin

      • vcocco
        New Contributor II

        Thank you very much for that expression. I’ve made a lot of progress with the pipeline and the information you’ve given me.
        I would like to put that entire file name expression, "BERKWORK.PSOURCE.HRCNFRM.*." + Date.now().toLocaleDateString({"format":"yyyyMMdd"}), into a variable or some other temporary holding area. In the pipeline, I would like to (probably?) use a Boolean expression to check for the existence of the file and send out an email (via the Email Sender snap) based on the condition being true.
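        (For what it’s worth, with the yyyyMMdd format that expression evaluates on 8 Jan 2019 to BERKWORK.PSOURCE.HRCNFRM.*.20190108; since the sample file name continues with a time component, a trailing wildcard would presumably also be needed for the match, e.g. "BERKWORK.PSOURCE.HRCNFRM.*." + Date.now().toLocaleDateString({"format":"yyyyMMdd"}) + "*".)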
        Thanks again for all your help.