Writing Zip Files to S3

Question

I am seeing severe sluggishness when writing zip files to S3. Writing a 76 MB file takes 12 minutes, versus 16 seconds when writing to a local destination.
I think the problem is in the transfer from ground to cloud. This process is part of a generic file transport solution, so the read file snap executes on a groundplex and the write file snap executes on the cloudplex. The switch is done by a pipeline execute snap that specifies execution on the cloudplex. I’m thinking the issues may be caused by the conversion from binary to document, and then back to binary once the document stream is passed into the child pipeline.
Has anyone else run into similar issues? I am happy to provide an outline of the pipeline if that helps.
Thanks.

aleung · Answer

Your groundplex should normally also have access to S3. Have you tried configuring your child pipeline to run on the groundplex as well? Also, it is always more performant to keep your process in a single pipeline instead of a parent and child.

jskrable · Answer

Hi,
Yes, we can configure the groundplex to access S3. However, this process is designed to be generic enough to move files from locations accessible only from the groundplex to locations accessible from the cloudplex, and vice versa. The S3 scenario is just an example.
For anyone interested, I have received a reply from SnapLogic support stating that going through the control plane for large binary streams like this will result in a serious performance hit.

sivaprasadanjam · Answer

Hi Jack, it has been a while, but if you can recall: how did you end up solving this problem? Did you have to run everything on the Groundplex?

akidave · Answer

For scenarios where large data volumes need to be transferred from a pipeline running on a Groundplex to another pipeline running on a Cloudplex (or another Groundplex), the recommended approach is to have the Groundplex write the data to a file location, such as S3, that is accessible from the second Snaplex. The temporary file’s name can be passed to the child pipeline through the pipeline execute input view or as a child pipeline parameter. The S3 File Writer/Reader snaps have multiple options to tune the transfer performance.
When data is transferred through a PipeExec snap to another Snaplex, the transfer is a streaming operation. Some performance optimizations that are possible with a batched transfer are not available for a streaming transfer.
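To make the staged handoff concrete, here is a minimal Python sketch of the pattern, not SnapLogic code. It assumes a local directory standing in for the shared S3 location, and the function names `stage_file` and `consume_staged_file` are hypothetical. The point is that only the short temporary key crosses between parent and child, while the payload itself goes through shared storage.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def stage_file(source: Path, staging_dir: Path) -> str:
    """Parent side: copy the payload to shared storage and return only its key."""
    # A unique key avoids collisions between concurrent transfers.
    key = f"handoff-{uuid.uuid4().hex}{source.suffix}"
    shutil.copyfile(source, staging_dir / key)
    return key


def consume_staged_file(key: str, staging_dir: Path, destination: Path) -> int:
    """Child side: fetch the payload by key, write it out, then clean up."""
    staged = staging_dir / key
    shutil.copyfile(staged, destination)
    staged.unlink()  # remove the temporary object once it has been delivered
    return destination.stat().st_size


def run_demo() -> dict:
    """Run one handoff end to end inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        staging = root / "staging"  # stand-in for the shared S3 location (assumption)
        staging.mkdir()

        source = root / "archive.zip"
        source.write_bytes(b"PK" + bytes(1024))  # small dummy payload, not a real zip

        # The parent stages the file; the key is all the child needs to receive.
        key = stage_file(source, staging)

        destination = root / "delivered.zip"
        size = consume_staged_file(key, staging, destination)

        return {
            "key": key,
            "size": size,
            "identical": destination.read_bytes() == source.read_bytes(),
            "staging_empty": not any(staging.iterdir()),
        }


if __name__ == "__main__":
    print(run_demo())
```

In a real pipeline the copy steps would be the file writer and reader snaps, and the key would travel as the child pipeline parameter or through the pipeline execute input view, as described above.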

jskrable · Answer

Yes, we did have to run everything from the ground.

