
Writing Zip Files to S3

jskrable
New Contributor III

I am encountering some severe sluggishness in writing zip files to S3. When writing a 76 MB file, it takes 12 minutes to complete, versus 16 seconds when writing to a local destination.

I think the problem is in transferring from ground to cloud. This process is part of a generic file transport solution, so the read file snap is being executed on a groundplex, and the write file snap is being executed on the cloudplex. This switch is done by a pipeline execute snap specifying execution on the cloudplex. I'm thinking it is possible the issues are caused by the conversion from binary to document, and then back to binary, once the document stream is passed into the child pipeline.

Has anyone else run into similar issues? I am happy to provide an outline of the pipeline if that helps.

Thanks.

5 REPLIES

aleung
Contributor III

Your groundplex should normally also have access to S3. Have you tried configuring your child pipeline to also use the groundplex? Also, it is always more performant to keep your process in a single pipeline instead of splitting it into a parent and child.

jskrable
New Contributor III

Hi,

Yes, we can configure the groundplex to access S3. However, this process is designed to be generic enough to move files from locations accessible only from the groundplex to locations accessible from the cloudplex, and vice versa. The S3 scenario is just an example.

For anyone interested, I have received a reply from SnapLogic support stating that going through the control plane for large binary streams like this will result in a serious performance hit.

Hi Jack, this has been a while, but if you can recall… how did you end up solving this problem? Did you have to run everything on the Groundplex?

For scenarios where large data volumes need to be transferred between a pipeline running on a Groundplex to another pipeline running on a Cloudplex (or another Groundplex), the recommended approach is to have the Groundplex write the data to a file location like S3 which is accessible from the second Snaplex. The temporary file's name can be passed to the child pipeline through the pipeline execute input view or as a child pipeline parameter. The S3 File Writer/Reader snaps have multiple options to tune the transfer performance.
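As a minimal sketch of that handoff pattern outside SnapLogic, the snippet below (Python with boto3) shows the idea: the ground-side process stages the large file in S3 with tuned multipart settings, and only the object key is handed to the downstream process, which reads the file directly from S3. The bucket name, key, and file path are hypothetical placeholders, and the TransferConfig values stand in for the S3 File Writer tuning options mentioned above.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Multipart tuning roughly analogous to the S3 File Writer performance options.
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,   # switch to multipart above 8 MB
    multipart_chunksize=16 * 1024 * 1024,  # 16 MB parts
    max_concurrency=8,                      # parallel part uploads
)

def stage_and_handoff(local_path: str, bucket: str, key: str) -> str:
    """Upload the file to S3 and return the key to pass to the child pipeline."""
    s3.upload_file(local_path, bucket, key, Config=config)
    return key

# Hypothetical usage: the returned key would be supplied to the Pipeline Execute
# snap as a pipeline parameter, and the child pipeline's S3 File Reader would
# open it directly, avoiding the streaming transfer through the control plane.
temp_key = stage_and_handoff("/data/outbound/archive.zip",
                             "my-transfer-bucket",
                             "staging/archive.zip")
print(f"Child pipeline parameter: {temp_key}")
```

The point of the design is that only a short string (the temporary file name) crosses the Snaplex boundary, while the large binary payload moves over the S3 data path on both sides.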

When transferring data through a PipeExec snap to another Snaplex, the data transfer is a streaming operation. Some performance optimizations that are possible with the batched transfer are not available for the streaming transfer.