We have a use case where we need to read a large amount of data from an on-premises RDBMS (JDBC) and write it as multiple files in S3. No matter what, the data has to travel across the network from our on-premises data center to AWS. I’m looking for suggestions on the most efficient pipeline design.
Options we’ve considered:
- A single pipeline that runs on an on-premises Snaplex, reads the data from the RDBMS, and writes it to S3 (roughly the flow sketched after this list).
- A single pipeline that runs on an AWS Snaplex, reads the data from the RDBMS, and writes it to S3.
- A parent/child pipeline where the parent runs on an on-premises Snaplex and reads the data from the RDBMS, then uses a Pipeline Execute Snap to run a child pipeline on the AWS Snaplex that writes the data to S3. The data passes over the network to an unconnected input view in the child pipeline.
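For context, this is a rough standalone sketch (plain Python, not SnapLogic) of the data flow the single-pipeline options boil down to: read the source table in chunks and write each chunk to S3 as its own object. The DSN, query, bucket name, key prefix, and chunk size are all placeholders, and the driver choice (pyodbc) is just an assumption for illustration:

```python
# Illustrative sketch of the chunked read -> multi-file S3 write flow.
# Assumes a DB-API driver (pyodbc here) and boto3; all names are placeholders.
import csv
import io

import boto3
import pyodbc

CONN_STR = "DSN=onprem_rdbms"        # hypothetical DSN for the on-prem database
QUERY = "SELECT * FROM big_table"    # hypothetical source query
BUCKET = "my-target-bucket"          # hypothetical S3 bucket
CHUNK_ROWS = 500_000                 # rows per output file

s3 = boto3.client("s3")

with pyodbc.connect(CONN_STR) as conn:
    cursor = conn.cursor()
    cursor.execute(QUERY)
    columns = [col[0] for col in cursor.description]

    part = 0
    while True:
        rows = cursor.fetchmany(CHUNK_ROWS)
        if not rows:
            break
        # Serialize the chunk to CSV in memory and upload it as one S3 object.
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(columns)
        writer.writerows(rows)
        s3.put_object(
            Bucket=BUCKET,
            Key=f"exports/big_table/part-{part:05d}.csv",
            Body=buf.getvalue().encode("utf-8"),
        )
        part += 1
```

Whichever Snaplex runs the work, the same volume of data crosses the on-premises-to-AWS link once; the design question is mainly where the read, the chunking/formatting, and the S3 write each execute.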