PSAmmirata
Employee
5 years ago

Efficient pipeline design for reading a large amount of data from an on-premises RDBMS (JDBC) and writing it as multiple files in S3

We have a use case where we need to read a large amount of data from an on-premises RDBMS (JDBC) and write it as multiple files in S3. No matter what, the data has to travel across the network from our on-premises data center to AWS. I’m looking for suggestions on the most efficient pipeline design.

Options we’ve considered (a sketch of the underlying data movement follows the list):

  1. A single pipeline that runs on an on-premises Snaplex, reads the data from the RDBMS, and writes it to S3.
  2. A single pipeline that runs on an AWS Snaplex, reads the data from the RDBMS, and writes it to S3.
  3. A parent/child design where the parent pipeline runs on an on-premises Snaplex and reads the data from the RDBMS, then uses a Pipeline Execute Snap to run a child pipeline on the AWS Snaplex that writes the data to S3. The data passes over the network to an unconnected input view in the child pipeline.
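
For context, here is a minimal Java sketch (outside SnapLogic) of the data movement that every option above ultimately has to perform: stream rows from the RDBMS over JDBC and write them out as multiple S3 objects. The JDBC URL, credentials, table, bucket name, and the 50,000-rows-per-file batch size are all placeholder assumptions, not part of our actual setup.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class RdbmsToS3 {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; assumes the vendor's JDBC driver is on the classpath.
        String jdbcUrl = "jdbc:oracle:thin:@onprem-db:1521/ORCL";
        String bucket = "my-target-bucket";   // hypothetical bucket
        int rowsPerFile = 50_000;             // assumed rows per S3 object

        try (Connection conn = DriverManager.getConnection(jdbcUrl, "user", "password");
             Statement stmt = conn.createStatement();
             S3Client s3 = S3Client.create()) {

            // Hint to the driver to stream results instead of buffering the whole table.
            stmt.setFetchSize(10_000);

            try (ResultSet rs = stmt.executeQuery("SELECT id, payload FROM big_table")) {
                StringBuilder buffer = new StringBuilder();
                int rowCount = 0;
                int fileIndex = 0;

                while (rs.next()) {
                    buffer.append(rs.getLong("id")).append(',')
                          .append(rs.getString("payload")).append('\n');
                    if (++rowCount % rowsPerFile == 0) {
                        flush(s3, bucket, fileIndex++, buffer);
                    }
                }
                if (buffer.length() > 0) {
                    flush(s3, bucket, fileIndex, buffer); // final partial file
                }
            }
        }
    }

    // Write one buffered chunk of rows as its own S3 object, then reset the buffer.
    private static void flush(S3Client s3, String bucket, int fileIndex, StringBuilder buffer) {
        PutObjectRequest request = PutObjectRequest.builder()
                .bucket(bucket)
                .key(String.format("export/part-%05d.csv", fileIndex))
                .build();
        s3.putObject(request, RequestBody.fromString(buffer.toString(), StandardCharsets.UTF_8));
        buffer.setLength(0);
    }
}
```

However the Snaps handle it internally, the efficiency question comes down to where this loop runs and how many network hops the row stream makes before it lands in S3.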