Efficient pipeline design for reading a large amount of data from an on-premises RDBMS (JDBC) and writing it as multiple files to S3

PSAmmirata
Employee

We have a use case where we need to read a large amount of data from an on-premises RDBMS (over JDBC) and write it to S3 as multiple files. Whichever design we choose, the data has to cross the network from our on-premises data center to AWS. I’m looking for suggestions on the most efficient pipeline design.

Options we’ve considered:

  1. A single pipeline that runs on the on-premises Snaplex, reads the data from the RDBMS, and writes it to S3.
  2. A single pipeline that runs on the AWS Snaplex, reads the data from the RDBMS, and writes it to S3.
  3. A parent/child pipeline where the parent runs on the on-premises Snaplex and reads the data from the RDBMS, then uses a Pipeline Execute Snap to run a child pipeline on the AWS Snaplex that writes the data to S3. The data passes over the network to an unconnected input view in the child pipeline.
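
One rough way to compare the options above is to estimate how long the on-premises-to-AWS hop takes in each case, since that hop is common to all three. The Python sketch below is only back-of-the-envelope arithmetic: the function name `transfer_seconds`, the data size, the link bandwidth, and the compression ratios are all hypothetical placeholders, not measurements, and it ignores latency, protocol overhead, and parallelism. What actually crosses the wire in each option (JDBC result sets, documents sent to the child pipeline, or S3 uploads) and how well it compresses would need to be measured in your environment.

```python
def transfer_seconds(raw_bytes: int,
                     bandwidth_bits_per_sec: float,
                     compression_ratio: float = 1.0) -> float:
    """Estimate seconds needed to move raw_bytes across a link.

    compression_ratio is bytes-on-the-wire divided by raw bytes
    (1.0 means uncompressed, 0.25 means the payload shrinks to a quarter).
    """
    if bandwidth_bits_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in (0, 1]")
    wire_bits = raw_bytes * compression_ratio * 8
    return wire_bits / bandwidth_bits_per_sec


# Hypothetical inputs: 500 GB of raw data over a 1 Gbit/s link.
RAW_BYTES = 500 * 10**9
LINK_BPS = 1 * 10**9

# Assumed (not measured) ratios for the traffic that crosses the WAN.
assumed_ratios = {
    "1: on-prem Snaplex uploads to S3": 0.25,   # assumes files are compressed before upload
    "2: AWS Snaplex reads over JDBC": 1.0,      # assumes the JDBC traffic is uncompressed
    "3: parent streams to AWS child": 1.0,      # assumes documents are sent uncompressed
}

for option, ratio in assumed_ratios.items():
    hours = transfer_seconds(RAW_BYTES, LINK_BPS, ratio) / 3600
    print(f"Option {option}: ~{hours:.2f} h on the wire")
```

Replacing the assumed ratios and bandwidth with measured values from a small test run of each option would make the comparison meaningful.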