Created by Chris Ward, SnapLogic
This Pipeline replicates table data obtained from an Oracle database and upserts the data into Redshift.
Firstly, the Parent Pipeline reads the Source and Target schema & tables and a where clause used to filter the source data from an expression file “oracle_to_redshift__c.expr” (the customer would need to modify this file to correspond with the table & schemas they are looking to target), then obtains the primary key constraints for each target table in Redshift. The Pipeline then constructs an array of documents for up to 10 key columns that will be used to determine the unicity of the data when being upserted into Redshift. Each document is then passed into the child pipeline that reads the data from the source Oracle table and then upserts the data into Redshift. A router Snap is used between the Oracle read and Redshift upsert to achieve concurrency for large data volumes.
The Pattern used within this Pipeline could potentially be reworked to operate with different source and target databases, eg SQL Server to Google Big Query.
To make this Pipeline reusable, make use of the expression file to store the source and target tables & schemas.
Sources: Oracle Table
Targets: Redshift Table
Snaps used: Mapper, JSON Splitter, Redshift Execute, Group By Fields, Pipeline Execute, Oracle Select, Router, Redshift Upsert