02-14-2019 03:00 PM
Contributed by @SriramGopal from Agilisium Consulting
The pipeline is designed to fetch records on an incremental basis from a document-oriented NoSQL database (MongoDB in this case) and load them to cloud storage (Amazon S3) with partitioning logic. This use case is applicable to Cloud Data Lake initiatives.
This pipeline also includes date-based data partitioning at the storage layer and a data validation trail between source and target.
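As a rough illustration of the incremental-fetch idea outside SnapLogic, the sketch below uses pymongo to pull only documents whose transaction timestamp is newer than the last successful load. The connection string, database/collection names, the `txn_ts` field, and the watermark value are hypothetical placeholders, not part of the attached pipelines.

```python
# Minimal sketch of an incremental fetch from MongoDB (assumed names/fields).
from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # hypothetical connection
collection = client["sales_db"]["orders"]           # hypothetical DB / collection

# Watermark from the previous successful run (normally read from the control table).
last_loaded_ts = datetime(2019, 2, 13, 0, 0, 0)

# Fetch only documents newer than the watermark, i.e. the incremental slice.
new_docs = list(collection.find({"txn_ts": {"$gt": last_loaded_ts}}))
print(f"Fetched {len(new_docs)} new documents since {last_loaded_ts.isoformat()}")
```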
The control table is designed to hold the source load type (RDBMS, FTP, API, etc.) and the corresponding object name. Each object load records the load start/end times and the number of records/documents processed. The source record fetch count and the target load count are calculated for every run. Based on the status of the load (S = success, F = failure), automated notifications can be triggered to the technical team.
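A control-table row of the kind described above might look like the following dataclass; the field names and the notification hook are assumptions for illustration, not the exact schema used by the attached pipelines.

```python
# Sketch of one control-table entry per object load (assumed field names).
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ControlRecord:
    source_type: str      # e.g. "RDBMS", "FTP", "API", "NoSQL"
    object_name: str      # source table / collection being loaded
    load_start: datetime
    load_end: datetime
    source_count: int     # records/documents fetched from the source
    target_count: int     # records/documents written to S3
    status: str           # "S" = success, "F" = failure

def notify_on_failure(record: ControlRecord) -> None:
    """Hypothetical hook: alert the technical team when a load fails
    or when source and target counts disagree."""
    if record.status == "F" or record.source_count != record.target_count:
        print(f"ALERT: load of {record.object_name} needs attention "
              f"({record.source_count} fetched vs {record.target_count} loaded)")
```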
For every load, the data is automatically partitioned in the storage layer (S3) based on the transaction timestamp.
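Date-based partitioning in S3 typically amounts to deriving a `year=/month=/day=` prefix from each document's transaction timestamp. The sketch below shows this with boto3; the bucket name, key prefix, and `txn_ts` field are assumptions for illustration.

```python
# Sketch of date-partitioned writes to S3 keyed on a transaction timestamp.
import json
from datetime import datetime
import boto3

s3 = boto3.client("s3")
BUCKET = "my-data-lake"                    # hypothetical bucket name

def write_partitioned(doc: dict) -> None:
    ts = doc["txn_ts"]                     # assumed transaction-timestamp field
    if isinstance(ts, str):
        ts = datetime.fromisoformat(ts)
    # Partition path: orders/year=YYYY/month=MM/day=DD/<id>.json
    key = (f"orders/year={ts.year}/month={ts.month:02d}/day={ts.day:02d}/"
           f"{doc['_id']}.json")
    s3.put_object(Bucket=BUCKET, Key=key,
                  Body=json.dumps(doc, default=str).encode("utf-8"))
```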
Sources : MongoDB collection (document-oriented NoSQL database)
Targets : Amazon S3
Pipelines attached :
IM_NoSQL_S3_Inc_load.slp (29.9 KB)
IM_Nosql_S3_Inc_load_S3writer.slp (4.8 KB)
IM_Nosql_S3_Inc_load_Audit_update.slp (12.0 KB)
02-18-2019 03:49 AM
For any clarifications related to this pattern, please reach out to snaplogic@agilisium.com.