Ingest data from NoSQL Database (MongoDB) into AWS Cloud Storage (S3)


Contributed by @SriramGopal from Agilisium Consulting
The pipeline fetches records incrementally from a document-oriented NoSQL database (MongoDB in this case) and loads them into cloud storage (Amazon S3) with partitioning logic. This use case applies to Cloud Data Lake initiatives.
The pipeline also includes date-based data partitioning at the storage layer and a data validation trail between source and target.
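
For readers who want to see the incremental idea outside of SnapLogic, here is a minimal Python sketch. It is not taken from the attached pipelines: the field name `updated_at`, the watermark approach, and all function names are assumptions made for illustration. It builds a MongoDB-style filter for documents newer than the last loaded timestamp and applies the same rule in memory, sorted by timestamp, roughly mirroring a Find followed by a Sort.

```python
from datetime import datetime
from typing import Optional


def build_incremental_filter(last_loaded_at: Optional[datetime],
                             ts_field: str = "updated_at") -> dict:
    """Return a MongoDB-style query filter for documents newer than the watermark.

    ts_field is a hypothetical field name. A watermark of None means no
    previous load exists, so the filter matches everything (full load).
    """
    if last_loaded_at is None:
        return {}
    return {ts_field: {"$gt": last_loaded_at}}


def select_new_documents(documents, last_loaded_at: Optional[datetime],
                         ts_field: str = "updated_at") -> list:
    """Apply the same rule to an in-memory list and sort by timestamp."""
    fresh = [
        doc for doc in documents
        if last_loaded_at is None or doc[ts_field] > last_loaded_at
    ]
    return sorted(fresh, key=lambda doc: doc[ts_field])


def next_watermark(batch, last_loaded_at: Optional[datetime],
                   ts_field: str = "updated_at") -> Optional[datetime]:
    """Return the newest timestamp in the batch, or keep the old watermark
    when the batch is empty."""
    if not batch:
        return last_loaded_at
    return max(doc[ts_field] for doc in batch)
```

The filter returned by `build_incremental_filter` could be passed to a MongoDB find call, and the value from `next_watermark` stored for the next run. Where that value is stored is left open here.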

Parent Pipeline

S3 Writer Child Pipeline

Audit Update Child Pipeline

Control Table - Tracking
The control table holds the source load type (RDBMS, FTP, API, etc.) and the corresponding object name. Each object load records its start and end times and the number of records or documents processed. The source record fetch count and the target load count are calculated for every run. Based on the status of the load (S for success or F for failure), automated notifications can be triggered to the technical team.

Control Table Attributes:

UID – Primary key
SOURCE_TYPE – Type of source (RDBMS, API, Social Media, FTP, etc.)
TABLE_NAME – Table name or object name
START_DATE – Load start time
ENDDATE – Load end time
SRC_REC_COUNT – Source record count
RGT_REC_COUNT – Target record count
STATUS – ‘S’ (success) or ‘F’ (failure), based on the source/target load
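
As a rough illustration of how such a control table could be used, the sketch below creates it with Python's built-in `sqlite3` module and writes one audit row per load. Several things here are assumptions rather than part of the pattern: SQLite stands in for the real audit database, the table name `control_table` and the column types are invented, and the rule "status is S when source and target counts match" is one plausible reading of the validation trail. The column names are copied from the attribute list as written.

```python
import sqlite3

# Column names follow the attribute list above; types are assumptions.
DDL = """
CREATE TABLE control_table (
    UID           INTEGER PRIMARY KEY,
    SOURCE_TYPE   TEXT,
    TABLE_NAME    TEXT,
    START_DATE    TEXT,
    ENDDATE       TEXT,
    SRC_REC_COUNT INTEGER,
    RGT_REC_COUNT INTEGER,
    STATUS        TEXT CHECK (STATUS IN ('S', 'F'))
)
"""


def load_status(src_count, tgt_count):
    """Assumed rule: a load succeeds only when both counts match."""
    return "S" if src_count == tgt_count else "F"


def record_load(conn, source_type, table_name, start, end, src_count, tgt_count):
    """Insert one audit row for a load and return its status."""
    status = load_status(src_count, tgt_count)
    conn.execute(
        "INSERT INTO control_table "
        "(SOURCE_TYPE, TABLE_NAME, START_DATE, ENDDATE, "
        " SRC_REC_COUNT, RGT_REC_COUNT, STATUS) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (source_type, table_name, start, end, src_count, tgt_count, status),
    )
    conn.commit()
    return status


def failed_loads(conn):
    """Return (TABLE_NAME, SRC_REC_COUNT, RGT_REC_COUNT) for failed loads,
    which is the information a notification step would need."""
    cursor = conn.execute(
        "SELECT TABLE_NAME, SRC_REC_COUNT, RGT_REC_COUNT "
        "FROM control_table WHERE STATUS = 'F' ORDER BY UID"
    )
    return cursor.fetchall()
```

A run would open a connection, execute `DDL` once, call `record_load` at the end of each load, and pass the result of `failed_loads` to whatever notification mechanism is in place.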

Partitioned Load
For every load, the data is partitioned automatically in the storage layer (S3) based on the transaction timestamp.
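
To make the partitioning idea concrete, here is a small Python sketch that derives an S3 key prefix from a transaction timestamp and groups documents by that prefix. The `year=/month=/day=` layout, the field name `txn_ts`, and the prefix `landing/orders` are assumptions chosen for illustration. The attached pipelines may use a different folder structure.

```python
from collections import defaultdict
from datetime import datetime


def partition_prefix(base_prefix: str, txn_ts: datetime) -> str:
    """Build a date-based key prefix (assumed year=/month=/day= layout)."""
    return f"{base_prefix}/year={txn_ts:%Y}/month={txn_ts:%m}/day={txn_ts:%d}"


def group_by_partition(documents, base_prefix: str = "landing/orders",
                       ts_field: str = "txn_ts") -> dict:
    """Group documents by the partition prefix derived from their timestamp.

    ts_field and base_prefix are hypothetical names. Each resulting group
    would be written as one object (or set of objects) under its prefix.
    """
    groups = defaultdict(list)
    for doc in documents:
        groups[partition_prefix(base_prefix, doc[ts_field])].append(doc)
    return dict(groups)
```

With this layout, documents from the same calendar day land under the same prefix, so a rerun for a given day touches only that day's folder.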

Configuration
Sources: NoSQL Database, MongoDB collection
Targets: AWS Storage (Amazon S3)
Snaps used:

Parent Pipeline: MongoDB - Find, Sort, File Writer, Mapper, Router, Copy, JSON Formatter, Redshift Insert, Redshift Select, Redshift - Multi Execute, S3 File Writer, S3 File Reader, Aggregate, Pipeline Execute

S3 Writer Child Pipeline: Mapper, JSON Formatter, S3 File Writer

Audit Update Child Pipeline: File Reader, JSON Parser, Mapper, Router, Aggregate, Redshift - Multi Execute

Downloads
IM_NoSQL_S3_Inc_load.slp (29.9 KB)
IM_Nosql_S3_Inc_load_S3writer.slp (4.8 KB)
IM_Nosql_S3_Inc_load_Audit_update.slp (12.0 KB)

sriramgopal · Answer

For any clarifications related to this pattern, please reach out to snaplogic@agilisium.com.
