Ingest data from File Server (FTP/sFTP) into AWS Cloud Storage (S3)
02-14-2019 02:02 PM
Contributed by @SriramGopal from Agilisium Consulting
The pipeline is designed to transfer files from an FTP/SFTP server and load them into cloud storage (Amazon S3 in this case). This use case is applicable to Cloud Data Lake initiatives.
The pipeline also includes date-based data partitioning at the storage layer and a data validation trail between source and target.
Control Table - Tracking
The control table is designed to hold the source load type (RDBMS, FTP, API, etc.) and the corresponding object name. Each object load records its start/end times and the number of records/documents processed. The source record fetch count and the target table load count are calculated for every run. Based on the load status (S for success, F for failure), automated notifications can be triggered to the technical team.
Control Table Attributes:
- UID – Primary key
- SOURCE_TYPE – Source type (RDBMS, API, Social Media, FTP, etc.)
- TABLE_NAME – Table or object name
- START_DATE – Load start time
- ENDDATE – Load end time
- SRC_REC_COUNT – Source record count
- RGT_REC_COUNT – Target record count
- STATUS – ‘S’ (success) or ‘F’ (failure) for the source/target load
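The attributes above can be modeled as a simple record, with the load status derived by comparing source and target counts. A minimal Python sketch follows; the field names mirror the control table attributes, while the `status` derivation is an assumption based on the validation-trail description, not code taken from the pattern itself:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ControlTableEntry:
    """One row of the control table, tracking a single object load."""
    uid: int                  # UID - primary key
    source_type: str          # SOURCE_TYPE - RDBMS, API, Social Media, FTP, etc.
    table_name: str           # TABLE_NAME - table or object name
    start_date: datetime      # START_DATE - load start time
    end_date: datetime        # ENDDATE - load end time
    src_rec_count: int        # SRC_REC_COUNT - source record count
    rgt_rec_count: int        # RGT_REC_COUNT - target record count

    @property
    def status(self) -> str:
        """'S' when source and target counts match, 'F' otherwise."""
        return "S" if self.src_rec_count == self.rgt_rec_count else "F"
```

In the pipeline itself this comparison drives the Router snap: matching counts route to a success notification, a mismatch to a failure notification.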
Partitioned Load
For every load, the data is partitioned automatically based on the transaction timestamp in the storage layer (S3).
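A common way to implement date-based partitioning is to derive the S3 object key from the transaction timestamp. The sketch below illustrates the idea; the prefix and the Hive-style `year=/month=/day=` layout are illustrative assumptions, not necessarily the exact layout this pattern produces:

```python
from datetime import datetime

def partitioned_s3_key(prefix: str, filename: str, txn_ts: datetime) -> str:
    """Build a date-partitioned S3 key from a transaction timestamp."""
    return f"{prefix}/year={txn_ts:%Y}/month={txn_ts:%m}/day={txn_ts:%d}/{filename}"

# A file stamped 2019-02-14 lands under .../year=2019/month=02/day=14/
key = partitioned_s3_key("datalake/ftp_extracts", "orders.json",
                         datetime(2019, 2, 14, 14, 2))
```

Keying objects by date this way keeps each day's load in its own S3 "folder", which also lets downstream query engines prune partitions by date.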
Configuration
Sources: FTP/sFTP File Extracts
Targets: AWS Storage
Snaps used: File Reader, File Writer, Mapper, Router, JSON Formatter, Redshift Insert, Redshift Select, Redshift - Multi Execute, S3 File Writer
Downloads
IM_FTP_to_S3_load.slp (15.3 KB)
02-18-2019 03:48 AM
For any clarifications regarding this pattern, please contact snaplogic@agilisium.com
