Ingest data from SQL Server (RDBMS) to AWS Cloud Storage (S3)
Contributed by @SriramGopal from Agilisium Consulting

The pipeline is designed to fetch records on an incremental basis from any RDBMS system and load them to cloud storage (Amazon S3 in this case) with partitioning logic. This use case is applicable to Cloud Data Lake initiatives. The pipeline also includes date-based data partitioning at the storage layer and a data validation trail between source and target. A minimal code sketch of the control-table pattern follows the download links below.

Parent Pipeline
Control Table check: Gets the last run details from the Control table.
ETL Process: Fetches the incremental source data based on the Control table and loads the data to S3.
Control Table write: Updates the latest run data to the Control table for tracking.

S3 Writer Child Pipeline

Audit Update Child Pipeline

Control Table - Tracking
The Control table is designed so that it holds the source load type (RDBMS, FTP, API, etc.) and the corresponding object name. Each object load records the load start/end times and the records/documents processed. The source record fetch count and the target table load count are calculated for every run. Based on the status of the load (S - success or F - failure), automated notifications can be triggered to the technical team.

Control Table Attributes:
UID – Primary key
SOURCE_TYPE – Type of source: RDBMS, API, Social Media, FTP, etc.
TABLE_NAME – Table name or object name
START_DATE – Load start time
ENDDATE – Load end time
SRC_REC_COUNT – Source record count
RGT_REC_COUNT – Target record count
STATUS – 'S' (success) or 'F' (failed) based on the source/target load

Partitioned Load
For every load, the data gets partitioned automatically based on the transaction timestamp in the storage layer (S3).

Configuration
Sources: RDBMS database, SQL Server table
Targets: AWS storage (S3)
Snaps used:
Parent Pipeline: Sort, File Writer, Mapper, Router, Copy, JSON Formatter, Redshift Insert, Redshift Select, Redshift - Multi Execute, S3 File Writer, S3 File Reader, Aggregate, Pipeline Execute
S3 Writer Child Pipeline: Mapper, JSON Formatter, S3 File Writer
Audit Update Child Pipeline: File Reader, JSON Parser, Mapper, Router, Aggregate, Redshift - Multi Execute

Downloads
IM_RDBMS_S3_Inc_load.slp (43.6 KB)
IM_RDBMS_S3_Inc_load_S3writer.slp (12.2 KB)
IM_RDBMS_S3_Inc_load_Audit_update.slp (18.3 KB)
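For readers who want to see the control-table watermark pattern outside SnapLogic, here is a minimal Python sketch of the same flow. It assumes a SQL Server source reachable through pyodbc, boto3 for S3, and illustrative table and column names (CONTROL_TABLE, ORDERS, LAST_MODIFIED) that are not taken from the posted pipeline.

```python
# Minimal sketch of the control-table watermark pattern; all names are illustrative.
import datetime
import json

import boto3
import pyodbc

SRC_CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
                "SERVER=src-host;DATABASE=sales;UID=etl;PWD=secret")  # placeholder
BUCKET = "my-data-lake"  # hypothetical bucket

def run_incremental_load():
    s3 = boto3.client("s3")
    conn = pyodbc.connect(SRC_CONN_STR)
    cur = conn.cursor()

    # 1. Control Table check: last successful end time for this object
    cur.execute(
        "SELECT MAX(ENDDATE) FROM CONTROL_TABLE WHERE TABLE_NAME = ? AND STATUS = 'S'",
        ("ORDERS",),
    )
    last_run = cur.fetchone()[0] or datetime.datetime(1970, 1, 1)

    # 2. ETL process: fetch only rows modified since the last successful run
    cur.execute("SELECT * FROM ORDERS WHERE LAST_MODIFIED > ?", (last_run,))
    cols = [c[0] for c in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]

    # 3. Partitioned write: the S3 key is derived from the load timestamp (yyyy/mm/dd)
    now = datetime.datetime.utcnow()
    key = f"orders/{now:%Y/%m/%d}/orders_{now:%H%M%S}.json"
    s3.put_object(Bucket=BUCKET, Key=key,
                  Body=json.dumps(rows, default=str).encode("utf-8"))

    # 4. Control Table write: record counts and status for the audit trail
    cur.execute(
        "INSERT INTO CONTROL_TABLE (SOURCE_TYPE, TABLE_NAME, START_DATE, ENDDATE, "
        "SRC_REC_COUNT, RGT_REC_COUNT, STATUS) VALUES (?, ?, ?, ?, ?, ?, ?)",
        ("RDBMS", "ORDERS", last_run, now, len(rows), len(rows), "S"),
    )
    conn.commit()

run_incremental_load()
```

The date components in the key are what give the partitioned layout described above; query engines that read the lake can then prune on the yyyy/mm/dd prefix.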
S3 File as Email Attachment

Hello,

As part of an error handling pipeline, I am attempting to write a detailed error file to an S3 bucket, and then attach that file to the email notification at the end of the error pipeline to alert users of the issue. I was able to do this successfully when writing the file to SnapLogic's native file system, but the Email Sender snap is failing consistently when attempting to grab the S3 file. The error message is:

Cannot find AWS access key ID in URL
Resolution: Please enter valid URI path.
Reason: URI syntax error.

For the attachments field, I have something like this:

s3:///buistintegration/test_errors_12082017.json

I tried adding the AWS Access Key and Secret in a few different combinations before the bucket name, with no luck. Has anyone attempted this before? Is it possible? Any input is appreciated.

Thanks,
Jack Skrable
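Not an answer to the Email Sender snap configuration itself, but for comparison, this is roughly what the workaround looks like outside SnapLogic (for example from a Script snap or an external job): pull the object with boto3 and attach the bytes directly. The bucket and key below are taken from the path in the post; the SMTP host, addresses, and credentials are placeholders.

```python
# Workaround sketch: fetch the error file from S3 and attach it to an email directly.
import smtplib
from email.message import EmailMessage

import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="buistintegration", Key="test_errors_12082017.json")
payload = obj["Body"].read()

msg = EmailMessage()
msg["Subject"] = "Pipeline error report"
msg["From"] = "etl-alerts@example.com"      # placeholder sender
msg["To"] = "ops-team@example.com"          # placeholder recipients
msg.set_content("The attached file lists the records that failed in the last run.")
msg.add_attachment(payload, maintype="application", subtype="json",
                   filename="test_errors_12082017.json")

with smtplib.SMTP("smtp.example.com", 587) as smtp:   # placeholder SMTP server
    smtp.starttls()
    smtp.login("etl-alerts@example.com", "app-password")
    smtp.send_message(msg)
```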
Writing Zip Files to S3

I am encountering some severe sluggishness in writing zip files to S3. Writing a 76 MB file takes 12 minutes to complete, versus 16 seconds when writing to a local destination. I think the problem is in transferring from ground to cloud. This process is part of a generic file transport solution, so the read file snap is being executed on a Groundplex and the write file snap is being executed on the Cloudplex. This switch is done by a Pipeline Execute snap specifying execution on the Cloudplex. I'm thinking it is possible the issues are caused by the conversion from binary to document and then back to binary once the document stream is passed into the child pipeline. Has anyone else run into similar issues? I am happy to provide an outline of the pipeline if that helps. Thanks.
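For comparison only, and not a fix for the Snap settings themselves: when the transfer is kept as a single binary stream, boto3's managed transfer typically moves a file of this size much faster by splitting it into concurrent multipart uploads instead of serializing it through an intermediate document representation. The file, bucket, and key names below are placeholders.

```python
# Illustration of a streamed multipart upload to S3; names are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,   # switch to multipart above 8 MB
    multipart_chunksize=8 * 1024 * 1024,   # 8 MB parts
    max_concurrency=10,                    # upload parts in parallel
)

s3 = boto3.client("s3")
with open("export_76mb.zip", "rb") as fh:  # stays a binary stream end to end
    s3.upload_fileobj(fh, "my-bucket", "archives/export_76mb.zip", Config=config)
```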
Ingest data from NoSQL Database (MongoDB) into AWS Cloud Storage (S3)

Contributed by @SriramGopal from Agilisium Consulting

The pipeline is designed to fetch records on an incremental basis from a document-oriented NoSQL database (MongoDB in this case) and load them to cloud storage (Amazon S3) with partitioning logic. This use case is applicable to Cloud Data Lake initiatives. The pipeline also includes date-based data partitioning at the storage layer and a data validation trail between source and target. A minimal sketch of the MongoDB fetch follows the download links below.

Parent Pipeline

S3 Writer Child Pipeline

Audit Update Child Pipeline

Control Table - Tracking
The Control table is designed so that it holds the source load type (RDBMS, FTP, API, etc.) and the corresponding object name. Each object load records the load start/end times and the records/documents processed. The source record fetch count and the target table load count are calculated for every run. Based on the status of the load (S - success or F - failure), automated notifications can be triggered to the technical team.

Control Table Attributes:
UID – Primary key
SOURCE_TYPE – Type of source: RDBMS, API, Social Media, FTP, etc.
TABLE_NAME – Table name or object name
START_DATE – Load start time
ENDDATE – Load end time
SRC_REC_COUNT – Source record count
RGT_REC_COUNT – Target record count
STATUS – 'S' (success) or 'F' (failed) based on the source/target load

Partitioned Load
For every load, the data gets partitioned automatically based on the transaction timestamp in the storage layer (S3).

Configuration
Sources: NoSQL database, MongoDB table
Targets: AWS storage (S3)
Snaps used:
Parent Pipeline: MongoDB - Find, Sort, File Writer, Mapper, Router, Copy, JSON Formatter, Redshift Insert, Redshift Select, Redshift - Multi Execute, S3 File Writer, S3 File Reader, Aggregate, Pipeline Execute
S3 Writer Child Pipeline: Mapper, JSON Formatter, S3 File Writer
Audit Update Child Pipeline: File Reader, JSON Parser, Mapper, Router, Aggregate, Redshift - Multi Execute

Downloads
IM_NoSQL_S3_Inc_load.slp (29.9 KB)
IM_Nosql_S3_Inc_load_S3writer.slp (4.8 KB)
IM_Nosql_S3_Inc_load_Audit_update.slp (12.0 KB)
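A minimal sketch of the incremental MongoDB fetch, assuming the pymongo driver and an illustrative collection with a last_modified timestamp field; none of the names below come from the posted pipeline.

```python
# Incremental MongoDB fetch written to a date-partitioned S3 key; names are illustrative.
import datetime
import json

import boto3
from pymongo import MongoClient

client = MongoClient("mongodb://mongo-host:27017")   # placeholder host
coll = client["crm"]["contacts"]

last_run = datetime.datetime(2024, 1, 1)  # would come from the control table
docs = list(coll.find({"last_modified": {"$gt": last_run}}))

# ObjectId and datetime values are not JSON-serializable, so fall back to str()
body = json.dumps(docs, default=str)

now = datetime.datetime.utcnow()
boto3.client("s3").put_object(
    Bucket="my-data-lake",
    Key=f"contacts/{now:%Y/%m/%d}/contacts_{now:%H%M%S}.json",
    Body=body.encode("utf-8"),
)
```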
Ingest data from File Server (FTP/sFTP) into AWS Cloud Storage (S3)

Contributed by @SriramGopal from Agilisium Consulting

The pipeline is designed to transfer files from an FTP/SFTP server and load them to cloud storage (Amazon S3 in this case). This use case is applicable to Cloud Data Lake initiatives. The pipeline also includes date-based data partitioning at the storage layer and a data validation trail between source and target. A minimal sketch of the file transfer follows the download link below.

Control Table - Tracking
The Control table is designed so that it holds the source load type (RDBMS, FTP, API, etc.) and the corresponding object name. Each object load records the load start/end times and the records/documents processed. The source record fetch count and the target table load count are calculated for every run. Based on the status of the load (S - success or F - failure), automated notifications can be triggered to the technical team.

Control Table Attributes:
UID – Primary key
SOURCE_TYPE – Type of source: RDBMS, API, Social Media, FTP, etc.
TABLE_NAME – Table name or object name
START_DATE – Load start time
ENDDATE – Load end time
SRC_REC_COUNT – Source record count
RGT_REC_COUNT – Target record count
STATUS – 'S' (success) or 'F' (failed) based on the source/target load

Partitioned Load
For every load, the data gets partitioned automatically based on the transaction timestamp in the storage layer (S3).

Configuration
Sources: FTP/sFTP file extracts
Targets: AWS storage (S3)
Snaps used: File Reader, File Writer, Mapper, Router, JSON Formatter, Redshift Insert, Redshift Select, Redshift - Multi Execute, S3 File Writer

Downloads
IM_FTP_to_S3_load.slp (15.3 KB)
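A minimal sketch of the SFTP-to-S3 hop, assuming paramiko on the file-server side and boto3 on the S3 side; host, credentials, directory, and bucket names are placeholders.

```python
# Stream files from an SFTP directory straight into S3 without landing them locally.
import boto3
import paramiko

transport = paramiko.Transport(("sftp.example.com", 22))   # placeholder host
transport.connect(username="etl", password="secret")       # placeholder credentials
sftp = paramiko.SFTPClient.from_transport(transport)

s3 = boto3.client("s3")
for name in sftp.listdir("/outbound"):
    with sftp.open(f"/outbound/{name}", "rb") as remote_file:
        # remote_file is file-like, so boto3 can stream it directly
        s3.upload_fileobj(remote_file, "my-data-lake", f"ftp-extracts/{name}")

sftp.close()
transport.close()
```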
Move data from AWS S3 to Snowflake Data Warehouse

Created by @pkoppishetty

The pipeline pattern moves data from AWS S3 to Snowflake Data Warehouse. The pattern contains a Directory Browser (to retrieve the file list from S3) and Snowflake Execute and Snowflake Bulk Load Snaps (to load the corresponding files into Snowflake). A rough connector-level equivalent of the bulk load step is sketched after the download link below.

Configuration

File Reader Preview

Snowflake Bulk Load Output

Table Data from Snowflake Select Snap

Sources: AWS S3
Targets: Snowflake
Snaps used: Snowflake Execute, Directory Browser, File Reader, Snowflake Bulk Load, Snowflake Select

Downloads
Integration pattern for moving data from AWS S3 to Snowflake datawarehouse.slp (14.2 KB)
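For reference, the bulk load step corresponds roughly to a COPY INTO statement issued through the snowflake-connector-python library; the account, warehouse, table, S3 path, credentials, and file format below are placeholders rather than values from the pattern.

```python
# Rough equivalent of the Snowflake Bulk Load step expressed as a COPY INTO statement.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345", user="loader", password="secret",     # placeholder account
    warehouse="LOAD_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()
cur.execute("""
    COPY INTO ANALYTICS.PUBLIC.ORDERS
    FROM 's3://my-data-lake/orders/'
    CREDENTIALS = (AWS_KEY_ID = 'AKIA...' AWS_SECRET_KEY = '...')
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
print(cur.fetchall())   # COPY INTO returns per-file load status rows
conn.close()
```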
Redshift bulk load

Hi team,

Looking for a suggestion/enhancement to achieve the scenario below. When I read a CSV file, it might contain both \r\n (Windows) and \n (Unix) line endings. How can I fix this? Today, when I do the bulk load using the Redshift Bulk Load snap, the data gets loaded with the \r\n intact. How can I escape these characters? When I look at the properties of the snap, I can see the settings below, but they are not working.
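One workaround, sketched outside the snap, is to normalize the line endings before the bulk load runs; the Redshift Bulk Load snap settings themselves may still need adjusting. The file names below are placeholders.

```python
# Pre-processing workaround: strip carriage returns so only \n line endings reach Redshift.
import csv

with open("input_windows.csv", "r", newline="") as src, \
     open("input_clean.csv", "w", newline="") as dst:
    reader = csv.reader(src)                       # tolerates mixed \r\n and \n rows
    writer = csv.writer(dst, lineterminator="\n")  # emit Unix line endings only
    for row in reader:
        # also drop stray carriage returns embedded inside individual field values
        writer.writerow([field.replace("\r", "") for field in row])
```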
Ingesting multiple AWS S3 files into a database

I have an Amazon S3 bucket containing multiple files that I'd like to extract and read into a database. The files are all .GZ (gzip) files. The file names will change each day but will all have the same format/contents once unzipped.

I was thinking it would be like:

S3 browser → mapping → S3 reader → JSON Parser → database target

But this fails validation at the JSON Parser step, with:

Failure: Cannot parse JSON data, Reason: Unable to create json parser for the given input stream, Illegal character ((CTRL-CHAR, code 31)): only regular white space (\r, \n, \t) is allowed between tokens

After the S3 Reader step, I can preview the data and see the list of files that would be imported, but not the contents of the files themselves. Any suggestions for the right way to read in the contents of several files in an S3 bucket at once?

Thank you!
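A side note that may help with diagnosis: CTRL-CHAR code 31 is 0x1f, the first byte of the gzip magic number, which suggests the JSON Parser is receiving compressed bytes rather than text. Outside SnapLogic, a decompress-first approach looks like the sketch below, using boto3 and Python's gzip module; the bucket and prefix are placeholders.

```python
# List the gzipped objects, decompress each one, then parse the JSON.
import gzip
import json

import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="daily-exports/")  # paginate if >1000 keys

for item in resp.get("Contents", []):
    obj = s3.get_object(Bucket="my-bucket", Key=item["Key"])
    text = gzip.decompress(obj["Body"].read()).decode("utf-8")
    records = json.loads(text)           # or one json.loads per line for JSON Lines files
    print(item["Key"], len(records))     # a real pipeline would insert these rows into the database
```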
Ingest data from Salesforce application into AWS Cloud Storage (S3)

Contributed by @SriramGopal from Agilisium Consulting

The pipeline is designed to fetch records from a CRM application (Salesforce in this case) via API integration and load them to cloud storage (Amazon S3 in this case) with partitioning logic. This use case is applicable to Cloud Data Lake initiatives. The pipeline also includes date-based data partitioning at the storage layer and a data validation trail between source and target. A minimal sketch of the Salesforce fetch follows the download link below.

Control Table - Tracking
The Control table is designed so that it holds the source load type (RDBMS, FTP, API, etc.) and the corresponding object name. Each object load records the load start/end times and the records/documents processed. The source record fetch count and the target table load count are calculated for every run. Based on the status of the load (S - success or F - failure), automated notifications can be triggered to the technical team.

Control Table Attributes:
UID – Primary key
SOURCE_TYPE – Type of source: RDBMS, API, Social Media, FTP, etc.
TABLE_NAME – Table name or object name
START_DATE – Load start time
ENDDATE – Load end time
SRC_REC_COUNT – Source record count
RGT_REC_COUNT – Target record count
STATUS – 'S' (success) or 'F' (failed) based on the source/target load

Partitioned Load
For every load, the data gets partitioned automatically based on the transaction timestamp in the storage layer (S3).

Configuration
Sources: Salesforce Account
Targets: AWS storage (S3)
Snaps used: Salesforce Read, File Reader, File Writer, Mapper, Router, Copy, JSON Formatter, Redshift Insert, Redshift Select, Redshift - Multi Execute, S3 File Writer, S3 File Reader, Aggregate, JSON Parser

Downloads
IM_API_Salesforce_to_S3_load.slp (31.6 KB)
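A minimal sketch of the incremental Salesforce pull, assuming the simple_salesforce library; the object, SOQL filter, credentials, and bucket are illustrative and not part of the posted pipeline.

```python
# Incremental Salesforce query written to a date-partitioned S3 key; names are illustrative.
import datetime
import json

import boto3
from simple_salesforce import Salesforce

sf = Salesforce(username="etl@example.com", password="secret",
                security_token="token")                     # placeholder credentials

last_run = "2024-01-01T00:00:00Z"   # would come from the control table
result = sf.query_all(
    f"SELECT Id, Name, LastModifiedDate FROM Account WHERE LastModifiedDate > {last_run}"
)
records = result["records"]

now = datetime.datetime.utcnow()
boto3.client("s3").put_object(
    Bucket="my-data-lake",
    Key=f"salesforce/account/{now:%Y/%m/%d}/account_{now:%H%M%S}.json",
    Body=json.dumps(records, default=str).encode("utf-8"),
)
```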