Ingest data from SQL Server (RDBMS) to AWS Cloud Storage (S3)
Contributed by @SriramGopal from Agilisium Consulting

The pipeline is designed to fetch records on an incremental basis from any RDBMS system and load them to cloud storage (Amazon S3 in this case) with partitioning logic. This use case is applicable to Cloud Data Lake initiatives. The pipeline also includes date-based data partitioning at the storage layer and a data validation trail between source and target. A minimal code sketch of the control-table pattern follows the download links below.

Parent Pipeline
Control Table check: Gets the last run details from the Control table.
ETL Process: Fetches the incremental source data based on the Control table and loads the data to S3.
Control Table write: Updates the latest run data to the Control table for tracking.

S3 Writer Child Pipeline

Audit Update Child Pipeline

Control Table - Tracking
The Control table is designed so that it holds the source load type (RDBMS, FTP, API, etc.) and the corresponding object name. Each object load records the load start/end times and the records/documents processed. The source record fetch count and the target table load count are calculated for every run. Based on the status of the load (S - success or F - failure), automated notifications can be triggered to the technical team.

Control Table Attributes:
UID – Primary key
SOURCE_TYPE – Type of source: RDBMS, API, Social Media, FTP, etc.
TABLE_NAME – Table name or object name
START_DATE – Load start time
ENDDATE – Load end time
SRC_REC_COUNT – Source record count
RGT_REC_COUNT – Target record count
STATUS – 'S' (success) or 'F' (failed) based on the source/target load

Partitioned Load
For every load, the data gets partitioned automatically based on the transaction timestamp in the storage layer (S3).

Configuration
Sources: RDBMS database, SQL Server table
Targets: AWS storage (S3)
Snaps used:
Parent Pipeline: Sort, File Writer, Mapper, Router, Copy, JSON Formatter, Redshift Insert, Redshift Select, Redshift - Multi Execute, S3 File Writer, S3 File Reader, Aggregate, Pipeline Execute
S3 Writer Child Pipeline: Mapper, JSON Formatter, S3 File Writer
Audit Update Child Pipeline: File Reader, JSON Parser, Mapper, Router, Aggregate, Redshift - Multi Execute

Downloads
IM_RDBMS_S3_Inc_load.slp (43.6 KB)
IM_RDBMS_S3_Inc_load_S3writer.slp (12.2 KB)
IM_RDBMS_S3_Inc_load_Audit_update.slp (18.3 KB)
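For readers who want to see the control-table watermark pattern outside SnapLogic, here is a minimal Python sketch of the same flow. It assumes a SQL Server source reachable through pyodbc, boto3 for S3, and illustrative table and column names (CONTROL_TABLE, ORDERS, LAST_MODIFIED) that are not taken from the posted pipeline.

```python
# Minimal sketch of the control-table watermark pattern; all names are illustrative.
import datetime
import json

import boto3
import pyodbc

SRC_CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
                "SERVER=src-host;DATABASE=sales;UID=etl;PWD=secret")  # placeholder
BUCKET = "my-data-lake"  # hypothetical bucket

def run_incremental_load():
    s3 = boto3.client("s3")
    conn = pyodbc.connect(SRC_CONN_STR)
    cur = conn.cursor()

    # 1. Control Table check: last successful end time for this object
    cur.execute(
        "SELECT MAX(ENDDATE) FROM CONTROL_TABLE WHERE TABLE_NAME = ? AND STATUS = 'S'",
        ("ORDERS",),
    )
    last_run = cur.fetchone()[0] or datetime.datetime(1970, 1, 1)

    # 2. ETL process: fetch only rows modified since the last successful run
    cur.execute("SELECT * FROM ORDERS WHERE LAST_MODIFIED > ?", (last_run,))
    cols = [c[0] for c in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]

    # 3. Partitioned write: the S3 key is derived from the load timestamp (yyyy/mm/dd)
    now = datetime.datetime.utcnow()
    key = f"orders/{now:%Y/%m/%d}/orders_{now:%H%M%S}.json"
    s3.put_object(Bucket=BUCKET, Key=key,
                  Body=json.dumps(rows, default=str).encode("utf-8"))

    # 4. Control Table write: record counts and status for the audit trail
    cur.execute(
        "INSERT INTO CONTROL_TABLE (SOURCE_TYPE, TABLE_NAME, START_DATE, ENDDATE, "
        "SRC_REC_COUNT, RGT_REC_COUNT, STATUS) VALUES (?, ?, ?, ?, ?, ?, ?)",
        ("RDBMS", "ORDERS", last_run, now, len(rows), len(rows), "S"),
    )
    conn.commit()

run_incremental_load()
```

The date components in the key are what give the partitioned layout described above; query engines that read the lake can then prune on the yyyy/mm/dd prefix.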
S3 File as Email Attachment

Hello,

As part of an error handling pipeline, I am attempting to write a detailed error file to an S3 bucket, and then attach that file to the email notification at the end of the error pipeline to alert users of the issue. I was able to do this successfully when writing the file to SnapLogic's native file system, but the Email Sender snap is failing consistently when attempting to grab the S3 file. The error message is:

Cannot find AWS access key ID in URL
Resolution: Please enter valid URI path.
Reason: URI syntax error.

For the attachments field, I have something like this:

s3:///buistintegration/test_errors_12082017.json

I tried adding the AWS Access Key and Secret in a few different combinations before the bucket name, with no luck. Has anyone attempted this before? Is it possible? Any input is appreciated.

Thanks,
Jack Skrable
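Not an answer to the Email Sender snap configuration itself, but for comparison, this is roughly what the workaround looks like outside SnapLogic (for example from a Script snap or an external job): pull the object with boto3 and attach the bytes directly. The bucket and key below are taken from the path in the post; the SMTP host, addresses, and credentials are placeholders.

```python
# Workaround sketch: fetch the error file from S3 and attach it to an email directly.
import smtplib
from email.message import EmailMessage

import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="buistintegration", Key="test_errors_12082017.json")
payload = obj["Body"].read()

msg = EmailMessage()
msg["Subject"] = "Pipeline error report"
msg["From"] = "etl-alerts@example.com"      # placeholder sender
msg["To"] = "ops-team@example.com"          # placeholder recipients
msg.set_content("The attached file lists the records that failed in the last run.")
msg.add_attachment(payload, maintype="application", subtype="json",
                   filename="test_errors_12082017.json")

with smtplib.SMTP("smtp.example.com", 587) as smtp:   # placeholder SMTP server
    smtp.starttls()
    smtp.login("etl-alerts@example.com", "app-password")
    smtp.send_message(msg)
```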
Writing Zip Files to S3

I am encountering some severe sluggishness in writing zip files to S3. Writing a 76 MB file takes 12 minutes to complete, versus 16 seconds when writing to a local destination. I think the problem is in transferring from ground to cloud. This process is part of a generic file transport solution, so the read file snap is being executed on a Groundplex and the write file snap is being executed on the Cloudplex. This switch is done by a Pipeline Execute snap specifying execution on the Cloudplex. I'm thinking it is possible the issues are caused by the conversion from binary to document and then back to binary once the document stream is passed into the child pipeline. Has anyone else run into similar issues? I am happy to provide an outline of the pipeline if that helps. Thanks.
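For comparison only, and not a fix for the Snap settings themselves: when the transfer is kept as a single binary stream, boto3's managed transfer typically moves a file of this size much faster by splitting it into concurrent multipart uploads instead of serializing it through an intermediate document representation. The file, bucket, and key names below are placeholders.

```python
# Illustration of a streamed multipart upload to S3; names are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,   # switch to multipart above 8 MB
    multipart_chunksize=8 * 1024 * 1024,   # 8 MB parts
    max_concurrency=10,                    # upload parts in parallel
)

s3 = boto3.client("s3")
with open("export_76mb.zip", "rb") as fh:  # stays a binary stream end to end
    s3.upload_fileobj(fh, "my-bucket", "archives/export_76mb.zip", Config=config)
```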
Ingest data from NoSQL Database (MongoDB) into AWS Cloud Storage (S3)

Contributed by @SriramGopal from Agilisium Consulting

The pipeline is designed to fetch records on an incremental basis from a document-oriented NoSQL database (MongoDB in this case) and load them to cloud storage (Amazon S3) with partitioning logic. This use case is applicable to Cloud Data Lake initiatives. The pipeline also includes date-based data partitioning at the storage layer and a data validation trail between source and target. A minimal sketch of the MongoDB fetch follows the download links below.

Parent Pipeline

S3 Writer Child Pipeline

Audit Update Child Pipeline

Control Table - Tracking
The Control table is designed so that it holds the source load type (RDBMS, FTP, API, etc.) and the corresponding object name. Each object load records the load start/end times and the records/documents processed. The source record fetch count and the target table load count are calculated for every run. Based on the status of the load (S - success or F - failure), automated notifications can be triggered to the technical team.

Control Table Attributes:
UID – Primary key
SOURCE_TYPE – Type of source: RDBMS, API, Social Media, FTP, etc.
TABLE_NAME – Table name or object name
START_DATE – Load start time
ENDDATE – Load end time
SRC_REC_COUNT – Source record count
RGT_REC_COUNT – Target record count
STATUS – 'S' (success) or 'F' (failed) based on the source/target load

Partitioned Load
For every load, the data gets partitioned automatically based on the transaction timestamp in the storage layer (S3).

Configuration
Sources: NoSQL database, MongoDB table
Targets: AWS storage (S3)
Snaps used:
Parent Pipeline: MongoDB - Find, Sort, File Writer, Mapper, Router, Copy, JSON Formatter, Redshift Insert, Redshift Select, Redshift - Multi Execute, S3 File Writer, S3 File Reader, Aggregate, Pipeline Execute
S3 Writer Child Pipeline: Mapper, JSON Formatter, S3 File Writer
Audit Update Child Pipeline: File Reader, JSON Parser, Mapper, Router, Aggregate, Redshift - Multi Execute

Downloads
IM_NoSQL_S3_Inc_load.slp (29.9 KB)
IM_Nosql_S3_Inc_load_S3writer.slp (4.8 KB)
IM_Nosql_S3_Inc_load_Audit_update.slp (12.0 KB)
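A minimal sketch of the incremental MongoDB fetch, assuming the pymongo driver and an illustrative collection with a last_modified timestamp field; none of the names below come from the posted pipeline.

```python
# Incremental MongoDB fetch written to a date-partitioned S3 key; names are illustrative.
import datetime
import json

import boto3
from pymongo import MongoClient

client = MongoClient("mongodb://mongo-host:27017")   # placeholder host
coll = client["crm"]["contacts"]

last_run = datetime.datetime(2024, 1, 1)  # would come from the control table
docs = list(coll.find({"last_modified": {"$gt": last_run}}))

# ObjectId and datetime values are not JSON-serializable, so fall back to str()
body = json.dumps(docs, default=str)

now = datetime.datetime.utcnow()
boto3.client("s3").put_object(
    Bucket="my-data-lake",
    Key=f"contacts/{now:%Y/%m/%d}/contacts_{now:%H%M%S}.json",
    Body=body.encode("utf-8"),
)
```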
Ingest data from File Server (FTP/sFTP) into AWS Cloud Storage (S3)

Contributed by @SriramGopal from Agilisium Consulting

The pipeline is designed to transfer files from an FTP/SFTP server and load them to cloud storage (Amazon S3 in this case). This use case is applicable to Cloud Data Lake initiatives. The pipeline also includes date-based data partitioning at the storage layer and a data validation trail between source and target. A minimal sketch of the file transfer follows the download link below.

Control Table - Tracking
The Control table is designed so that it holds the source load type (RDBMS, FTP, API, etc.) and the corresponding object name. Each object load records the load start/end times and the records/documents processed. The source record fetch count and the target table load count are calculated for every run. Based on the status of the load (S - success or F - failure), automated notifications can be triggered to the technical team.

Control Table Attributes:
UID – Primary key
SOURCE_TYPE – Type of source: RDBMS, API, Social Media, FTP, etc.
TABLE_NAME – Table name or object name
START_DATE – Load start time
ENDDATE – Load end time
SRC_REC_COUNT – Source record count
RGT_REC_COUNT – Target record count
STATUS – 'S' (success) or 'F' (failed) based on the source/target load

Partitioned Load
For every load, the data gets partitioned automatically based on the transaction timestamp in the storage layer (S3).

Configuration
Sources: FTP/sFTP file extracts
Targets: AWS storage (S3)
Snaps used: File Reader, File Writer, Mapper, Router, JSON Formatter, Redshift Insert, Redshift Select, Redshift - Multi Execute, S3 File Writer

Downloads
IM_FTP_to_S3_load.slp (15.3 KB)
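A minimal sketch of the SFTP-to-S3 hop, assuming paramiko on the file-server side and boto3 on the S3 side; host, credentials, directory, and bucket names are placeholders.

```python
# Stream files from an SFTP directory straight into S3 without landing them locally.
import boto3
import paramiko

transport = paramiko.Transport(("sftp.example.com", 22))   # placeholder host
transport.connect(username="etl", password="secret")       # placeholder credentials
sftp = paramiko.SFTPClient.from_transport(transport)

s3 = boto3.client("s3")
for name in sftp.listdir("/outbound"):
    with sftp.open(f"/outbound/{name}", "rb") as remote_file:
        # remote_file is file-like, so boto3 can stream it directly
        s3.upload_fileobj(remote_file, "my-data-lake", f"ftp-extracts/{name}")

sftp.close()
transport.close()
```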
Move data from AWS S3 to Snowflake Data Warehouse

Created by @pkoppishetty

The pipeline pattern moves data from AWS S3 to Snowflake Data Warehouse. The pattern contains a Directory Browser (to retrieve the file list from S3) and Snowflake Execute and Snowflake Bulk Load Snaps (to load the corresponding files into Snowflake). A rough connector-level equivalent of the bulk load step is sketched after the download link below.

Configuration

File Reader Preview

Snowflake Bulk Load Output

Table Data from Snowflake Select Snap

Sources: AWS S3
Targets: Snowflake
Snaps used: Snowflake Execute, Directory Browser, File Reader, Snowflake Bulk Load, Snowflake Select

Downloads
Integration pattern for moving data from AWS S3 to Snowflake datawarehouse.slp (14.2 KB)
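For reference, the bulk load step corresponds roughly to a COPY INTO statement issued through the snowflake-connector-python library; the account, warehouse, table, S3 path, credentials, and file format below are placeholders rather than values from the pattern.

```python
# Rough equivalent of the Snowflake Bulk Load step expressed as a COPY INTO statement.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345", user="loader", password="secret",     # placeholder account
    warehouse="LOAD_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()
cur.execute("""
    COPY INTO ANALYTICS.PUBLIC.ORDERS
    FROM 's3://my-data-lake/orders/'
    CREDENTIALS = (AWS_KEY_ID = 'AKIA...' AWS_SECRET_KEY = '...')
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
print(cur.fetchall())   # COPY INTO returns per-file load status rows
conn.close()
```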
Redshift bulk load

Hi team,

Looking for a suggestion/enhancement to achieve the scenario below. When I read a CSV file, it might contain both \r\n (Windows) and \n (Unix) line endings. How can I fix this? Today, when I do the bulk load using the Redshift Bulk Load snap, the data gets loaded with the \r\n intact. How can I escape these characters? When I look at the properties of the snap, I can see the settings below, but they are not working.
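One workaround, sketched outside the snap, is to normalize the line endings before the bulk load runs; the Redshift Bulk Load snap settings themselves may still need adjusting. The file names below are placeholders.

```python
# Pre-processing workaround: strip carriage returns so only \n line endings reach Redshift.
import csv

with open("input_windows.csv", "r", newline="") as src, \
     open("input_clean.csv", "w", newline="") as dst:
    reader = csv.reader(src)                       # tolerates mixed \r\n and \n rows
    writer = csv.writer(dst, lineterminator="\n")  # emit Unix line endings only
    for row in reader:
        # also drop stray carriage returns embedded inside individual field values
        writer.writerow([field.replace("\r", "") for field in row])
```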
Ingesting multiple AWS S3 files into a database

I have an Amazon S3 bucket containing multiple files that I'd like to extract and read into a database. The files are all .GZ (gzip) files. The file names will change each day but will all have the same format/contents once unzipped.

I was thinking it would be like:

S3 browser → mapping → S3 reader → JSON Parser → database target

But this fails validation at the JSON Parser step, with:

Failure: Cannot parse JSON data, Reason: Unable to create json parser for the given input stream, Illegal character ((CTRL-CHAR, code 31)): only regular white space (\r, \n, \t) is allowed between tokens

After the S3 Reader step, I can preview the data and see the list of files that would be imported, but not the contents of the files themselves. Any suggestions for the right way to read in the contents of several files in an S3 bucket at once?

Thank you!
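A side note that may help with diagnosis: CTRL-CHAR code 31 is 0x1f, the first byte of the gzip magic number, which suggests the JSON Parser is receiving compressed bytes rather than text. Outside SnapLogic, a decompress-first approach looks like the sketch below, using boto3 and Python's gzip module; the bucket and prefix are placeholders.

```python
# List the gzipped objects, decompress each one, then parse the JSON.
import gzip
import json

import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="daily-exports/")  # paginate if >1000 keys

for item in resp.get("Contents", []):
    obj = s3.get_object(Bucket="my-bucket", Key=item["Key"])
    text = gzip.decompress(obj["Body"].read()).decode("utf-8")
    records = json.loads(text)           # or one json.loads per line for JSON Lines files
    print(item["Key"], len(records))     # a real pipeline would insert these rows into the database
```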
Ingest data from Salesforce application into AWS Cloud Storage (S3)

Contributed by @SriramGopal from Agilisium Consulting

The pipeline is designed to fetch records from a CRM application (Salesforce in this case) via API integration and load them to cloud storage (Amazon S3 in this case) with partitioning logic. This use case is applicable to Cloud Data Lake initiatives. The pipeline also includes date-based data partitioning at the storage layer and a data validation trail between source and target. A minimal sketch of the Salesforce fetch follows the download link below.

Control Table - Tracking
The Control table is designed so that it holds the source load type (RDBMS, FTP, API, etc.) and the corresponding object name. Each object load records the load start/end times and the records/documents processed. The source record fetch count and the target table load count are calculated for every run. Based on the status of the load (S - success or F - failure), automated notifications can be triggered to the technical team.

Control Table Attributes:
UID – Primary key
SOURCE_TYPE – Type of source: RDBMS, API, Social Media, FTP, etc.
TABLE_NAME – Table name or object name
START_DATE – Load start time
ENDDATE – Load end time
SRC_REC_COUNT – Source record count
RGT_REC_COUNT – Target record count
STATUS – 'S' (success) or 'F' (failed) based on the source/target load

Partitioned Load
For every load, the data gets partitioned automatically based on the transaction timestamp in the storage layer (S3).

Configuration
Sources: Salesforce Account
Targets: AWS storage (S3)
Snaps used: Salesforce Read, File Reader, File Writer, Mapper, Router, Copy, JSON Formatter, Redshift Insert, Redshift Select, Redshift - Multi Execute, S3 File Writer, S3 File Reader, Aggregate, JSON Parser

Downloads
IM_API_Salesforce_to_S3_load.slp (31.6 KB)
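A minimal sketch of the incremental Salesforce pull, assuming the simple_salesforce library; the object, SOQL filter, credentials, and bucket are illustrative and not part of the posted pipeline.

```python
# Incremental Salesforce query written to a date-partitioned S3 key; names are illustrative.
import datetime
import json

import boto3
from simple_salesforce import Salesforce

sf = Salesforce(username="etl@example.com", password="secret",
                security_token="token")                     # placeholder credentials

last_run = "2024-01-01T00:00:00Z"   # would come from the control table
result = sf.query_all(
    f"SELECT Id, Name, LastModifiedDate FROM Account WHERE LastModifiedDate > {last_run}"
)
records = result["records"]

now = datetime.datetime.utcnow()
boto3.client("s3").put_object(
    Bucket="my-data-lake",
    Key=f"salesforce/account/{now:%Y/%m/%d}/account_{now:%H%M%S}.json",
    Body=json.dumps(records, default=str).encode("utf-8"),
)
```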