Ingest data from SQL Server (RDBMS) to AWS Cloud Storage (S3)
Contributed by @SriramGopal from Agilisium Consulting

The pipeline is designed to fetch records on an incremental basis from any RDBMS system and load them into cloud storage (Amazon S3 in this case) with partitioning logic. This use case is applicable to Cloud Data Lake initiatives. The pipeline also includes date-based data partitioning at the storage layer and a data validation trail between source and target.

Parent Pipeline
- Control Table check: gets the last run details from the Control table.
- ETL Process: fetches the incremental source data based on the Control table and loads it to S3.
- Control Table write: updates the latest run data to the Control table for tracking.

S3 Writer Child Pipeline
Audit Update Child Pipeline

Control Table - Tracking
The Control table is designed so that it holds the source load type (RDBMS, FTP, API, etc.) and the corresponding object name. Each object load has its load start/end times and the records/documents processed for every load. The source record fetch count and target table load count are calculated for every run. Based on the status of the load (S - success or F - failure), automated notifications can be triggered to the technical team.

Control Table Attributes:
- UID – Primary key
- SOURCE_TYPE – Type of source: RDBMS, API, Social Media, FTP, etc.
- TABLE_NAME – Table name or object name
- START_DATE – Load start time
- END_DATE – Load end time
- SRC_REC_COUNT – Source record count
- RGT_REC_COUNT – Target record count
- STATUS – 'S' Success or 'F' Failed, based on the source/target load

Partitioned Load
For every load, the data gets partitioned automatically based on the transaction timestamp in the storage layer (S3). A minimal code sketch of this control-table-driven pattern follows the download list below.

Configuration
Sources: RDBMS Database, SQL Server Table
Targets: AWS Storage
Snaps used:
- Parent Pipeline: Sort, File Writer, Mapper, Router, Copy, JSON Formatter, Redshift Insert, Redshift Select, Redshift - Multi Execute, S3 File Writer, S3 File Reader, Aggregate, Pipeline Execute
- S3 Writer Child Pipeline: Mapper, JSON Formatter, S3 File Writer
- Audit Update Child Pipeline: File Reader, JSON Parser, Mapper, Router, Aggregate, Redshift - Multi Execute

Downloads
- IM_RDBMS_S3_Inc_load.slp (43.6 KB)
- IM_RDBMS_S3_Inc_load_S3writer.slp (12.2 KB)
- IM_RDBMS_S3_Inc_load_Audit_update.slp (18.3 KB)
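Outside SnapLogic, the same control-table-driven incremental pattern can be sketched in Python. This is a minimal illustration only, assuming a hypothetical CONTROL_TABLE layout, source table, and bucket; the real implementation is the SnapLogic pipeline above, not this code.

```python
# Minimal sketch of a control-table-driven incremental load (hypothetical
# names throughout; the real pipeline uses SnapLogic Snaps, not this code).
import datetime
import gzip
import json

import boto3     # AWS SDK for Python
import pyodbc    # SQL Server driver

SRC = pyodbc.connect("DSN=sqlserver_src")          # assumed ODBC DSN
s3 = boto3.client("s3")
BUCKET = "my-data-lake"                            # hypothetical bucket

def last_watermark(cur, table):
    # The control table tracks the last successful load window per object.
    cur.execute("SELECT MAX(END_DATE) FROM CONTROL_TABLE "
                "WHERE TABLE_NAME = ? AND STATUS = 'S'", table)
    row = cur.fetchone()
    return row[0] or datetime.datetime(1970, 1, 1)

def incremental_load(table):
    cur = SRC.cursor()
    start = datetime.datetime.utcnow()
    since = last_watermark(cur, table)

    # Fetch only records newer than the last successful run.
    cur.execute(f"SELECT * FROM {table} WHERE LAST_MODIFIED > ?", since)
    cols = [c[0] for c in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]

    # Partition the object key by transaction date, mirroring the
    # date-based partitioning described above.
    key = (f"{table}/year={start:%Y}/month={start:%m}/day={start:%d}/"
           f"{table}_{start:%H%M%S}.json.gz")
    body = gzip.compress("\n".join(json.dumps(r, default=str) for r in rows).encode())
    s3.put_object(Bucket=BUCKET, Key=key, Body=body)

    # Record the run so the next execution picks up where this one ended.
    cur.execute("INSERT INTO CONTROL_TABLE (SOURCE_TYPE, TABLE_NAME, START_DATE, "
                "END_DATE, SRC_REC_COUNT, RGT_REC_COUNT, STATUS) "
                "VALUES ('RDBMS', ?, ?, ?, ?, ?, 'S')",
                table, start, datetime.datetime.utcnow(), len(rows), len(rows))
    SRC.commit()

incremental_load("ORDERS")   # hypothetical source table
```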

Ingesting multiple AWS S3 files into a database

I have an Amazon S3 bucket containing multiple files that I'd like to extract and read into a database. The files are all .GZ (gzip) files. The file names change each day, but they all have the same format/contents once unzipped. I was thinking it would be like: S3 Browser → Mapper → S3 File Reader → JSON Parser → database target. But this fails validation at the JSON Parser step with:

    Failure: Cannot parse JSON data, Reason: Unable to create json parser for the given input stream, Illegal character ((CTRL-CHAR, code 31)): only regular white space (\r, \n, \t) is allowed between tokens

After the S3 Reader step, I can preview the data and see the list of files that would be imported, but not the contents of the files themselves. Any suggestions for the right way to read in the contents of several files in an S3 bucket at once? Thank you!
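CTRL-CHAR code 31 is the first byte of the gzip magic number (0x1F), which suggests the compressed bytes are reaching the JSON Parser without being decompressed first; in SnapLogic, placing a Decompress Snap between the reader and the parser typically addresses this. For comparison, a hedged Python sketch of the same list-decompress-parse flow, with a hypothetical bucket and prefix:

```python
# Sketch: list gzip files under a prefix, decompress, and parse each as JSON.
# Bucket name and prefix are hypothetical.
import gzip
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"        # hypothetical
PREFIX = "daily-drop/"      # hypothetical

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if not obj["Key"].endswith(".gz"):
            continue
        raw = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        text = gzip.decompress(raw).decode("utf-8")      # undo the .gz layer first
        for line in text.splitlines():                   # assumes one JSON document per line
            record = json.loads(line)
            # insert `record` into the target database here
            print(obj["Key"], record)
```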

Connecting to DocumentDB in AWS with shared .pem

Hello, I am trying to create a MongoDB account to connect to DocumentDB in AWS. AWS supplies a shared .pem for RDS instances (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html), but I am not sure how to use that with AWS Secrets to create the account. There are fields for KeyStore and TrustStore, so I presume I need to convert the .pem to a different format. In past experience those stores are installed on the node(s) doing the connecting; how would I manage that in a CloudPlex? Is there any guidance on how to proceed? TIA -Liam
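As a side note, TLS connectivity from a node can be sanity-checked with the shared .pem as-is before worrying about store formats. A minimal, hedged sketch using pymongo; the cluster endpoint, credentials, and file path are placeholders:

```python
# Sketch: verify the DocumentDB TLS connection using the shared RDS CA bundle.
# Endpoint, user, password, and file path are placeholders.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://USER:PASSWORD@docdb-cluster.cluster-xxxx.us-east-1.docdb.amazonaws.com:27017",
    tls=True,
    tlsCAFile="/path/to/rds-ca-bundle.pem",   # the shared .pem from AWS
    retryWrites=False,                        # DocumentDB does not support retryable writes
)
print(client.admin.command("ping"))           # {'ok': 1.0} if TLS and auth both succeed
```

For the TrustStore field itself, the usual route is to import the .pem into a JKS truststore with Java's keytool (-importcert) and make that file available to the nodes; whether that is the recommended approach for a CloudPlex is exactly the open question here.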

AWS S3 multi-part file upload?

The S3 File Writer Snap doesn't appear to support multi-part file uploads. We have large files to upload and want to improve performance. Has anyone been able to implement multi-part uploads to S3 in SnapLogic, or have any tuning tricks for improving the performance of the S3 File Writer Snap?
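For comparison, this is what multipart tuning looks like with boto3 outside SnapLogic; it is a hedged sketch (bucket, file path, and sizes are illustrative), not a statement about what the Snap does internally:

```python
# Sketch: multipart upload with boto3's managed transfer, tuned for large files.
# Bucket, file path, and sizes are illustrative only.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # switch to multipart above 64 MB
    multipart_chunksize=64 * 1024 * 1024,   # 64 MB parts
    max_concurrency=10,                     # parallel part uploads
)

s3.upload_file(
    Filename="/data/big_extract.csv.gz",    # placeholder path
    Bucket="my-bucket",                     # placeholder bucket
    Key="landing/big_extract.csv.gz",
    Config=config,
)
```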

Snaplogic Redshift Copy Command

Hi Team, When I read data from a file/CSV and do a bulk load into Redshift, I know that at the backend it copies the data and performs the bulk load. I need to know the full set of options the SnapLogic code uses for the Redshift COPY command at its backend. Where can I get those details? Thanks in advance. Makesh
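One way to see the exact COPY statement (and its options) that any client issued is to read it back from Redshift's own system tables after a run. A hedged sketch; the connection details are placeholders, and the query only shows what Redshift logged, not SnapLogic internals:

```python
# Sketch: pull the text of recent COPY statements from Redshift system tables.
# Host, database, and credentials are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.xxxx.us-east-1.redshift.amazonaws.com",
    dbname="dev", user="awsuser", password="...", port=5439,
)
cur = conn.cursor()

# stl_querytext stores query text in 200-character chunks keyed by query id,
# so reassemble the pieces in sequence order.
cur.execute("""
    SELECT q.query,
           LISTAGG(t.text) WITHIN GROUP (ORDER BY t.sequence) AS sql_text
    FROM stl_query q
    JOIN stl_querytext t ON t.query = q.query
    WHERE q.querytxt ILIKE 'copy%'
    GROUP BY q.query
    ORDER BY q.query DESC
    LIMIT 5
""")
for query_id, sql_text in cur.fetchall():
    print(query_id, sql_text)
```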

AWS Gov Cloud and the S3 File Reader

My company uses sub-accounts under a master account to control access and track resources in AWS GovCloud. I am trying to get an account configured for the S3 File Reader that works in this configuration. Seems simple enough: I specify my access key and secret key and set the Cross Account IAM Role's Role ARN field, but the validation fails with:

    Failed to validate account: The security token included in the request is invalid. (Service: AWSSecurityTokenService; Status Code: 403; Error Code: InvalidClientTokenId; Request ID: 4961a29f-a875-4c69-9665-e640610bc840) (Reason: No reason provided by the snap; Resolution: No resolution provided by the snap)

On my local machine, my ~/.aws/config file looks something like this:

    [default]
    region = us-gov-west-1

    [profile engineer]
    role_arn = arn:aws-us-gov:iam::############:role/Sandbox_Administrator
    source_profile = default
    region = us-gov-west-1

and my ~/.aws/credentials file looks something like this:

    [default]
    aws_access_key_id = <redacted>
    aws_secret_access_key = <redacted>

When I run AWS CLI commands, I have to add "--profile engineer" to the command line, but everything works properly. Any clues as to what I need to do to make this work with the S3 File Reader?
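A quick way to narrow this down is to reproduce the two-step credential flow (static keys, then STS AssumeRole into the sub-account role) from the node itself. A hedged sketch; the role ARN and region mirror the post, the account number and bucket are illustrative:

```python
# Sketch: assume the GovCloud role with STS, then list the bucket with the
# temporary credentials. Account number and bucket are illustrative.
import boto3

# GovCloud has its own STS endpoints, so pin the region explicitly.
sts = boto3.client("sts", region_name="us-gov-west-1")

resp = sts.assume_role(
    RoleArn="arn:aws-us-gov:iam::123456789012:role/Sandbox_Administrator",
    RoleSessionName="s3-reader-test",
)
creds = resp["Credentials"]

s3 = boto3.client(
    "s3",
    region_name="us-gov-west-1",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(s3.list_objects_v2(Bucket="my-govcloud-bucket", MaxKeys=5))  # placeholder bucket
```

If this succeeds with the same keys the Snap account uses, the keys and role trust are fine and the failure may lie in how the Snap account resolves the STS endpoint or partition.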

How to enable Server-side Encryption with IAM Role support for Amazon S3

To enable Server-side Encryption support in UAT, the following steps must be followed:

1. Include the following directive in global.properties (the JCC must be restarted whenever the properties file is updated):

    jcc.jvm_options = -DIAM_CREDENTIAL_FOR_S3=TRUE

2. In Designer or Manager, create a new S3 Account as follows:
   a) Leave the Access-key ID and Secret key properties blank
   b) Enable the Server-side encryption checkbox
   c) Enable the IAM role checkbox

3. Bind the account created in Step 2 to all applicable Snaps writing to S3.

NOTE: This will only work if the JCC was assigned the correct IAM role at the time it was provisioned. We do not support referencing IAM roles otherwise. If server-side encryption is not required, an account is not necessary.
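For reference, the equivalent behavior outside SnapLogic looks like the following boto3 sketch: no access keys are supplied (the node's IAM role provides credentials) and server-side encryption is requested on the write. Bucket, key, and the choice of AES256 are illustrative, not part of the configuration above:

```python
# Sketch: write to S3 using only the instance's IAM role for credentials and
# requesting server-side encryption on the object. Names are illustrative.
import boto3

# No access key / secret key passed: boto3 falls back to the EC2 instance
# profile (IAM role) attached to the node.
s3 = boto3.client("s3")

s3.put_object(
    Bucket="my-encrypted-bucket",
    Key="uat/sample.json",
    Body=b'{"hello": "world"}',
    ServerSideEncryption="AES256",   # or "aws:kms" with SSEKMSKeyId for a CMK
)
```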

Updated S3 Snap Pack and Selecting Accounts

I'm playing around with some of the new S3 Snaps that were recently released. They are not allowing me to use existing S3 accounts. Do we have to create new S3 accounts for these Snaps every time? For instance, when I open the S3 Upload Snap, click on the Accounts tab, and then click to add an existing account, nothing shows up, but I know I have S3 accounts that should be available in my project space and project. When I click the Add Account button and click Continue, the dialog box opens to add a new account. I'm on the latest Snap Pack version.

How to read and delete specific AWS SQS message in snaplogic

Hi, We have a requirement to read a specific SQS message (by passing the SQS message ID) and delete it after the entire pipeline process completes. When I use the SQS Consumer Snap and pass an SQS message ID, it reads all the messages from the queue instead of only that particular message, and the SQS Acknowledge Snap deletes all the messages. Can you please share an example pipeline that reads only a specific message from an AWS SQS queue and deletes it after the pipeline completes its process? Thank you!
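Worth noting: the SQS API itself has no call to fetch a message by its ID; consumers receive batches and then delete individual messages by receipt handle. A hedged Python sketch of that receive-filter-delete pattern (queue URL and target message ID are placeholders), which is roughly what any pipeline would have to mirror:

```python
# Sketch: receive messages, keep only the one with a given MessageId, and
# delete it by its ReceiptHandle once processing is done. Values are placeholders.
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"
TARGET_ID = "00000000-0000-0000-0000-000000000000"   # the message ID to look for

def process(body):
    print("processed:", body)    # placeholder for the pipeline's work

resp = sqs.receive_message(
    QueueUrl=QUEUE_URL,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=10,        # long polling
    VisibilityTimeout=300,     # keep messages hidden while the "pipeline" runs
)

for msg in resp.get("Messages", []):
    if msg["MessageId"] != TARGET_ID:
        continue               # other messages simply become visible again later
    process(msg["Body"])
    # Delete only after processing succeeds, via the receipt handle.
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```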

Generate S3 signature in snaplogic

I am trying to generate an S3 signature for a pre-signed URL, expression below:

    Digest.sha256(Digest.sha256(Digest.sha256(Digest.sha256(Digest.sha256("AWS4" + ">SecretKey<","20180226"),"us-east-1"),"s3"),"aws4_request"),encodeURIComponent("AWS4-HMAC-SHA256\n20180226T000000Z\n20180226/us-east-1/s3/aws4_request\n"+Digest.sha256("GET\n/gis-integration-snaplogic-expressions/MaterialMasterToETQ.expr\nX-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=">AccessKey<"%2F20180226%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20180226T000000Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host\nhost:s3.amazonaws.com\n\nhost\nUNSIGNED-PAYLOAD")))

But I receive an error from S3 saying the signature is invalid; any help on this is greatly appreciated.
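One likely issue: Signature Version 4 derives the signing key and the final signature with keyed HMAC-SHA256, not plain SHA-256 hashes, and the string-to-sign embeds the hex SHA-256 of the canonical request rather than URL-encoding it. The reference derivation in Python (keys, bucket, object, and the abbreviated canonical request are illustrative):

```python
# Sketch: AWS Signature Version 4 signing-key derivation and signature for a
# pre-signed GET. Keys, bucket, object, and dates are illustrative.
import hashlib
import hmac

def hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

secret_key = "SECRET_KEY_HERE"          # placeholder
date_stamp = "20180226"
region, service = "us-east-1", "s3"

# Signing key: a chain of HMACs, not plain hashes.
k_date = hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
k_region = hmac_sha256(k_date, region)
k_service = hmac_sha256(k_region, service)
k_signing = hmac_sha256(k_service, "aws4_request")

# String to sign embeds the SHA-256 hex digest of the canonical request.
canonical_request = "GET\n/my-bucket/my-object\n...\nhost\nUNSIGNED-PAYLOAD"  # abbreviated
string_to_sign = "\n".join([
    "AWS4-HMAC-SHA256",
    "20180226T000000Z",
    f"{date_stamp}/{region}/{service}/aws4_request",
    hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
])

signature = hmac.new(k_signing, string_to_sign.encode("utf-8"),
                     hashlib.sha256).hexdigest()
print(signature)
```

In practice, letting an SDK do this (for example, boto3's generate_presigned_url) avoids hand-assembling the canonical request, which is where most invalid-signature errors come from.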