Ingesting multiple AWS S3 files into a database
I have an Amazon S3 bucket containing multiple files that I’d like to extract and read into a database. The files are all .GZ (gzip) files. The file names change each day, but they all have the same format/contents once unzipped. I was thinking the pipeline would look like:

S3 Browser → mapping → S3 Reader → JSON Parser → database target

But this fails validation at the JSON Parser step with:

Failure: Cannot parse JSON data, Reason: Unable to create json parser for the given input stream, Illegal character ((CTRL-CHAR, code 31)): only regular white space (\r, \n, \t) is allowed between tokens

After the S3 Reader step, I can preview the data and see the list of files that would be imported, but not the contents of the files themselves. Any suggestions on the right way to read in the contents of several files in an S3 bucket at once? Thank you!
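
Control character 31 (0x1F) is the first byte of the gzip magic number, which suggests the compressed bytes are reaching the JSON Parser without being decompressed first. Outside SnapLogic, a minimal boto3 sketch of the intended flow looks like this; the bucket name, key prefix, and the assumption that each file holds a single JSON document are placeholders for illustration:

```python
import gzip
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"          # hypothetical bucket name
PREFIX = "daily-exports/"     # hypothetical key prefix

# List the day's objects, decompress each .gz, then parse the JSON payload.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if not obj["Key"].endswith(".gz"):
            continue
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        text = gzip.decompress(body).decode("utf-8")
        record = json.loads(text)   # assumes one JSON document per file
        print(obj["Key"], record)
```

The equivalent fix inside the pipeline is to decompress the stream between the S3 File Reader and the JSON Parser (for example with a decompress-type snap, if your snap pack provides one) rather than feeding raw .gz bytes to the parser.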

Updated S3 Snap Pack and Selecting Accounts
I’m playing around with some of the new S3 snaps that were recently released, and they are not letting me use my existing S3 accounts. Do we have to create new S3 accounts for these snaps every time? For instance, when I open the S3 Upload snap, click on the Accounts tab, and then click to add an existing account, nothing shows up, even though I know I have S3 accounts that should be available in my project space and project. When I click the Add Account button and click Continue, the dialog box opens to add a new account. I’m on the latest snap pack version.

Snaplogic Redshift Copy Command
Hi Team, when I read data from a file/CSV and do a bulk load into Redshift, I know that behind the scenes the snap copies the data via the Redshift COPY command to perform the bulk load. I need to know the full set of options the SnapLogic code uses for that COPY command on its backend. Where can I get those details? Thanks in advance. Makesh
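
The exact options the snap generates aren't shown in this thread, but one way to see precisely what was issued is to query Redshift's own system tables after a load (for example SVL_STATEMENTTEXT or STL_QUERY, which record the text of executed statements). For reference, a hand-written COPY from S3 typically looks like the sketch below; the table, bucket, IAM role, and option set are illustrative assumptions, not necessarily what the Redshift bulk-load snap emits:

```python
import psycopg2

# Connection parameters are placeholders.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="loader",
    password="********",
)

copy_sql = """
    COPY public.sales_staging
    FROM 's3://my-bucket/incoming/sales_20240101.csv.gz'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV
    GZIP
    IGNOREHEADER 1
    TIMEFORMAT 'auto'
    REGION 'us-east-1';
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)   # COPY runs inside a single transaction here
conn.close()
```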

How to read and delete specific AWS SQS message in snaplogic
Hi, we have a requirement to read a specific SQS message (by passing the SQS message ID) and delete it after the entire pipeline process completes. When I use the SQS Consumer snap and pass the SQS message ID, it reads all the messages from the queue instead of only that particular message, and the SQS Acknowledge snap then deletes all of them. Can you please give us an example pipeline that reads only a specific message from an AWS SQS queue and deletes it after the pipeline completes its process? Thank you!
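
As far as I know, SQS itself has no call to fetch a message by ID; consumers receive whatever is available and then delete individual messages by receipt handle. The usual pattern is therefore to receive, filter on MessageId, and acknowledge only the matching message. A minimal boto3 sketch of that pattern (queue URL, target ID, and the process() stand-in are placeholders):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder
TARGET_ID = "00000000-0000-0000-0000-000000000000"                        # placeholder

def process(body: str) -> None:
    """Stand-in for the real pipeline work."""
    print("processing", body)

# Receive a batch and keep only the message we care about.
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=5)
for msg in resp.get("Messages", []):
    if msg["MessageId"] != TARGET_ID:
        continue  # unmatched messages reappear after the visibility timeout
    process(msg["Body"])
    # Delete only after processing succeeds, mirroring "acknowledge at the end".
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

In pipeline terms, the analogous approach is to filter the SQS Consumer output on the message ID (for example with a Filter or Router snap) and route only the matching document to the SQS Acknowledge snap, so only that message's receipt handle is deleted.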

Connecting to DocumentDB in AWS with shared .pem
Hello, I am trying to create a MongoDB account to connect to DocumentDB in AWS. AWS supplies a shared .pem for RDS instances (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html), but I am not sure how to use that with AWS secrets to create the account. There are fields for KeyStore and TrustStore, so I presume I need to convert the .pem to a different format. In past experience those stores were installed on the node(s) doing the connecting; how would I manage that in a CloudPlex? Is there any guidance on how to proceed? TIA -Liam
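
One common approach, assuming the TrustStore field expects a Java truststore (JKS) and that a JDK's keytool is available where you run this, is to convert the shared RDS CA bundle into a .jks file. Because the bundle contains several certificates, each one has to be imported under its own alias; the sketch below splits the PEM and shells out to keytool. File names, alias prefix, and the password are placeholders:

```python
import subprocess
from pathlib import Path

BUNDLE = Path("rds-combined-ca-bundle.pem")   # the shared CA bundle downloaded from AWS
TRUSTSTORE = "rds-truststore.jks"             # output truststore; name is arbitrary
STOREPASS = "changeit"                        # placeholder password

pem_text = BUNDLE.read_text()
certs = [
    "-----BEGIN CERTIFICATE-----" + chunk
    for chunk in pem_text.split("-----BEGIN CERTIFICATE-----")[1:]
]

for i, cert in enumerate(certs):
    cert_file = Path(f"rds-ca-{i}.pem")
    cert_file.write_text(cert)
    # keytool imports one certificate per call, so give each its own alias.
    subprocess.run(
        [
            "keytool", "-importcert", "-trustcacerts", "-noprompt",
            "-keystore", TRUSTSTORE, "-storepass", STOREPASS,
            "-alias", f"rds-ca-{i}", "-file", str(cert_file),
        ],
        check=True,
    )
```

The resulting .jks (plus its password) is the sort of artifact the TrustStore fields would point at; a KeyStore is generally only needed if the cluster requires client certificates.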

AWS Gov Cloud and the S3 File Reader
My company uses sub accounts under a master account to control access and track resources in AWS Gov Cloud. I am trying to get an account configured for the S3 File Reader that works in this configuration. Seems simple enough: I specify my access key and secret key and set the Cross Account IAM Role’s Role ARN field, but the validation is failing:

Failed to validate account: The security token included in the request is invalid. (Service: AWSSecurityTokenService; Status Code: 403; Error Code: InvalidClientTokenId; Request ID: 4961a29f-a875-4c69-9665-e640610bc840) (Reason: No reason provided by the snap; Resolution: No resolution provided by the snap)

On my local machine, my ~/.aws/config file looks something like this:

[default]
region = us-gov-west-1

[profile engineer]
role_arn = arn:aws-us-gov:iam::############:role/Sandbox_Administrator
source_profile = default
region = us-gov-west-1

and my ~/.aws/credentials file looks something like this:

[default]
aws_access_key_id = <redacted>
aws_secret_access_key = <redacted>

When I run AWS CLI commands, I have to add “--profile engineer” to the command line, but everything works properly. Any clues as to what I need to do to make this work with the S3 File Reader?
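
The aws-us-gov partition in the role ARN is one thing to double-check, and GovCloud credentials generally have to be validated against GovCloud STS/S3 endpoints rather than the commercial ones. As a point of comparison, this boto3 sketch reproduces what "--profile engineer" does: assume the cross-account role from the default credentials, then read from S3. Region, account ID, and bucket are placeholders:

```python
import boto3

# Assume the cross-account role the same way the CLI profile does.
sts = boto3.client("sts", region_name="us-gov-west-1")
creds = sts.assume_role(
    RoleArn="arn:aws-us-gov:iam::123456789012:role/Sandbox_Administrator",  # placeholder account id
    RoleSessionName="s3-file-reader-test",
)["Credentials"]

# Use the temporary credentials against the GovCloud region.
s3 = boto3.client(
    "s3",
    region_name="us-gov-west-1",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(s3.list_objects_v2(Bucket="my-gov-bucket", MaxKeys=5).get("Contents", []))
```

If this works from the same network as your Snaplex nodes but the snap still fails with InvalidClientTokenId, the snap is most likely calling the commercial STS endpoint; in that case check whether the S3 account settings let you point at the us-gov-west-1 region/endpoint.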

Ingest data from SQL Server (RDBMS) to AWS Cloud Storage (S3)
Contributed by @SriramGopal from Agilisium Consulting

The pipeline is designed to fetch records on an incremental basis from any RDBMS system and load them to cloud storage (Amazon S3 in this case) with partitioning logic. This use case is applicable to Cloud Data Lake initiatives. The pipeline also includes date-based data partitioning at the storage layer and a data validation trail between source and target.

Parent Pipeline
Control Table check: Gets the last run details from the control table.
ETL Process: Fetches the incremental source data based on the control table and loads the data to S3.
Control Table write: Updates the latest run data to the control table for tracking.

S3 Writer Child Pipeline

Audit Update Child Pipeline

Control Table - Tracking
The control table is designed in such a way that it holds the source load type (RDBMS, FTP, API etc.) and the corresponding object name. Each object load has the load start/end times and the records/documents processed for every load. The source record fetch count and target table load count are calculated for every run. Based on the status (S-success or F-failure) of the load, automated notifications can be triggered to the technical team.

Control Table Attributes:
UID – Primary key
SOURCE_TYPE – Type of source: RDBMS, API, Social Media, FTP etc.
TABLE_NAME – Table name or object name
START_DATE – Load start time
ENDDATE – Load end time
SRC_REC_COUNT – Source record count
RGT_REC_COUNT – Target record count
STATUS – 'S' Success and 'F' Failed based on the source/target load

Partitioned Load
For every load, the data gets partitioned automatically based on the transaction timestamp in the storage layer (S3). (A minimal sketch of this partitioning and the control-table audit record follows the Downloads list below.)

Configuration
Sources: RDBMS Database, SQL Server Table
Targets: AWS Storage

Snaps used:
Parent Pipeline: Sort, File Writer, Mapper, Router, Copy, JSON Formatter, Redshift Insert, Redshift Select, Redshift - Multi Execute, S3 File Writer, S3 File Reader, Aggregate, Pipeline Execute
S3 Writer Child Pipeline: Mapper, JSON Formatter, S3 File Writer
Audit Update Child Pipeline: File Reader, JSON Parser, Mapper, Router, Aggregate, Redshift - Multi Execute

Downloads
IM_RDBMS_S3_Inc_load.slp (43.6 KB)
IM_RDBMS_S3_Inc_load_S3writer.slp (12.2 KB)
IM_RDBMS_S3_Inc_load_Audit_update.slp (18.3 KB)
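
As an illustration of the date-based partitioning and control-table audit described above (not a reproduction of the contributed .slp pipelines), a minimal Python sketch could look like the following; the table name, bucket, key layout, and control values are assumptions:

```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "my-datalake-bucket"                            # placeholder
last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)     # would come from the control table

# Pretend these rows came from the incremental source query
# (WHERE modified_ts > :last_run in the real pipeline).
rows = [
    {"id": 1, "amount": 10.5, "modified_ts": "2024-01-02T08:15:00Z"},
    {"id": 2, "amount": 99.0, "modified_ts": "2024-01-02T09:30:00Z"},
]

run_ts = datetime.now(timezone.utc)
# Date-based partitioning: year/month/day folders derived from the run timestamp.
key = (
    f"sales/year={run_ts:%Y}/month={run_ts:%m}/day={run_ts:%d}/"
    f"sales_{run_ts:%Y%m%d%H%M%S}.json"
)
body = "\n".join(json.dumps(r) for r in rows).encode("utf-8")
s3.put_object(Bucket=BUCKET, Key=key, Body=body)

# Control-table style audit record, mirroring the attributes listed above.
audit = {
    "TABLE_NAME": "sales",
    "START_DATE": last_run.isoformat(),
    "ENDDATE": run_ts.isoformat(),
    "SRC_REC_COUNT": len(rows),
    "RGT_REC_COUNT": len(rows),
    "STATUS": "S",
}
print(key, audit)
```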

Generate S3 signature in snaplogic
I am trying to generate an S3 signature for a pre-signed URL with the expression below:

Digest.sha256(Digest.sha256(Digest.sha256(Digest.sha256(Digest.sha256("AWS4" + ">SecretKey<", "20180226"), "us-east-1"), "s3"), "aws4_request"), encodeURIComponent("AWS4-HMAC-SHA256\n20180226T000000Z\n20180226/us-east-1/s3/aws4_request\n" + Digest.sha256("GET\n/gis-integration-snaplogic-expressions/MaterialMasterToETQ.expr\nX-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=">AccessKey<"%2F20180226%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20180226T000000Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host\nhost:s3.amazonaws.com\n\nhost\nUNSIGNED-PAYLOAD")))

But I receive an error from S3 saying the signature is invalid. Any help on this is greatly appreciated.
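
One likely culprit: Signature Version 4 derives the signing key with keyed HMAC-SHA256 at every step, not plain SHA-256 digests, and the final signature is the lowercase hex HMAC of the string-to-sign under that derived key, with no URL-encoding applied to it. A Python reference for the derivation is below (keys, date, region, and the canonical request are placeholders; whether your expression library exposes an equivalent HMAC function is something to verify separately):

```python
import hashlib
import hmac

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    # Each step keys the next HMAC with the previous result.
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")

def sign(secret_key: str, date_stamp: str, region: str, string_to_sign: str) -> str:
    key = signing_key(secret_key, date_stamp, region, "s3")
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

# The string-to-sign embeds the SHA-256 hash of the canonical request.
canonical_request = "GET\n/my-object\n...\nUNSIGNED-PAYLOAD"   # truncated placeholder
string_to_sign = (
    "AWS4-HMAC-SHA256\n20180226T000000Z\n20180226/us-east-1/s3/aws4_request\n"
    + hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
)
print(sign("my-secret-key", "20180226", "us-east-1", string_to_sign))
```

If generating the URL outside the expression language is an option, boto3's generate_presigned_url("get_object", Params={"Bucket": ..., "Key": ...}, ExpiresIn=86400) handles all of this for you.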

AWS S3 multi-part file upload?
The S3 File Writer Snap doesn’t appear to support multi-part file uploads. We have large files to upload and want to improve performance. Has anyone been able to implement multi-part uploads to S3 in SnapLogic, or have any tuning tricks for improving the performance of the S3 File Writer Snap?
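
Outside the snap, the usual way to get multi-part behaviour and better throughput for large files is boto3's managed transfer, which switches to multipart uploads above a size threshold and uploads parts concurrently. A sketch, with threshold, part size, and concurrency values that are illustrative rather than recommendations:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Files above multipart_threshold are split into multipart_chunksize parts
# and uploaded by up to max_concurrency threads.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # 64 MB
    multipart_chunksize=64 * 1024 * 1024,   # 64 MB parts
    max_concurrency=8,
    use_threads=True,
)

s3.upload_file(
    Filename="/data/exports/large_file.csv.gz",   # placeholder path
    Bucket="my-bucket",                            # placeholder bucket
    Key="incoming/large_file.csv.gz",
    Config=config,
)
```

Whether the newer S3 snap pack exposes similar tuning knobs is worth checking against its documentation.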