Ingesting multiple AWS S3 files into a database
I have an Amazon S3 bucket containing multiple files that I’d like to extract and read into a database. The files are all .GZ (gzip) files. The file names change each day, but the files all have the same format and contents once decompressed.
I was thinking the pipeline would look like:
S3 Browser → Mapper → S3 Reader → JSON Parser → database target
But this fails validation at the JSON Parser step with:
Failure: Cannot parse JSON data, Reason: Unable to create json parser for the given input stream, Illegal character ((CTRL-CHAR, code 31)): only regular white space (\r, \n, \t) is allowed between tokens
After the S3 Reader step, I can preview the data and see the list of files that would be imported, but not the contents of the files themselves.
Any suggestions for the right way to read in the contents of several files in an S3 bucket at once? Thank you!
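A note on the error first: CTRL-CHAR code 31 is byte 0x1F, the first byte of the gzip magic number (0x1F 0x8B), so the JSON Parser is being handed the still-compressed stream. The data has to be decompressed before anything can parse it. You can confirm this outside SnapLogic with a quick boto3 check (a minimal sketch — the bucket and key names are hypothetical placeholders):

```python
import boto3

s3 = boto3.client("s3")
# Hypothetical bucket/key — substitute one of your daily files.
head = s3.get_object(Bucket="my-bucket", Key="daily/export.gz")["Body"].read(2)
# b"\x1f\x8b" is the gzip magic number; 0x1f (decimal 31) is exactly
# the "CTRL-CHAR, code 31" the JSON Parser is complaining about.
print(head == b"\x1f\x8b")
```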
If the compressed file is instead a .zip archive composed of multiple JSON files, you can use the ZipFile Reader snap:
S3 Browser → ZipFile Reader (use the output $path from the S3 Browser and set it as the File property) → JSON Parser → Prep Document (Mapper) → DB target
or
Directory Browser → ZipFile Reader (use the output $Path from the Directory Browser and set it as the File property) → JSON Parser → Prep Document (Mapper) → DB target
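For reference, here is the same flow expressed in plain Python — a sketch only, assuming boto3, a hypothetical bucket/prefix, and a local SQLite table standing in for the real database target; it mirrors each snap in the pipelines above:

```python
import io
import json
import sqlite3
import zipfile

import boto3

BUCKET = "my-bucket"    # hypothetical — substitute your own
PREFIX = "daily-drop/"  # hypothetical

s3 = boto3.client("s3")
db = sqlite3.connect("target.db")  # SQLite stands in for the real DB target
db.execute("CREATE TABLE IF NOT EXISTS records (source TEXT, payload TEXT)")

# 1. List the objects under the prefix (the S3 Browser / Directory Browser step).
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for item in page.get("Contents", []):
        key = item["Key"]
        # 2. Read the archive (the ZipFile Reader step, fed the $path value).
        body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
        with zipfile.ZipFile(io.BytesIO(body)) as archive:
            for name in archive.namelist():
                # 3. Parse each member (the JSON Parser step) ...
                doc = json.loads(archive.read(name))
                # 4. ... then shape and load it (Prep Document/Mapper → DB target).
                db.execute("INSERT INTO records VALUES (?, ?)",
                           (f"{key}/{name}", json.dumps(doc)))
db.commit()
```

If the daily objects really are single .gz files rather than .zip archives, the zipfile block would be replaced by a one-line `gzip.decompress(body)` before parsing — the key point either way is that decompression has to happen before the JSON Parser sees the bytes.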