I have an Amazon S3 bucket containing multiple files that I'd like to extract and read into a database. The files are all .GZ (gzip) files. The file names will change each day, but the files will all have the same format and contents once unzipped.
I was thinking it would be like:
S3 Browser → Mapper → S3 File Reader → JSON Parser → database target
But this fails validation at the JSON Parser step, with:
Failure: Cannot parse JSON data, Reason: Unable to create json parser for the given input stream, Illegal character ((CTRL-CHAR, code 31)): only regular white space (\r, \n, \t) is allowed between tokens
After the S3 Reader step, I can preview the data and see the list of files that would be imported, but not the contents of the files themselves.
Any suggestions for the right way to read in the contents of several files in an S3 bucket at once? Thank you!
After the S3 Reader, you could use a Decompress snap to decompress the files from GZIP.
Then you should be able to use the JSON Parser for all JSON files.
Let me know if that helps.
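For anyone wondering why the parser complains about "CTRL-CHAR, code 31": every gzip stream begins with the bytes 0x1f 0x8b, and 0x1f is decimal 31, so the JSON Parser is being handed compressed bytes rather than text. The Python sketch below only illustrates that idea outside SnapLogic; the sample payload and the helper name `parse_gzipped_json` are made up for the example and are not part of any snap.

```python
import gzip
import json


def parse_gzipped_json(raw: bytes):
    """Decompress gzip bytes first, then parse the result as JSON."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))


# Hypothetical payload standing in for one of the .GZ files in the bucket.
sample = gzip.compress(json.dumps({"id": 1, "name": "example"}).encode("utf-8"))

# First byte of any gzip stream is 0x1f, i.e. decimal 31.
print(sample[0])

# Parsing the still-compressed bytes fails, much like the JSON Parser did.
try:
    json.loads(sample.decode("latin-1"))
except json.JSONDecodeError as exc:
    print("parsing compressed bytes fails:", exc)

# Decompressing before parsing succeeds.
print(parse_gzipped_json(sample))
```

This mirrors the suggested ordering: decompress first, parse second.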
Thank you! Getting a little closer. So my flow now looks like:
S3 Browser → Mapper → S3 File Reader → JSON Parser → CSV Formatter → File Writer
It validates OK, but a zero-byte file gets written (I'm outputting just to the SnapLogic storage area).
If I just try to write out the files after decompressing, that flow also validates OK but throws Java errors at runtime:
S3 Browser → Mapper → S3 File Reader → Decompress → File Writer
Thank you for any additional tips you may have.
Or, if the zip file is composed of multiple JSON files, you can use the ZipFile Reader snap instead:
S3 Browser → ZipFile Reader (use the output $path from S3 Browser as the File value) → JSON Parser → Prep Document (Mapper) → To DB
Directory Browser → ZipFile Reader (use the output $Path from the Directory Browser as the File value) → JSON Parser → Prep Document (Mapper) → To DB
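One caveat when choosing between the two approaches: gzip (.gz) and zip are different formats. A gzip stream starts with the bytes 0x1f 0x8b and wraps a single compressed stream, while a zip archive is a container that can hold several files. The Python sketch below is one way to check which kind a file is before picking a snap; the function name `detect_compression` and the sample data are hypothetical, and it says nothing about what any particular snap will accept.

```python
import gzip
import io
import zipfile


def detect_compression(raw: bytes) -> str:
    """Return 'gzip', 'zip', or 'unknown' based on the file's contents."""
    # gzip streams always start with the two magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        return "gzip"
    # zipfile.is_zipfile accepts a file-like object as well as a path.
    if zipfile.is_zipfile(io.BytesIO(raw)):
        return "zip"
    return "unknown"


# Hypothetical sample data: one gzip stream and one zip archive.
gz_bytes = gzip.compress(b'{"id": 1}')

zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("a.json", '{"id": 1}')
    archive.writestr("b.json", '{"id": 2}')
zip_bytes = zip_buffer.getvalue()

print(detect_compression(gz_bytes))
print(detect_compression(zip_bytes))
print(detect_compression(b'{"id": 1}'))
```

If the files report as gzip, the Decompress approach above is the closer match; if they report as zip, the ZipFile Reader flow applies.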