Ingesting multiple AWS S3 files into a database
I have an Amazon S3 bucket containing multiple files that I’d like to extract and read into a database. The files are all .GZ (gzip) files. The file names will change each day, but the files will all have the same format/contents once unzipped.

I was thinking the pipeline would look like this:

S3 browser → mapping → S3 reader → JSON Parser → database target

But this fails validation at the JSON Parser step with:

Failure: Cannot parse JSON data, Reason: Unable to create json parser for the given input stream, Illegal character ((CTRL-CHAR, code 31)): only regular white space (\r, \n, \t) is allowed between tokens

After the S3 Reader step, I can preview the data and see the list of files that would be imported, but not the contents of the files themselves. Any suggestions for the right way to read in the contents of several files in an S3 bucket at once? Thank you!

Solved, 4.2K views, 0 likes, 4 comments

How to move files on HDFS?
My scenario is that I need to create some subfolders on HDFS, zip all of them, and then remove the subfolders to free up space, so that the next run starts from a “clean” environment before I create any files or folders. However, the [File Delete] and [File Operation] Snaps do not seem to support HDFS locations. Is there any way to move, rename, or delete files and folders on HDFS?

Thanks, Elaine

1.7K views, 0 likes, 0 comments

File Operation - Move gives error after file is moved
Hi, I am trying to use the File Operation Snap with the Move operation to move a file on a Groundplex. Functionally it works and the file is moved, but the Snap also fails with an error saying that the source file could not be found. Has anyone seen similar behavior? Do I need any additional configuration to resolve this error?

Complete error stack:

Failed to move from file:///D:/data/a.xlsx to file:///D:/data/done/b2019-03-13T183215.209Z_.xlsx
Resolution: Please address the reported issue.
Reason: File not found: file:///D:/data/a.xlsx

Move file to done folder[a31dcafa-b962-4268-b075-14386125e559 – 36115ba9-423b-4ddd-aa9f-302ee867cc2d]
com.snaplogic.snap.api.SnapDataException: Failed to move from file:///D:/data/a.xlsx to file:///D:/data/done/b2019-03-13T183215.209Z_.xlsx
    at com.snaplogic.snaps.binary.FileOperation.writeError(FileOperation.java:270)
    at com.snaplogic.snaps.binary.FileOperation.process(FileOperation.java:241)
    at com.snaplogic.snap.api.ExecutionUtil.process(ExecutionUtil.java:95)
    at com.snaplogic.snap.api.ExecutionUtil.execute(ExecutionUtil.java:107)
    at com.snaplogic.snap.api.ExecutionUtil.execute(ExecutionUtil.java:75)
    at com.snaplogic.snap.api.SimpleSnap.execute(SimpleSnap.java:67)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl.executeSnap(SnapRunnableImpl.java:773)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl.execute(SnapRunnableImpl.java:519)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl.doRun(SnapRunnableImpl.java:839)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl.access$000(SnapRunnableImpl.java:115)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl$1.run(SnapRunnableImpl.java:362)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl$1.run(SnapRunnableImpl.java:358)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl.call(SnapRunnableImpl.java:357)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl.call(SnapRunnableImpl.java:115)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.nio.file.NoSuchFileException: D:\data\a.xlsx
    at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
    at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
    at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
    at sun.nio.fs.WindowsFileCopy.move(Unknown Source)
    at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source)
    at java.nio.file.Files.move(Unknown Source)
    at com.snaplogic.snaps.binary.FileOperation.moveOrCopy(FileOperation.java:289)
    at com.snaplogic.snaps.binary.FileOperation.process(FileOperation.java:221)
    … 21 more
Error Fingerprint[0] = efp:com.snaplogic.snaps.binary.Abevv5GX
Error Fingerprint[1] = efp:sun.nio.fs.MtocHpFA

Move file to done folder[a31dcafa-b962-4268-b075-14386125e559 – 36115ba9-423b-4ddd-aa9f-302ee867cc2d]
java.lang.Throwable
    at java.lang.Throwable.fillInStackTrace(Native Method)
    at java.lang.Throwable.fillInStackTrace(Unknown Source)
    at java.lang.Throwable.(Unknown Source)
    at org.apache.logging.log4j.core.impl.Log4jLogEvent.calcLocation(Log4jLogEvent.java:546)
    at org.apache.logging.log4j.core.impl.Log4jLogEvent.getSource(Log4jLogEvent.java:537)
    at com.snaplogic.cc.log.JsonLogLayout.writeFileInfo(JsonLogLayout.java:136)
    at com.snaplogic.cc.log.JsonLogLayout.writeAllFields(JsonLogLayout.java:109)
    at com.snaplogic.cc.log.JsonLogLayout.toSerializable(JsonLogLayout.java:91)
    at com.snaplogic.cc.log.JsonLogLayout.toSerializable(JsonLogLayout.java:43)
    at org.apache.logging.log4j.core.layout.AbstractStringLayout.toByteArray(AbstractStringLayout.java:148)
    at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:112)
    at org.apache.logging.log4j.core.appender.RollingRandomAccessFileAppender.append(RollingRandomAccessFileAppender.java:98)
    at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:152)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:125)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:116)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
    at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:390)
    at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:378)
    at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:362)
    at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:352)
    at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63)
    at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:147)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:1016)
    at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:964)
    at org.apache.logging.slf4j.Log4jLogger.info(Log4jLogger.java:178)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl.abortViews(SnapRunnableImpl.java:1013)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl.failSnap(SnapRunnableImpl.java:992)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl.handleException(SnapRunnableImpl.java:975)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl.doRun(SnapRunnableImpl.java:860)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl.access$000(SnapRunnableImpl.java:115)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl$1.run(SnapRunnableImpl.java:362)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl$1.run(SnapRunnableImpl.java:358)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl.call(SnapRunnableImpl.java:357)
    at com.snaplogic.cc.snap.common.SnapRunnableImpl.call(SnapRunnableImpl.java:115)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Error Fingerprint[0] = efp:java.lang.f8-Tp7g_

Thanks, Hitesh

5.5K views, 0 likes, 5 comments

File Transfer from SFTP to SMB server or any other two file sharing systems
Submitted by @stodoroska from Interworks

File Transfer from SFTP to SMB server or any other two file sharing systems, with additional validations for file browsing, such as checking that only valid file names with valid dates are consumed by the pipeline.

Parent Pipeline

Child Pipeline

Configuration

In the parameters, you need to configure the file names you want to filter out from the directory, the pattern, the source and target systems, and the accounts. The validation rules, such as filtering only specific dates, are specific to this pipeline and can be changed at any time.

Sources: File
Targets: SMB server or any other file sharing system
Snaps used: Directory Browser, Pipeline Execute, Group By Field, Mapper, JSON Splitter, Router, Filter, File Reader, File Writer, File Operation

Downloads

FileTransfer_SFTP2SMB.slp (18.7 KB)
CheckUniqueFileNames.slp (4.7 KB)

4.6K views, 0 likes, 0 comments
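
For readers who want to see the kind of file-name and date validation this pattern describes, here is a minimal Python sketch of the idea, outside SnapLogic. It is not the logic of the attached pipelines. The naming convention (`<prefix>_<YYYYMMDD>.<ext>`), the function names `is_valid_file` and `select_files`, and the date window are all assumptions made for illustration.

```python
import re
from datetime import date, datetime

# Hypothetical naming convention: <prefix>_<YYYYMMDD>.<ext>, e.g. "orders_20190313.csv".
NAME_PATTERN = re.compile(r"^[A-Za-z]+_(?P<stamp>\d{8})\.(?:csv|xlsx)$")


def is_valid_file(name, earliest, latest):
    """Return True if the name matches the pattern and carries a real date in range."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return False
    try:
        # strptime rejects impossible dates such as 20190230.
        stamp = datetime.strptime(match.group("stamp"), "%Y%m%d").date()
    except ValueError:
        return False
    return earliest <= stamp <= latest


def select_files(names, earliest, latest):
    """Keep valid file names, dropping duplicates and preserving the original order."""
    seen = set()
    selected = []
    for name in names:
        if name in seen or not is_valid_file(name, earliest, latest):
            continue
        seen.add(name)
        selected.append(name)
    return selected


if __name__ == "__main__":
    listing = [
        "orders_20190313.csv",
        "orders_20190313.csv",  # duplicate name
        "orders_20190230.csv",  # impossible date
        "notes.txt",            # does not match the pattern
        "orders_20180101.csv",  # outside the date window
    ]
    print(select_files(listing, date(2019, 1, 1), date(2019, 12, 31)))
```

In a pipeline, the equivalent checks would presumably sit between the directory listing and the read/write steps, so that only names passing validation are handed to the transfer.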