Best way to filter out zip file __MACOSX folder entries

hkaplan
Former Employee

I have a zip file that I need SnapLogic to read. It was created on a Mac using Apple’s compress utility.
To read it, I used a ZipFile Reader Snap, which had no problem reading the zip file.
Next I connected an XML Parser Snap to the ZipFile Reader Snap and got an error saying the XML data was not well formed:
“Failure: Failed to convert xml to json, Reason: Invalid UTF-8 start byte 0xad (at char #38, byte #-1), Resolution: Please check if the xml data is well formed”

When I do a ‘view data’ on the ZipFile Reader’s output, I see there is an extra 223 byte __MACOSX entry for every file contained in the zip file.

What is the best way to keep these extra __MACOSX entries from being passed to the XML Parser Snap?

1 REPLY 1

cstewart
Former Employee

To solve this, I created a pipeline with a ZipFile Reader followed by a Binary Router. To select only the “content” files rather than the metadata files in the archive, use this filter:
!$['content-location'].startsWith("__")
This gives you only the files you want.

The output from the ZipFile Reader is binary, and the files are streamed to the downstream Snap. If you put a Binary Router Snap next, you can route the ‘content’ files to one output; the selection criteria might use the file name, the content length, or some other field(s). For the ‘files’ you want to discard, leave the second output unterminated and they are simply dropped. If you really want to be clean, attach a File Writer to that output and write to /dev/null.
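
For anyone who wants to check the filter logic outside SnapLogic, here is a minimal Python sketch of the same idea using the standard zipfile module. It is not SnapLogic code. The helper names (is_content_entry, content_entries, build_sample_zip) and the sample file names are made up for illustration, and it assumes, as in the filter above, that every unwanted entry has a name starting with "__".

```python
import io
import zipfile


def is_content_entry(name):
    """Mirror the router filter: keep entries whose name does not start with '__'."""
    return not name.startswith("__")


def content_entries(zip_bytes):
    """Return {entry name: raw bytes} for the real files in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            # Skip directory entries and the macOS metadata entries.
            if not info.is_dir() and is_content_entry(info.filename)
        }


def build_sample_zip():
    """Build a small in-memory zip shaped like one made by the macOS compress utility."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("orders.xml", "<orders/>")
        # Hypothetical stand-in for the binary metadata entry macOS adds per file.
        archive.writestr("__MACOSX/._orders.xml", b"\x00\x05\x16\x07")
    return buffer.getvalue()


if __name__ == "__main__":
    for name in content_entries(build_sample_zip()):
        print(name)
```

Only the entries that pass the name check would go on to an XML parser; the metadata entries are binary rather than XML, which is consistent with the "Invalid UTF-8 start byte" failure in the question.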