Load a zip file from a download URL


Is it possible to extract data from the download link http://naptan.app.dft.gov.uk/DataRequest/Naptan.ashx?format=csv?

A File Reader Snap or a zip file reader only recognises this as an ASP.NET file (Naptan.ashx). However, clicking the link downloads a zip file containing 16 CSV files.

Thank you for your help.

You should be able to plug the URL directly into a zip file reader. You may be hitting a preview limitation, because some of the files are large once you “unzip” them. Do you have control over the endpoint so that it produces “smaller” files after the “unzip”? If so, you can make a smaller sample file to build your pipeline against.

You can also make your own smaller zip file and upload it to SnapLogic for development, then build your pipeline against it. When you are ready, swap in the URL for your NaPTAN endpoint and execute. Everything should work at that point.
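
If you already have the full download on your machine, one way to produce that smaller development file is to copy every entry into a new archive, truncated to its first few lines. The sketch below is a minimal Python illustration, not anything SnapLogic provides. `make_sample_zip` and `max_lines` are hypothetical names, and it assumes each entry is a newline-delimited CSV whose first line is the header (a quoted field that spans lines could be cut mid-record).

```python
import itertools
import zipfile


def make_sample_zip(src_path, dest_path, max_lines=100):
    """Hypothetical helper: copy each entry of src_path into dest_path,
    keeping only its first max_lines lines (header included)."""
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dest_path, "w", compression=zipfile.ZIP_DEFLATED) as dest:
        for info in src.infolist():
            if info.is_dir():
                continue  # directories carry no CSV data
            with src.open(info) as fh:
                # Iterating the entry yields lines as bytes, so only the
                # first max_lines lines are ever decompressed and kept.
                head = b"".join(itertools.islice(fh, max_lines))
            dest.writestr(info.filename, head)
```

For example, `make_sample_zip("Naptan.zip", "Naptan_sample.zip", max_lines=50)` would give you a small file with the same 16 entry names to upload for development.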

tlui, thank you for responding. I may have misunderstood your suggestions, but using the URL in a zip file reader results in this error:

Failure: Error decompressing zip file http://naptan.app.dft.gov.uk/DataRequest/Naptan.ashx?format=csv, Reason: Error decompressing zip file, Resolution: Please check if the file is a proper zip file

I added an error view to a ZipFile Read Snap and it reports “Max bytes written for preview”, so it sounds like the file is too big to work with (downloaded directly, the zip file is 30.4 MB; extracted, it is 197.1 MB). I’m checking with Dev to see whether we have any size limits configured.
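
You can check that compressed-versus-extracted gap before extracting anything, because a zip archive records each entry's compressed and uncompressed size in its central directory. A minimal Python sketch, assuming the archive has already been saved locally (`archive_sizes` is a hypothetical name). The compressed total excludes per-entry headers, so it will be slightly smaller than the size of the .zip file itself.

```python
import zipfile


def archive_sizes(path):
    """Hypothetical helper: return (compressed_bytes, uncompressed_bytes)
    summed over all file entries, read from the central directory only."""
    with zipfile.ZipFile(path) as zf:
        entries = [i for i in zf.infolist() if not i.is_dir()]
        compressed = sum(i.compress_size for i in entries)
        uncompressed = sum(i.file_size for i in entries)
    return compressed, uncompressed
```

Usage: `c, u = archive_sizes("Naptan.zip")` followed by `print(f"{c / 1e6:.1f} MB compressed, {u / 1e6:.1f} MB uncompressed")`.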

Take a look at this pipeline, which I created to download the file, extract the contents of the zip file, and then remove the downloaded zip file.
ZipFile_2018_08_08.slp (13.2 KB)
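
For readers who can't open the attached pipeline, the same three steps (download, extract, remove the downloaded archive) look roughly like this in plain Python. This is a sketch of the idea, not a translation of the .slp file, and `download_and_extract` is a hypothetical name. Because the bytes are saved to a temporary file and opened directly as a zip, the server-side name `Naptan.ashx` plays no part. `extractall` strips absolute paths and `..` components, but only point it at a source you trust.

```python
import os
import shutil
import tempfile
import urllib.request
import zipfile


def download_and_extract(url, out_dir):
    """Hypothetical helper: download url to a temporary file, extract it
    as a zip into out_dir, then delete the temporary file.
    Returns the names of the extracted entries."""
    fd, tmp_path = tempfile.mkstemp(suffix=".zip")
    try:
        # Step 1: stream the response body to disk.
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, tmp)
        # Step 2: open the saved bytes as a zip and extract every entry.
        with zipfile.ZipFile(tmp_path) as zf:
            zf.extractall(out_dir)
            return zf.namelist()
    finally:
        # Step 3: remove the downloaded archive, even if extraction failed.
        os.remove(tmp_path)
```

Usage: `download_and_extract("http://naptan.app.dft.gov.uk/DataRequest/Naptan.ashx?format=csv", "naptan_csv")`.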

Thank you all @tlui, @dmiller and @cstewart.

The pipeline is great. Much appreciated. I have now been able to get all my data through without issues.

Note for future readers: the Snap inspects the first few bytes of the file to determine whether it is a .zip or .7z file. It does not look at the file extension. Those byte signatures are standard values, and it is no exaggeration to say that if those bytes are not present, the file is not a .zip or .7z file. Because some applications that support additional formats quietly switch to the correct decoder, it is possible to think you have a .zip file when it is actually something else.
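
To see what that first-bytes check means in practice, here is a minimal Python sketch that classifies a file purely by its leading bytes. It is an illustration only, not the Snap's implementation (the Snap relies on a third-party library). It uses the published signatures: `PK\x03\x04` (a local file header), `PK\x05\x06` (an empty archive's end record) and `PK\x07\x08` (spanned archives) for zip, and `7z\xBC\xAF\x27\x1C` for 7z. `sniff_archive_type` is a hypothetical name.

```python
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")
SEVEN_Z_SIGNATURE = b"7z\xbc\xaf\x27\x1c"


def sniff_archive_type(path):
    """Hypothetical helper: return "zip", "7z" or None based only on the
    first bytes of the file; the file name and extension are ignored."""
    with open(path, "rb") as fh:
        head = fh.read(len(SEVEN_Z_SIGNATURE))  # 6 bytes covers all signatures
    if head.startswith(SEVEN_Z_SIGNATURE):
        return "7z"
    if any(head.startswith(sig) for sig in ZIP_SIGNATURES):
        return "zip"
    return None  # e.g. an HTML error page saved with a .zip name
```

A file saved as `Naptan.ashx` that really contains a zip returns `"zip"`, while a file named `data.zip` that is really an HTML page returns `None`.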

When in doubt, it’s best to create an empty archive, or one with just a single entry, and attach it to your question. That allows us to verify how the third-party library we use is identifying the file. There are some proprietary extensions to the zip format, e.g., to support strong encryption, and it’s possible that the library is returning an unrecognized MIME type. We won’t be able to read the encrypted entries, but we could provide a more meaningful error message in this case.
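
Before attaching a sample, you can get a rough idea of whether your entries use encryption by reading the per-entry flags that zip archives store. The sketch below uses Python's `zipfile` and is not how SnapLogic's library inspects files. It assumes the central directory itself is readable (archives that encrypt the central directory may not open at all). It checks general-purpose bit 0 (encrypted), bit 6 (PKWARE strong encryption) and compression method 99 (WinZip AES). `describe_entries` is a hypothetical name.

```python
import zipfile

ENCRYPTED_FLAG = 0x01          # general-purpose bit 0: entry is encrypted
STRONG_ENCRYPTION_FLAG = 0x40  # general-purpose bit 6: PKWARE strong encryption
AES_COMPRESS_TYPE = 99         # compression method 99: WinZip AES


def describe_entries(path):
    """Hypothetical helper: report encryption-related flags for each entry,
    using only the central directory (no entry is decompressed)."""
    report = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            report.append({
                "name": info.filename,
                "encrypted": bool(info.flag_bits & ENCRYPTED_FLAG),
                "strong_encryption": bool(info.flag_bits & STRONG_ENCRYPTION_FLAG),
                "aes": info.compress_type == AES_COMPRESS_TYPE,
            })
    return report
```

If any entry reports `True`, that is worth mentioning alongside the sample archive you attach.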