How to download an Excel file from a webpage?

Question

I would like to create a pipeline that downloads an Excel file from this webpage, https://www.eia.gov/electricity/data/eia861m/#salesrevenue, every month. I am currently using web scraping with Python to download the file. Can this web scraping be done in a pipeline, or is there a special Snap for it?

koryknick · Answer

dawna14 - are you looking for only the current Excel file from the page, or all of them?

koryknick · Answer

dawna14 - I went for the more generic approach. Please download the attached zip file, decompress it, and import the pipeline from the SLP file.

The HTTP Client snap simply gets the webpage contents.
In the "Scrape html for excel file links" snap, the match() method uses a regular expression to find the link anchors, and the resulting array of strings is then filtered for those that contain ".xls".
The "Split anchor references" snap is, I believe, self-explanatory.
The "Map filePath" snap again uses the match() method, this time to extract the file reference. The match returns an array of two strings, and we want the last one, hence the pop() method.
The "Get file" snap is another HTTP Client that gets the file contents from the relative path. Note that this time we use a Binary output view on the snap to return the data as a binary stream.
Finally, "Write output file" writes the file out to the SLDB.
Hope this helps!
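
For comparison with the Python approach mentioned in the question, the same steps can be sketched outside SnapLogic. This is only a rough illustration, not the attached pipeline: the function names (find_excel_links, download, download_all) and the two regular expressions are assumptions made for this example, and the expressions inside the SLP file may differ. It uses only the Python standard library.

```python
import os
import re
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Page from the question, without the #salesrevenue fragment.
PAGE_URL = "https://www.eia.gov/electricity/data/eia861m/"

# Assumed patterns for this sketch; the pipeline's own regex is not shown in the thread.
ANCHOR_RE = re.compile(r"<a\b[^>]*>", re.IGNORECASE)
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def find_excel_links(html, base_url):
    """Return absolute URLs for every anchor whose href contains '.xls'."""
    links = []
    for anchor in ANCHOR_RE.findall(html):       # "Scrape html for excel file links"
        match = HREF_RE.search(anchor)           # "Map filePath": pull out the href value
        if match and ".xls" in match.group(1).lower():
            # Resolve a relative path against the page URL.
            links.append(urljoin(base_url, match.group(1)))
    return links


def download(url, dest_path):
    """Fetch the file as raw bytes and write it to disk ("Get file" + "Write output file")."""
    with urlopen(url) as response, open(dest_path, "wb") as out:
        out.write(response.read())


def download_all(page_url=PAGE_URL, dest_dir="."):
    """Fetch the page, find the Excel links, and save each file into dest_dir."""
    with urlopen(page_url) as response:          # "HTTP Client": get the page contents
        html = response.read().decode("utf-8", errors="replace")
    for url in find_excel_links(html, page_url):
        filename = os.path.basename(urlparse(url).path)
        download(url, os.path.join(dest_dir, filename))
```

Calling download_all() would fetch every matching file. To keep only one specific workbook, the list returned by find_excel_links can be filtered on part of the file name before downloading.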

dawna14 · Answer

koryknick I am looking to download only one specific Excel link. Please see the attachment.
