How to download excel file from a webpage?

dawna14
New Contributor

I would like to create a pipeline that downloads an Excel file from this webpage every month: https://www.eia.gov/electricity/data/eia861m/#salesrevenue. I am currently using web scraping with Python to download the file. Can we do web scraping in a pipeline, or is there a special Snap for this?

3 REPLIES

koryknick
Employee

@dawna14 - are you looking for only the current Excel from the page, or all of them?

@koryknick I am looking to download only one specific Excel link. Please see the attachment.

koryknick
Employee

@dawna14 - I went for the more generic approach.  Please download the attached zip file, decompress it, and import the pipeline from the SLP file. 

koryknick_0-1701524272646.png

The HTTP Client is simply getting the webpage contents.

In the "Scrape html for excel file links" snap, the match() method uses a regular expression to find the link anchors; the resulting array of strings is then filtered for those that contain ".xls".

The "Split anchor references" snap is, I believe, self-explanatory: it splits the array so that each link is handled on its own.

The "Map filePath" snap again uses the match() method, this time to extract the file reference. The match returns an array of two strings, and we want the last one, hence the pop() method.
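Since the original question mentions Python, here is a rough Python equivalent of the two match() steps described above. It is only a sketch: the regular expressions actually used in the pipeline are not shown in this thread, so both patterns, the function name excel_links, and the sample HTML are assumptions made for illustration.

```python
import re
from typing import List

# Assumed patterns; the pipeline's real expressions are not shown in the thread.
ANCHOR_RE = re.compile(r"<a\b[^>]*>", re.IGNORECASE)
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def excel_links(html: str) -> List[str]:
    """Return the href values of anchor tags that mention '.xls'."""
    # Step 1: find every opening anchor tag (like match() with a global regex).
    anchors = ANCHOR_RE.findall(html)
    # Step 2: keep only the anchors that contain ".xls" (also covers ".xlsx").
    excel_anchors = [a for a in anchors if ".xls" in a.lower()]
    # Step 3: pull the file reference out of each anchor. group(1) plays the
    # role of pop() on the two-element match array: the captured href value.
    links = []
    for anchor in excel_anchors:
        found = HREF_RE.search(anchor)
        if found:
            links.append(found.group(1))
    return links


# Hypothetical page fragment, not copied from the real site.
sample = '<a href="xls/sales_revenue.xlsx">Sales</a> <a href="about.html">About</a>'
print(excel_links(sample))
```

Regular expressions are fine for a page with a simple, stable layout; if the markup changes often, an HTML parser is usually the more robust choice.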

The "Get file" is another HTTP Client snap to get the file contents from the relative path - note that this time, we use a Binary output view on the snap to return the data as a binary stream.

Finally, "Write output file" will write the file out to the SLDB.
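For comparison, the last two steps (fetching the file as binary and writing it out) could look roughly like this in Python. This is a sketch under assumptions: the helper names resolve and download and the relative path in the example are hypothetical, and the file is written to a local path rather than to the SLDB.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urljoin

PAGE_URL = "https://www.eia.gov/electricity/data/eia861m/#salesrevenue"


def resolve(page_url: str, relative_path: str) -> str:
    """Turn a relative link found on the page into an absolute URL."""
    return urljoin(page_url, relative_path)


def download(file_url: str, dest: Path) -> int:
    """Fetch file_url as raw bytes, write them to dest, return the byte count."""
    with urllib.request.urlopen(file_url, timeout=60) as response:
        data = response.read()  # bytes, comparable to the Binary output view
    dest.write_bytes(data)
    return len(data)


# Hypothetical relative path; the real link has to be scraped from the page.
print(resolve(PAGE_URL, "xls/sales_revenue.xlsx"))
```

Calling download(resolve(PAGE_URL, link), Path("output.xlsx")) for a scraped link would then save the file locally.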

Hope this helps!