I would like to create a pipeline that downloads an Excel file from this webpage every month: https://www.eia.gov/electricity/data/eia861m/#salesrevenue. I am currently using web scraping with Python to download the file. Can we do web scraping in a pipeline, or is there a special Snap for this?
dawna14 - I went for the more generic approach. Please download the attached zip file, decompress it, and import the pipeline from the SLP file.
The HTTP Client is simply getting the webpage contents.
In the "Scrape html for excel file links" snap, the match() method uses a regular expression to find the link anchors; the resulting array of strings is then filtered for those that contain ".xls".
The "Split anchor references" snap is, I believe, self-explanatory.
The "Map filePath" snap again uses the match() method, this time to extract the file reference. The call returns an array of two strings, and we want the last one, hence the pop() method.
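Since you are already using Python, here is a rough sketch of the same link-extraction logic outside SnapLogic, in case it helps to compare. It is not the expression from the attached pipeline: the two regular expressions, the function name find_excel_links, and the sample HTML (including its file names) are assumptions made up for illustration.

```python
import re

# Assumed patterns; the regular expressions in the attached pipeline may differ.
ANCHOR_RE = re.compile(r"<a\b[^>]*>", re.IGNORECASE)     # opening anchor tags
HREF_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)   # href value in a tag


def find_excel_links(html):
    """Return the href of every anchor whose tag mentions '.xls'."""
    # Step 1: find all link anchors, like match() in the scrape snap.
    anchors = ANCHOR_RE.findall(html)
    # Step 2: keep only the anchors that contain ".xls" (this also covers .xlsx).
    excel_anchors = [a for a in anchors if ".xls" in a]
    # Step 3: pull the file reference out of each anchor. A match holds the
    # whole matched text plus the captured group; we keep only the captured
    # group, which is the role pop() plays in the mapper.
    paths = []
    for anchor in excel_anchors:
        m = HREF_RE.search(anchor)
        if m:
            paths.append(m.group(1))
    return paths


# Hypothetical page fragment, not copied from the EIA site.
SAMPLE_HTML = """
<a href="/about/">About</a>
<a class="ico" href="xls/sales_revenue.xlsx">XLS</a>
<a href="archive/xls/f8262009.xls" title="2009">2009</a>
"""

print(find_excel_links(SAMPLE_HTML))
# ['xls/sales_revenue.xlsx', 'archive/xls/f8262009.xls']
```

Regular expressions are fine for a quick filter like this, but they are fragile against markup changes (single-quoted attributes, for example), so an HTML parser may be the safer choice if the page layout changes.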
The "Get file" snap is another HTTP Client that gets the file contents from the relative path. Note that this time we use a Binary output view on the snap to return the data as a binary stream.
Finally, "Write output file" will write the file out to the SLDB.
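For comparison with your existing Python script, the last two steps (fetch the file as binary, then write it out) could look roughly like the sketch below. It uses only the standard library. The helper names, the example file name "xls/sales_revenue.xlsx", and the "downloads" folder are assumptions for illustration, and writing to a local folder stands in for writing to the SLDB.

```python
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urljoin, urlsplit

PAGE_URL = "https://www.eia.gov/electricity/data/eia861m/"


def resolve_file_url(page_url, relative_path):
    """Turn the relative path scraped from the page into an absolute URL."""
    return urljoin(page_url, relative_path)


def target_path(file_url, target_dir):
    """Build the local output path from the last segment of the URL path."""
    name = PurePosixPath(urlsplit(file_url).path).name
    return Path(target_dir) / name


def download_file(file_url, target_dir):
    """Fetch the file as raw bytes and write it unchanged to target_dir."""
    target = target_path(file_url, target_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Read the response as bytes, the counterpart of the Binary output view.
    with urllib.request.urlopen(file_url, timeout=60) as response:
        target.write_bytes(response.read())
    return target


# "xls/sales_revenue.xlsx" is a hypothetical relative path for illustration.
file_url = resolve_file_url(PAGE_URL, "xls/sales_revenue.xlsx")
print(file_url)

# To actually download (needs network access), uncomment:
# download_file(file_url, "downloads")
```

Resolving against the page URL matters because the scraped link is relative; urljoin also handles links that start with "/" or are already absolute.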