10-17-2019 11:49 AM
Hi guys, I am trying to read data from an HTML table. I used the REST API to connect to the web page and am able to get the response. However, parsing the table is really challenging.
I tried the XML parser, a mapper (html.decode), JavaScript, the JSON parser, etc., but nothing seems to help. Below is the URL I am trying to read data from.
Regards,
Naveen
10-18-2019 09:56 AM
You were on the right track with the XML Parser. It can parse the XHTML file into a JSON structure that you can manipulate with other snaps and the expression language. Here’s an example pipeline that parses the table in the URL you sent:
ReadHTMLTable_2019_10_18.slp (13.3 KB)
Note that the XML Parser takes some time to run initially since it needs to download the XHTML DTDs from w3.org and, unfortunately, they have implemented an artificial delay on that download.
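For readers outside SnapLogic, the same idea (treat the XHTML as XML, then turn the table rows into records) can be sketched in Python. This is only an illustration, not what the attached pipeline does internally: the sample markup, the `table_rows` name, and the column names are all made up, and it assumes the page is well-formed XHTML with a header row of `th` cells.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so lookups must be namespace-qualified.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical sample document; the real page from the thread is not reproduced here.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table>
      <tr><th>Name</th><th>Qty</th></tr>
      <tr><td>apple</td><td>3</td></tr>
      <tr><td>pear</td><td>5</td></tr>
    </table>
  </body>
</html>"""


def table_rows(xhtml_text):
    """Return the first table as a list of dicts keyed by the header cells."""
    root = ET.fromstring(xhtml_text)
    table = root.find(".//x:table", XHTML_NS)
    if table is None:
        return []
    rows = table.findall("x:tr", XHTML_NS)
    if not rows:
        return []
    # The first row is assumed to hold the column names.
    header = ["".join(cell.itertext()).strip()
              for cell in rows[0].findall("x:th", XHTML_NS)]
    records = []
    for row in rows[1:]:
        cells = ["".join(cell.itertext()).strip()
                 for cell in row.findall("x:td", XHTML_NS)]
        records.append(dict(zip(header, cells)))
    return records


print(table_rows(SAMPLE))
```

Zipping the header row with each data row is what keeps the approach generic: it works for any number of columns without naming them in advance.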
The html.decode() function is for decoding HTML entities (e.g. &lt;) and not for parsing HTML itself.
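To make the distinction concrete, here is a small Python sketch using the standard library's `html.unescape`, which I am using as a stand-in for entity decoding in general (it is not SnapLogic's html.decode(), and the sample string is made up). Decoding only turns entities back into characters; the result is still a flat string, not a parsed table.

```python
import html

# A string whose markup characters have been escaped as HTML entities.
encoded = "&lt;td&gt;Tom &amp; Jerry&lt;/td&gt;"

# Entity decoding restores the literal characters but does no parsing.
decoded = html.unescape(encoded)

print(decoded)
print(type(decoded).__name__)
```

The output is still just text containing angle brackets, so a real parser is needed afterwards to get at rows and cells.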
10-21-2019 11:01 AM
Excellent… Such a beauty…
I had almost reached the last mapper, but I had not even thought about the array function.
But what you created is very generic and awesome…
Thanks, Naveen
07-29-2020 12:14 PM
Hi @tstack, for some reason the exact same pipeline has suddenly stopped working. It throws an error on the “Parse XML” snap. I don't see that anything has changed in the source either. I was wondering if you could help with this.
07-29-2020 12:46 PM
Never mind… I just found it… Fixed it