
Parse HTML file and map the contents therein

Lidiya_Thomas
New Contributor II

Hi all,

I am trying to read an HTML file with the File Reader Snap; the file itself has a .txt extension. I cannot use third-party libraries in the Script Snap to parse it because I am on a Cloudplex. Please let me know if anyone knows how this can be done.

Regards,
Lidiya


Hi @Lidiya_Thomas,

Thanks for the sample input.
What would you like to have on output?

Hi @marjan.karafiloski, thanks for helping. I want all the data as JSON. Please let me know if you have any ideas on how to do that.
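As a rough illustration of turning HTML into JSON without third-party packages, here is a minimal sketch that assumes the data sits in an HTML table whose first row holds the column headers. That is an assumption, since the sample input is not shown in this thread. It uses only the Python standard library (`HTMLParser` and `json`) and falls back to the Python 2 module name so it should also load under Jython. `TableRowParser`, `table_to_json` and the sample table are hypothetical names for illustration, and wiring the code into a Script Snap is not shown.

```python
import json

try:
    from html.parser import HTMLParser  # Python 3 standard library
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7 / Jython standard library


class TableRowParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        HTMLParser.__init__(self)  # explicit call works on Python 2 and 3
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being filled
        self._cell = None  # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside an open cell is kept; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)


def table_to_json(html_text):
    """Use the first row as keys and return the other rows as a JSON array of objects."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    if not parser.rows:
        return "[]"
    headers, body = parser.rows[0], parser.rows[1:]
    return json.dumps([dict(zip(headers, row)) for row in body])


sample = """<table>
  <tr><th>Name</th><th>City</th></tr>
  <tr><td>Alice</td><td>London</td></tr>
  <tr><td>Bob</td><td>Paris</td></tr>
</table>"""
print(table_to_json(sample))
```

This sketch expects explicit `</td>` and `</tr>` closing tags and a single, non-nested table. On Python 3 entities such as `&amp;` are decoded automatically; on Python 2 they would need an extra `handle_entityref` method.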

bojanvelevski
Valued Contributor

Hey @Lidiya_Thomas,

I think this is a very complex request, especially if the HTML contains embedded scripts and other elements. You could look into some HTML-to-JSON conversion APIs. I checked one manually on its web page, and it works:

HTML 2 JSON

Be sure to read its security terms and policies before sending your data to it.

Another way would be to use a script. Check out this link to the documentation for a Python (Jython) library for parsing HTML:

Jython HTML Parser
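Assuming the linked parser is (or behaves like) Python's standard-library `HTMLParser` module, the script approach could look roughly like the sketch below. It converts any HTML string into a nested JSON tree of `{"tag", "attrs", "children"}` objects and drops `<script>`/`<style>` contents, which addresses the "various scripts" concern above. Because it relies only on the standard library, it presumably avoids the third-party restriction on a Cloudplex, but that is an assumption worth checking. `TreeBuilder`, `html_to_json`, `VOID_TAGS` and `SKIP_TAGS` are hypothetical names, and the JSON shape is just one possible choice.

```python
import json

try:
    from html.parser import HTMLParser  # Python 3 standard library
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7 / Jython standard library

# Elements that never have content or a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose contents are code rather than data; dropped from the output.
SKIP_TAGS = {"script", "style"}


class TreeBuilder(HTMLParser):
    """Builds a nested {"tag", "attrs", "children"} structure from HTML."""

    def __init__(self):
        HTMLParser.__init__(self)  # explicit call works on Python 2 and 3
        self.root = {"tag": "document", "attrs": {}, "children": []}
        self._stack = [self.root]  # currently open elements, innermost last
        self._skip_depth = 0       # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth:
            return
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self._stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)  # void elements are never left open

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self._skip_depth:
                self._skip_depth -= 1
            return
        if self._skip_depth:
            return
        # Close back to the matching open element; stray end tags are ignored.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = " ".join(data.split())  # collapse whitespace, drop blank runs
        if text:
            self._stack[-1]["children"].append(text)


def html_to_json(html_text, indent=None):
    """Parse an HTML string (e.g. the contents of a .txt file) into JSON text."""
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()
    return json.dumps(builder.root, indent=indent)


sample = ('<div class="card"><h1>Title</h1>'
          '<script>var x = 1;</script>'
          '<p>Hello<br>world</p></div>')
print(html_to_json(sample, indent=2))
```

Text and child elements are kept together in `children`, in document order. If you only need specific fields, it is usually simpler to walk this tree (or write a narrower parser) than to map every element. On Python 2, entities such as `&amp;` are not decoded by this sketch.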

I hope you’ll find this helpful,
Bojan