Forum Discussion

Lidiya_Thomas
New Contributor II
4 years ago

Parse HTML file and map the contents therein

Hi all,

I am trying to read an HTML file with the File Reader; the file is saved in .txt format. I cannot use third-party libraries in the Script Snap to parse it because I am on a Cloudplex. Please let me know if anyone knows how this can be done.

Regards,
Lidiya

7 Replies

  • bojanvelevski
    Valued Contributor

    Hey @Lidiya_Thomas,

    I think this is a very complex request, especially if the HTML contains scripts and other embedded content. You could investigate some APIs. I checked one manually on its web page, and it works:

    HTML 2 JSON

    Be sure to read its security terms and policies first.

    Another way would be to use a script. Check out this link, which leads to the documentation of a Python (Jython) library for parsing HTML:

    Jython HTML Parser
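
    As a rough illustration of the script route, here is a minimal sketch. It assumes the parser meant here is the standard-library HTMLParser module (so nothing third-party has to be installed), that the file contents are already available as a string, and that the HTML holds a simple table. The names TableCollector, parse_table and rows_to_records are made up for the example, and the Script Snap input/output wiring is left out.

```python
# Python 3 ships the parser as html.parser; Jython 2.7 ships it as HTMLParser.
try:
    from html.parser import HTMLParser
except ImportError:
    from HTMLParser import HTMLParser


class TableCollector(HTMLParser):
    """Hypothetical helper: collects <td>/<th> cell text, grouped by <tr> row."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None outside <tr>
        self._cell = None   # text chunks of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html_text):
    """Return the table rows found in html_text as lists of strings."""
    parser = TableCollector()
    parser.feed(html_text)
    parser.close()
    return parser.rows


def rows_to_records(rows):
    """Map each data row to a dict keyed by the first (header) row."""
    if not rows:
        return []
    header = rows[0]
    return [dict(zip(header, row)) for row in rows[1:]]
```

    Calling rows_to_records(parse_table(text)) on the file contents gives one dict per table row, which can then be mapped downstream. Real-world HTML with nested tables or embedded scripts would need more handling than this sketch covers.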

    I hope you find this helpful,
    Bojan

    • Lidiya_Thomas
      New Contributor II

      Thank you, but I have already tried this approach and it doesn’t seem to work in my case.