Extracting file names from HTML

Hi all,

I need to fetch files from a REST API service (a plain HTTP URL). The output of the REST Get Snap is HTML. Below is the response we get from the REST Get Snap, but we are unable to filter out only the file names from it.

[Attached screenshot: image.png]

Is there any way to extract only the file names from the above?


Hi, it looks like your REST call is returning raw data. Can you share your JSON file / entity?

Hi @Supratim,

Below is the response we are receiving from the REST Get Snap. The entity field contains the HTML content from which we need to extract only the file names.

Rest_Get.txt (964.6 KB)


Hi,

Found a solution: splitting the HTML line by line in a Mapper and then using a JSON Splitter.
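
For anyone reading later, the same line-by-line idea can be sketched outside SnapLogic in Python. This is only an illustration, not the actual pipeline: it assumes the HTML is a directory-style listing where each file appears as an `<a href="...">` link, and the function name `extract_filenames` and the sample HTML are made up for the example.

```python
import re
from posixpath import basename
from urllib.parse import unquote, urlsplit

# Assumption: every file is exposed as an <a href="..."> link.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_filenames(html: str) -> list[str]:
    """Split the HTML into lines and collect the file name of each link."""
    names = []
    for line in html.splitlines():
        for href in HREF_RE.findall(line):
            path = urlsplit(href).path
            # Skip directory links such as "../" and links without a path.
            if not path or path.endswith("/"):
                continue
            # Keep only the last path segment and decode %20 and similar.
            names.append(unquote(basename(path)))
    return names


# Hypothetical sample input, not the real REST Get response.
sample = """<html><body>
<a href="../">Parent Directory</a>
<a href="report_2021.csv">report_2021.csv</a>
<a href="/files/data%20set.json">data set.json</a>
</body></html>"""

print(extract_filenames(sample))
```

If the real response puts several links on one line or uses a different tag for file names, the pattern would need adjusting.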


Other possible solutions:

  1. Maybe a JSON media type can be requested from the web service (e.g., via the Accept or Content-Type headers).

  2. The HTML might be processed as XML:

    • Map the “entity” field to $['content'], then run it through a Document to Binary Snap followed by an XML Parser Snap.

I would prefer #1 over #2, as a lot (most?) of HTML is not well-formed XML, so an XML parser may reject it.
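
To make the two options a bit more concrete, here is a rough Python sketch (again outside SnapLogic, purely illustrative). The first part builds a request that asks for JSON through the Accept header; whether the service honors it is an assumption, and the URL is a placeholder. The second part shows why a tolerant HTML parser is safer than an XML parser: the sample markup has unclosed tags and an unquoted attribute, which a strict XML parser would reject. The names `build_json_request` and `LinkCollector` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request


def build_json_request(url: str) -> Request:
    """Option 1: ask the service for JSON (the server may ignore this)."""
    return Request(url, headers={"Accept": "application/json"})


class LinkCollector(HTMLParser):
    """Option 2 fallback: collect href values with a lenient HTML parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


# Placeholder URL; no network call is made here.
request = build_json_request("http://example.com/files")

# Deliberately sloppy HTML: unclosed <li>/<a> tags and an unquoted attribute.
collector = LinkCollector()
collector.feed('<ul><li><a href="a.txt">a.txt<li><a href=b.txt>b.txt<br></ul>')
print(collector.hrefs)
```

The lenient parser still finds both links, which is the kind of input where the XML Parser route would likely fail.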

