Convert HTML to Plain Text

Drew
New Contributor

I'm using the Email Reader Snap and finding that the messages it pulls have only an htmlBody and no textBody. I need the messages properly formatted in plain text. Does SnapLogic have an easy way to do this?

I have found and worked with several regexes that strip out the HTML tags, but what is left does not keep the same format in most cases, which can make the plain text messy.

Other options?

8 REPLIES

When writing a post in the Community, there is a Preformatted text option you can use with scripts to prevent them from being converted.


Diane Miller
Community Manager

dcarlson
New Contributor

Yup, that's what I did on my 2nd attempt.

For anyone else experiencing this problem... this is the expression I wrote for our library to strip (at least) the HTML that I am seeing; it might need more tags added to it.

{
stripHTML: x => x.replaceAll(x.slice(x.indexOf("<b"),x.indexOf(">",x.indexOf("<b"))+1),"").replaceAll("</b>","").replaceAll(x.slice(x.indexOf("<span"),x.indexOf(">",x.indexOf("<span"))+1),"").replaceAll("</span>","").replaceAll(x.slice(x.indexOf("<i"),x.indexOf(">",x.indexOf("<i"))+1),"").replaceAll("</i>","").replaceAll(x.slice(x.indexOf("<u"),x.indexOf(">",x.indexOf("<u"))+1),"").replaceAll("</u>","").replaceAll(x.slice(x.indexOf("<br"),x.indexOf(">",x.indexOf("<br"))+1),"").replaceAll("</br>","").replaceAll("<br>","").replaceAll(x.slice(x.indexOf("<font"),x.indexOf(">",x.indexOf("<font"))+1),"").replaceAll("</font>","").replaceAll(x.slice(x.indexOf("<html"),x.indexOf(">",x.indexOf("<html"))+1),"").replaceAll("</html>","").replaceAll(x.slice(x.indexOf("<body"),x.indexOf(">",x.indexOf("<body"))+1),"").replaceAll("</body>","").replaceAll(x.slice(x.indexOf("<head"),x.indexOf(">",x.indexOf("<head"))+1),"").replaceAll("</head>","").replaceAll(x.slice(x.indexOf("<meta"),x.indexOf(">",x.indexOf("<meta"))+1),"").replaceAll("</meta>","").replaceAll(x.slice(x.indexOf("<style"),x.indexOf(">",x.indexOf("<style"))+1),"").replaceAll("</style>","").replaceAll(x.slice(x.indexOf("<!"),x.indexOf(">",x.indexOf("<!"))+1),"").replaceAll("</!>","").replaceAll(x.slice(x.indexOf("<div"),x.indexOf(">",x.indexOf("<div"))+1),"").replaceAll("</div>","").replaceAll(x.slice(x.indexOf("<p"),x.indexOf(">",x.indexOf("<p"))+1),"").replaceAll(x.slice(x.indexOf("<div"),x.indexOf(">",x.indexOf("<div"))+1),"").replaceAll("</div>","").replaceAll("</p>","").replaceAll("//r","").replaceAll("<div>","").trim()
}

koryknick
Employee

@dcarlson - you could also just use a simple regular expression to strip away all the HTML tags. Assuming that your HTML string is in a single variable called "content":

$content.replace(/<(.|\n)*?>/ig,'')
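A quick way to see why one `.replace()` call removes every tag: the `g` flag on the regex makes the match global. The sketch below is plain JavaScript (Node.js), not SnapLogic expression language, on the assumption that the two treat this regex the same way; the sample string is made up for illustration.

```javascript
// Plain JavaScript sketch; the sample HTML below is invented for illustration.
const content = '<p class="x">Hello <b>world</b><br>\nBye</p>';

// Same pattern as in the post: "<", then any characters (newlines included),
// matched lazily up to the next ">".
const tagPattern = /<(.|\n)*?>/ig;

// With the g flag, replace() substitutes every match, not only the first one.
const stripped = content.replace(tagPattern, '');

// Without the g flag, only the first tag is removed.
const firstOnly = content.replace(/<(.|\n)*?>/i, '');

console.log(JSON.stringify(stripped));
console.log(JSON.stringify(firstOnly));
```

Note that this removes the tags only; the text between them, including any line breaks already in the HTML source, is left exactly as it was.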

Shucks... vaidyarm had it right all this time. I had tried that with .replaceAll() and it didn't work; but with .replace(), like you both said, it works. I would not have expected .replace() to strip ALL the tags. Live and learn.
Thank you both - Davey
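
On the original concern that stripped text comes out messy: one possible approach is to turn line-breaking tags into newlines before removing the rest. The following is a minimal sketch in plain JavaScript, not a SnapLogic feature. The function name `htmlToPlainText`, the list of block tags, and the small entity table are all assumptions chosen for illustration, and the callback passed to `.replace()` may need adapting for a SnapLogic expression library.

```javascript
// Hypothetical helper in plain JavaScript; tag list and entity table are assumptions.
function htmlToPlainText(html) {
  const entities = {
    '&nbsp;': ' ', '&lt;': '<', '&gt;': '>',
    '&quot;': '"', '&#39;': "'", '&amp;': '&'
  };
  return html
    // Drop <style>/<script> blocks with their contents, and HTML comments.
    .replace(/<(style|script)\b[^>]*>[\s\S]*?<\/\1>/gi, '')
    .replace(/<!--[\s\S]*?-->/g, '')
    // Whitespace in HTML source is not significant: collapse each run to one space.
    .replace(/\s+/g, ' ')
    // Turn <br> and the end of common block elements into line breaks.
    .replace(/<br\s*\/?>/gi, '\n')
    .replace(/<\/(p|div|li|tr|h[1-6])\s*>/gi, '\n')
    // Remove every remaining tag.
    .replace(/<[^>]+>/g, '')
    // Decode a handful of common entities in a single pass.
    .replace(/&(?:nbsp|lt|gt|quot|amp|#39);/g, (m) => entities[m])
    // Tidy up: no spaces around line breaks, at most one blank line, no outer whitespace.
    .replace(/[ \t]*\n[ \t]*/g, '\n')
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}

const sample =
  '<html><head><style>p { color: red; }</style></head><body>' +
  '<p>Hi&nbsp;Drew,</p>\r\n<p>Line one<br>Line two</p>' +
  '<div>Tom &amp; Jerry</div></body></html>';

console.log(htmlToPlainText(sample));
```

Regex-based handling like this is fragile on unusual or malformed markup, so treat it as a starting point; a real HTML parser would be more robust where one is available.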