cancel
Showing results for 
Search instead for 
Did you mean: 

Covert HTML to Plain Text

Drew
New Contributor

I’m using the Email Reader snap and finding that the messages that it is pulling only have an htmlBody and no textBody. I need the messages properly formatted in plane text. Does SnapLogic have an easy way to do this?

I have found and worked with several RegEx’s to strip out the HTML tags but the format that is left is not the same in most cases and can make the plane text messy.

Other options?

8 REPLIES 8

When writing a post in the Commuity, there is a Preformatted text option you can use with scripts to prevent it converting.


Diane Miller

dcarlson
New Contributor

Yup, that’s what I did on my 2nd attempt.

For anyone else experiencing this problem… this is the expression I wrote for our library to strip (at least) the HTML that I am seeing; might need more tags added to it.

{
stripHTML: x => x.replaceAll(x.slice(x.indexOf("<b"),x.indexOf(">",x.indexOf("<b"))+1),"").replaceAll("</b>","").replaceAll(x.slice(x.indexOf("<span"),x.indexOf(">",x.indexOf("<span"))+1),"").replaceAll("</span>","").replaceAll(x.slice(x.indexOf("<i"),x.indexOf(">",x.indexOf("<i"))+1),"").replaceAll("</i>","").replaceAll(x.slice(x.indexOf("<u"),x.indexOf(">",x.indexOf("<u"))+1),"").replaceAll("</u>","").replaceAll(x.slice(x.indexOf("<br"),x.indexOf(">",x.indexOf("<br"))+1),"").replaceAll("</br>","").replaceAll("<br>","").replaceAll(x.slice(x.indexOf("<font"),x.indexOf(">",x.indexOf("<font"))+1),"").replaceAll("</font>","").replaceAll(x.slice(x.indexOf("<html"),x.indexOf(">",x.indexOf("<html"))+1),"").replaceAll("</html>","").replaceAll(x.slice(x.indexOf("<body"),x.indexOf(">",x.indexOf("<body"))+1),"").replaceAll("</body>","").replaceAll(x.slice(x.indexOf("<head"),x.indexOf(">",x.indexOf("<head"))+1),"").replaceAll("</head>","").replaceAll(x.slice(x.indexOf("<meta"),x.indexOf(">",x.indexOf("<meta"))+1),"").replaceAll("</meta>","").replaceAll(x.slice(x.indexOf("<style"),x.indexOf(">",x.indexOf("<style"))+1),"").replaceAll("</style>","").replaceAll(x.slice(x.indexOf("<!"),x.indexOf(">",x.indexOf("<!"))+1),"").replaceAll("</!>","").replaceAll(x.slice(x.indexOf("<div"),x.indexOf(">",x.indexOf("<div"))+1),"").replaceAll("</div>","").replaceAll(x.slice(x.indexOf("<p"),x.indexOf(">",x.indexOf("<p"))+1),"").replaceAll(x.slice(x.indexOf("<div"),x.indexOf(">",x.indexOf("<div"))+1),"").replaceAll("</div>","").replaceAll("</p>","").replaceAll("//r","").replaceAll("<div>","").trim()
}

koryknick
Employee
Employee

@dcarlson - you could also just use a simple regular expression to strip away all the html tags. Assuming that your html string is in a single variable called “content”:

$content.replace(/<(.|\n)*?>/ig,'')

Shucks… vaidyarm had it right all this time. I had tried that with .replaceAll() and it didn’t work; but with .replace() like you both said… works. I would not have expected to use .replace() to strip ALL the tags. Live and learn.
Thank you both - Davey