SnapLogic - Integration Nation

Drew · ‎02-08-2019

I’m using the Email Reader snap and finding that the messages it pulls only have an htmlBody and no textBody. I need the messages properly formatted in plain text. Does SnapLogic have an easy way to do this?

I have found and worked with several regexes to strip out the HTML tags, but in most cases what is left is not formatted consistently, which can make the plain text messy.

Other options?

dmiller · ‎09-20-2022

When writing a post in the Community, there is a Preformatted text option you can use with scripts to prevent the editor from converting them.

Diane Miller

dcarlson · ‎09-20-2022

Yup, that’s what I did on my 2nd attempt.

For anyone else experiencing this problem… this is the expression I wrote for our library to strip (at least) the HTML that I am seeing; might need more tags added to it.

{
stripHTML: x => x
  .replaceAll(x.slice(x.indexOf("<b"),x.indexOf(">",x.indexOf("<b"))+1),"")
  .replaceAll("</b>","")
  .replaceAll(x.slice(x.indexOf("<span"),x.indexOf(">",x.indexOf("<span"))+1),"")
  .replaceAll("</span>","")
  .replaceAll(x.slice(x.indexOf("<i"),x.indexOf(">",x.indexOf("<i"))+1),"")
  .replaceAll("</i>","")
  .replaceAll(x.slice(x.indexOf("<u"),x.indexOf(">",x.indexOf("<u"))+1),"")
  .replaceAll("</u>","")
  .replaceAll(x.slice(x.indexOf("<br"),x.indexOf(">",x.indexOf("<br"))+1),"")
  .replaceAll("</br>","")
  .replaceAll("<br>","")
  .replaceAll(x.slice(x.indexOf("<font"),x.indexOf(">",x.indexOf("<font"))+1),"")
  .replaceAll("</font>","")
  .replaceAll(x.slice(x.indexOf("<html"),x.indexOf(">",x.indexOf("<html"))+1),"")
  .replaceAll("</html>","")
  .replaceAll(x.slice(x.indexOf("<body"),x.indexOf(">",x.indexOf("<body"))+1),"")
  .replaceAll("</body>","")
  .replaceAll(x.slice(x.indexOf("<head"),x.indexOf(">",x.indexOf("<head"))+1),"")
  .replaceAll("</head>","")
  .replaceAll(x.slice(x.indexOf("<meta"),x.indexOf(">",x.indexOf("<meta"))+1),"")
  .replaceAll("</meta>","")
  .replaceAll(x.slice(x.indexOf("<style"),x.indexOf(">",x.indexOf("<style"))+1),"")
  .replaceAll("</style>","")
  .replaceAll(x.slice(x.indexOf("<!"),x.indexOf(">",x.indexOf("<!"))+1),"")
  .replaceAll("</!>","")
  .replaceAll(x.slice(x.indexOf("<div"),x.indexOf(">",x.indexOf("<div"))+1),"")
  .replaceAll("</div>","")
  .replaceAll(x.slice(x.indexOf("<p"),x.indexOf(">",x.indexOf("<p"))+1),"")
  .replaceAll(x.slice(x.indexOf("<div"),x.indexOf(">",x.indexOf("<div"))+1),"")
  .replaceAll("</div>","")
  .replaceAll("</p>","")
  .replaceAll("//r","")
  .replaceAll("<div>","")
  .trim()
}
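
As a rough illustration of why this approach "might need more tags added to it", here is one link of the chain isolated in plain JavaScript. The function name `stripFirstSpanForm` and the sample input are made up for this sketch, and it assumes SnapLogic's `replaceAll` does literal string replacement the same way JavaScript's does when given a string.

```javascript
// One link of the chain from the post, isolated for the <span ...> tag.
// Hypothetical helper name; not part of the original expression library.
function stripFirstSpanForm(x) {
  // Text of the FIRST opening <span ...> tag, attributes included.
  const firstOpen = x.slice(x.indexOf("<span"), x.indexOf(">", x.indexOf("<span")) + 1);
  // Removes every tag textually identical to that first one, then all closing tags.
  return x.replaceAll(firstOpen, "").replaceAll("</span>", "");
}

const input = '<span class="a">one</span> <span class="b">two</span>';
console.log(stripFirstSpanForm(input));
// The second opening tag has different attributes, so it is left in place:
// one <span class="b">two
```

Each opening tag with different attributes would need its own pass, which is why the chain grows quickly on real email HTML.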

koryknick · ‎09-20-2022

@dcarlson - you could also just use a simple regular expression to strip away all the HTML tags. Assuming that your HTML string is in a single variable called “content”:

$content.replace(/<(.|\n)*?>/ig,'')
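
For anyone who wants to try the pattern outside a pipeline, here is a minimal sketch in plain JavaScript. The function name `stripTags` and the sample string are assumptions for illustration; only the regex comes from the post above.

```javascript
// Hypothetical helper wrapping the regex from the post.
function stripTags(html) {
  // "<" then as few characters as possible (including newlines) then ">".
  // The g flag makes replace() apply to every match, not just the first.
  return html.replace(/<(.|\n)*?>/ig, '');
}

const sample = '<div class="a"><p>Hello <b>world</b></p>\n<br/>Bye</div>';
console.log(JSON.stringify(stripTags(sample)));
// "Hello world\nBye"
```

Note that this removes tags only. Entities such as `&amp;` stay encoded, and any text between `<style>` and `</style>` is kept, so those may need a separate step.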

dcarlson · ‎09-20-2022

Shucks… vaidyarm had it right all this time. I had tried that with .replaceAll() and it didn’t work; but with .replace() like you both said… works. I would not have expected to use .replace() to strip ALL the tags. Live and learn.
Thank you both - Davey

Convert HTML to Plain Text