02-08-2019 02:15 PM
I’m using the Email Reader snap and finding that the messages it pulls have only an htmlBody and no textBody. I need the messages properly formatted as plain text. Does SnapLogic have an easy way to do this?
I have found and worked with several regular expressions to strip out the HTML tags, but the formatting that is left is inconsistent in most cases and can make the plain text messy.
Other options?
09-20-2022 09:05 AM
When writing a post in the Community, there is a Preformatted text option you can use with scripts to prevent them from being converted.
09-20-2022 11:39 AM
Yup, that’s what I did on my 2nd attempt.
For anyone else experiencing this problem… this is the expression I wrote for our library to strip (at least) the HTML that I am seeing; might need more tags added to it.
{
stripHTML: x => x
  .replaceAll(x.slice(x.indexOf("<b"),x.indexOf(">",x.indexOf("<b"))+1),"").replaceAll("</b>","")
  .replaceAll(x.slice(x.indexOf("<span"),x.indexOf(">",x.indexOf("<span"))+1),"").replaceAll("</span>","")
  .replaceAll(x.slice(x.indexOf("<i"),x.indexOf(">",x.indexOf("<i"))+1),"").replaceAll("</i>","")
  .replaceAll(x.slice(x.indexOf("<u"),x.indexOf(">",x.indexOf("<u"))+1),"").replaceAll("</u>","")
  .replaceAll(x.slice(x.indexOf("<br"),x.indexOf(">",x.indexOf("<br"))+1),"").replaceAll("</br>","").replaceAll("<br>","")
  .replaceAll(x.slice(x.indexOf("<font"),x.indexOf(">",x.indexOf("<font"))+1),"").replaceAll("</font>","")
  .replaceAll(x.slice(x.indexOf("<html"),x.indexOf(">",x.indexOf("<html"))+1),"").replaceAll("</html>","")
  .replaceAll(x.slice(x.indexOf("<body"),x.indexOf(">",x.indexOf("<body"))+1),"").replaceAll("</body>","")
  .replaceAll(x.slice(x.indexOf("<head"),x.indexOf(">",x.indexOf("<head"))+1),"").replaceAll("</head>","")
  .replaceAll(x.slice(x.indexOf("<meta"),x.indexOf(">",x.indexOf("<meta"))+1),"").replaceAll("</meta>","")
  .replaceAll(x.slice(x.indexOf("<style"),x.indexOf(">",x.indexOf("<style"))+1),"").replaceAll("</style>","")
  .replaceAll(x.slice(x.indexOf("<!"),x.indexOf(">",x.indexOf("<!"))+1),"").replaceAll("</!>","")
  .replaceAll(x.slice(x.indexOf("<div"),x.indexOf(">",x.indexOf("<div"))+1),"").replaceAll("</div>","")
  .replaceAll(x.slice(x.indexOf("<p"),x.indexOf(">",x.indexOf("<p"))+1),"")
  .replaceAll(x.slice(x.indexOf("<div"),x.indexOf(">",x.indexOf("<div"))+1),"").replaceAll("</div>","")
  .replaceAll("</p>","")
  .replaceAll("//r","")
  .replaceAll("<div>","")
  .trim()
}
09-20-2022 11:56 AM
@dcarlson - you could also just use a simple regular expression to strip away all the HTML tags. Assuming that your HTML string is in a single variable called “content”:
$content.replace(/<(.|\n)*?>/ig,'')
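To see why a single .replace() call is enough, here is a rough sketch in plain JavaScript (not SnapLogic itself; the expression language is JavaScript-like, so treat any difference in behavior as something to verify in a pipeline). The `stripTags` and `stripFirstTag` names and the sample string are made up for illustration. The `g` flag on the regular expression is what makes replace() act on every match rather than only the first.

```javascript
// Hypothetical helper: same pattern as the expression above.
// "<", then any characters (including newlines) matched lazily, up to the next ">".
function stripTags(html) {
  // The g flag makes replace() remove every match, not just the first one.
  return html.replace(/<(.|\n)*?>/ig, '');
}

// Same pattern without the g flag, for comparison: only the first tag goes away.
function stripFirstTag(html) {
  return html.replace(/<(.|\n)*?>/i, '');
}

// Made-up sample, including a tag that spans two lines.
const sample = '<html><body><p style="color:red">Hello <b>world</b></p>\n<div\nclass="x">Bye</div></body></html>';

console.log(stripTags(sample));     // all tags removed
console.log(stripFirstTag(sample)); // only the leading <html> removed
```

Note that this removes the tags only; it does not decode entities such as `&nbsp;` or turn `<br>` into line breaks.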
09-20-2022 12:14 PM
Shucks… vaidyarm had it right all this time. I had tried that with .replaceAll() and it didn’t work; but with .replace() like you both said… works. I would not have expected to use .replace() to strip ALL the tags. Live and learn.
Thank you both - Davey
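On the original concern that stripped text comes out messy: one possible approach is to translate a few structural tags into line breaks before removing the rest. The sketch below is plain JavaScript, not a SnapLogic function; `htmlToText` is a hypothetical name, the tag list and entity list are assumptions to adjust for the mail actually received, and whether every regex feature used here carries over to SnapLogic's expression language would need to be checked. It is regex-based, so it is not a full HTML parser and will trip on unusual markup (for example a `>` inside an attribute value).

```javascript
// Hypothetical helper: convert simple HTML mail bodies to readable plain text.
function htmlToText(html) {
  return html
    // Normalize Windows line endings.
    .replace(/\r/g, '')
    // Drop <style>, <script>, and <head> blocks together with their contents.
    .replace(/<(style|script|head)\b[^>]*>[\s\S]*?<\/\1>/gi, '')
    // A <br> becomes a newline.
    .replace(/<br\s*\/?>/gi, '\n')
    // Closing block-level tags end a line.
    .replace(/<\/(p|div|tr|li|h[1-6])>/gi, '\n')
    // Remove every remaining tag.
    .replace(/<[^>]*>/g, '')
    // Decode a few common entities; &amp; goes last to avoid double decoding.
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&')
    // Tidy whitespace: strip trailing spaces and collapse runs of blank lines.
    .replace(/[ \t]+\n/g, '\n')
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}

// Made-up sample message body.
const mail = '<html><head><style>p{color:red}</style></head><body>' +
  '<p>Hi&nbsp;there,</p><div>Line one<br>Line two</div><p>Tom &amp; Jerry</p></body></html>';

console.log(htmlToText(mail));
```

The order of the steps matters: the structural tags have to be turned into newlines before the catch-all tag removal, otherwise the line breaks are lost.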