02-08-2019 02:15 PM
I'm using the Email Reader snap and finding that the messages it pulls only have an htmlBody and no textBody. I need the messages properly formatted in plain text. Does SnapLogic have an easy way to do this?
I have found and tried several regexes that strip out the HTML tags, but the formatting that is left differs from message to message and the plain text often comes out messy.
Other options?
09-20-2022 09:05 AM
When writing a post in the Community, there is a Preformatted text option you can use for scripts to prevent the editor from converting them.
09-20-2022 11:39 AM
Yup, that's what I did on my 2nd attempt.
For anyone else experiencing this problem... this is the expression I wrote for our library to strip (at least) the HTML that I am seeing; it might need more tags added to it.
{
stripHTML: x => x.replaceAll(x.slice(x.indexOf("<b"),x.indexOf(">",x.indexOf("<b"))+1),"").replaceAll("</b>","").replaceAll(x.slice(x.indexOf("<span"),x.indexOf(">",x.indexOf("<span"))+1),"").replaceAll("</span>","").replaceAll(x.slice(x.indexOf("<i"),x.indexOf(">",x.indexOf("<i"))+1),"").replaceAll("</i>","").replaceAll(x.slice(x.indexOf("<u"),x.indexOf(">",x.indexOf("<u"))+1),"").replaceAll("</u>","").replaceAll(x.slice(x.indexOf("<br"),x.indexOf(">",x.indexOf("<br"))+1),"").replaceAll("</br>","").replaceAll("<br>","").replaceAll(x.slice(x.indexOf("<font"),x.indexOf(">",x.indexOf("<font"))+1),"").replaceAll("</font>","").replaceAll(x.slice(x.indexOf("<html"),x.indexOf(">",x.indexOf("<html"))+1),"").replaceAll("</html>","").replaceAll(x.slice(x.indexOf("<body"),x.indexOf(">",x.indexOf("<body"))+1),"").replaceAll("</body>","").replaceAll(x.slice(x.indexOf("<head"),x.indexOf(">",x.indexOf("<head"))+1),"").replaceAll("</head>","").replaceAll(x.slice(x.indexOf("<meta"),x.indexOf(">",x.indexOf("<meta"))+1),"").replaceAll("</meta>","").replaceAll(x.slice(x.indexOf("<style"),x.indexOf(">",x.indexOf("<style"))+1),"").replaceAll("</style>","").replaceAll(x.slice(x.indexOf("<!"),x.indexOf(">",x.indexOf("<!"))+1),"").replaceAll("</!>","").replaceAll(x.slice(x.indexOf("<div"),x.indexOf(">",x.indexOf("<div"))+1),"").replaceAll("</div>","").replaceAll(x.slice(x.indexOf("<p"),x.indexOf(">",x.indexOf("<p"))+1),"").replaceAll(x.slice(x.indexOf("<div"),x.indexOf(">",x.indexOf("<div"))+1),"").replaceAll("</div>","").replaceAll("</p>","").replaceAll("//r","").replaceAll("<div>","").trim()
}
09-20-2022 11:56 AM
@dcarlson - you could also just use a simple regular expression to strip away all the HTML tags. Assuming that your HTML string is in a single variable called "content":
$content.replace(/<(.|\n)*?>/ig,'')
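If the stripped text comes out as one run-on line, one option is to turn the line-breaking tags into newlines before removing the rest. Below is a minimal sketch in plain JavaScript, not a SnapLogic snap or built-in: htmlToText is a hypothetical helper name, and the choice of br, p and div as line-breaking tags is an assumption about what the emails contain. The same chain of .replace() calls should carry over to an expression, but I have not tested it there.

```javascript
// Plain JavaScript sketch; htmlToText is a hypothetical helper name.
function htmlToText(html) {
  return html
    // Turn line-breaking tags into newlines before the tags are removed.
    .replace(/<br\s*\/?>|<\/(p|div)>/gi, "\n")
    // Remove every remaining tag (same pattern as the expression above).
    .replace(/<(.|\n)*?>/g, "")
    // Collapse runs of three or more newlines, then trim both ends.
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

const sample = '<div><b>Hello</b> world<br>Second <span style="color:red">line</span></div>';
console.log(htmlToText(sample));
```

This does not decode entities such as &amp;nbsp; or drop the contents of style blocks; those would need their own replace steps.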
09-20-2022 12:14 PM
Shucks... vaidyarm had it right all this time. I had tried that with .replaceAll() and it didn't work, but with .replace() like you both said... it works. I would not have expected to use .replace() to strip ALL the tags. Live and learn.
Thank you both - Davey
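For anyone wondering why .replace() removes every tag: in plain JavaScript it is the g flag on the regular expression that makes the replacement global, while a pattern passed as a string is matched literally. The sketch below shows that behaviour in plain JavaScript only. Whether SnapLogic's .replaceAll() failed for the same reason is an assumption on my part, not something confirmed in this thread.

```javascript
const html = "<p>Hi <b>there</b></p>";

// Without the g flag, only the first match is replaced.
const firstOnly = html.replace(/<(.|\n)*?>/, "");

// With the g flag, replace() substitutes every match.
const allTags = html.replace(/<(.|\n)*?>/g, "");

// A string pattern is matched literally, so a regex written as a string finds nothing.
const literal = html.replaceAll("<(.|\\n)*?>", "");

console.log(firstOnly); // Hi <b>there</b></p>
console.log(allTags);   // Hi there
console.log(literal);   // <p>Hi <b>there</b></p>
```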