
Convert HTML to Plain Text

Drew
New Contributor

I’m using the Email Reader snap and finding that the messages it pulls have only an htmlBody and no textBody. I need the messages properly formatted in plain text. Does SnapLogic have an easy way to do this?

I have found and tried several regexes that strip out the HTML tags, but in most cases what is left is not formatted consistently, which makes the plain text messy.

Other options?

8 REPLIES

kristinajosifov
New Contributor

Hi Drew,

In the new release there is a simple way to encode or decode strings into and from HTML entities with HTML.encode()/decode() functions.
https://docs-snaplogic.atlassian.net/wiki/spaces/SD/pages/797704355/HTML+-+Encode+and+Decode+Functio...

Regards,
Kristina.

vaidyarm
Contributor

Hi

You can try the function below 🙂

replace(/(<([^>]+)>)/ig, '')

Thanks

dcarlson
New Contributor

That doesn’t do the trick. What the originator, Drew, and I are looking for is a way to transform this…

\r\n\r\n\r\n\r\n\r\n\r\n
\r\n

\r\n
This
\r\nis
\r\na
\r\nplain
\r\ntext
\r\nbody?
\r\n

\r\n


\r\n

\r\n
\r\n\r\n\r\n

…into this…

This\r\nis\r\na\r\nplain\r\ntext\r\nbody?\r\n

In other words, we don’t need to decode… we need to strip all of the HTML tags.

Does anyone know an easy way to do this in SnapLogic? Otherwise, the only option seems to be to build a library that knows how to do it. But someone else must have already invented that mousetrap, eh? Thanks - Davey

Ha… this website is rich text, and renders the HTML tags instead of showing them to you. 😉

Let’s try this: Convert THIS… <html>\r\n<head>\r\n<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">\r\n<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>\r\n</head>\r\n<body dir="ltr">\r\n<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">\r\n<p></p>\r\n<div>This<br>\r\nis<br>\r\na<br>\r\nplain<br>\r\ntext<br>\r\nbody?</div>\r\n<p></p>\r\n<p><br>\r\n</p>\r\n</div>\r\n</body>\r\n</html>\r\n
… to plain text.
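
For comparison, here is a minimal sketch of a parser-based approach in Python, using only the standard library's html.parser module. It is not a built-in SnapLogic function: the names html_to_text and _TextExtractor are hypothetical, and whether it can run inside a pipeline (for example from a Script snap) is an assumption that would need checking. It also assumes the \r\n sequences in the message are real line breaks, and it drops every blank line, so paragraph spacing is not preserved.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of <style> and <script>."""

    SKIPPED = {"style", "script"}
    # Tags treated as line boundaries (an illustrative, incomplete list).
    LINE_BREAKING = {"br", "p", "div", "li", "tr", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp; and friends
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.LINE_BREAKING:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.LINE_BREAKING:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside <style> or <script>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def html_to_text(html, newline="\r\n"):
    """Strip tags, trim each line, and drop the blank lines left behind."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in parser.text().splitlines())
    return newline.join(line for line in lines if line)


if __name__ == "__main__":
    print(repr(html_to_text("<div>This<br>\r\nis<br>\r\na</div>")))
```

The difference from the tag-stripping regex is the last step: the tags are removed by the parser, and the leftover runs of empty lines are collapsed afterwards, which is what keeps the result from looking messy.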