Stripping HTML from the message body returned by a REST call

Question

I often want to truncate the message body returned by a REST call, sort of like making my own teaser. This quick little FreeMarker snippet cleans out any HTML (so I don't end up with unclosed tags) and truncates the text to 250 characters if the message body is longer than that:

<#assign msg = rest("/messages/id/${message_id}").message>
<#assign body = msg.body?replace("<(.|\n)*?>", "", "r")>
<#if body?string?length gt 250> ${body?substring(0, 247)}... <#else>${body}</#if>

kaelac · Answer

For stripping out HTML, you can use this new util, which strips HTML using the Google GData HTML stripper. It does the following:

Converts <br> and <p> tags to new lines
Converts <li> tags to a new line and adds a dash
Strips remaining tags
Adds newlines after 72 characters (for readability and conformance with email standards)

${utils.html.stripper.from.gdata.strip("<p>hello <b>world</b><strong>!</strong>")}

adamn · Answer

Hi cblown,

FreeMarker has two string built-ins that would probably come in handy for that:

index_of - http://freemarker.sourceforge.net/docs/ref_builtins_string.html#ref_builtin_index_of
last_index_of - http://freemarker.sourceforge.net/docs/ref_builtins_string.html#ref_builtin_last_index_of

Both of these work like a search function for a string, and both let you specify a starting index to search from.

So let's say you know you want about 250 characters. You could use that as the starting point for a search for a space. index_of searches forwards, so it finds the end of the word that is in progress at 250 characters (and you'll likely wind up with a few more than 250). If you have a hard limit and want no more than 250 characters, you should probably use last_index_of, which searches backwards. Keep in mind that string indexes start at 0 for the first character, so if you want no more than 250 characters, you should use a starting index of 249.

index_of and last_index_of both return the index of the match, or -1 if no match exists. You'll probably want to assign this index to a variable. If the index is greater than 0, you can use that variable in place of the hard-coded "247". If the index is less than 0 for some reason (i.e. a really long string of text with no breaks), your best bet is probably to fall back to the "default" value (i.e. 247 or whatever you choose). That probably won't ever happen, but your code should handle it just in case.
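
As a rough illustration of the approach described above, here is a minimal FreeMarker sketch. It assumes `body` already holds the tag-stripped text, and the variable names `limit`, `maxText`, and `cut` are made up for this example. Instead of the starting index of 249 mentioned above, it searches backwards from 247 so that the three-character ellipsis still fits within 250 characters.

```ftl
<#-- Sketch only: assumes `body` already holds the tag-stripped message text. -->
<#assign limit = 250>
<#-- Reserve three characters for the trailing "..." -->
<#assign maxText = limit - 3>
<#if body?length gt limit>
  <#-- Search backwards for a space, starting at index maxText (indexes are 0-based). -->
  <#assign cut = body?last_index_of(" ", maxText)>
  <#-- No space found (-1), or only at the very first character (0): use the fixed cut. -->
  <#if cut lte 0>
    <#assign cut = maxText>
  </#if>
  ${body?substring(0, cut)}...
<#else>
  ${body}
</#if>
```

If a hard limit is not required, swapping last_index_of for index_of with the same starting index would finish the current word instead, at the cost of going a little over the limit.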
I hope this helps!

cblown · Answer

Thanks for that snippet. Maybe I've missed something, but how do you avoid truncating mid-word?

inactive user · Answer

Have you run into any errors with this snippet due to the string being parsed being too long? I'm getting StackOverflowError: null errors in that case.

adamn · Answer

That's odd. I don't recall seeing anything like that before. How many characters is the string you're dealing with?

There's actually a newer method for truncating strings that you may want to check out to see if it works better for you:
http://lithosphere.lithium.com/t5/support-knowledge-base/Character-truncation-variables/ta-p/38324
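
One possible explanation for the StackOverflowError, offered as an assumption rather than a confirmed diagnosis: FreeMarker's regular expressions run on Java's regex engine, where a repeated group containing an alternation, such as (.|\n)*?, can recurse deeply and exhaust the stack on long inputs. A sketch of the same stripping step using a negated character class, which avoids the repeated group:

```ftl
<#-- Sketch only: same stripping step as the original snippet, with a different pattern. -->
<#assign msg = rest("/messages/id/${message_id}").message>
<#-- [^>] also matches line breaks, so no (.|\n) group is needed. -->
<#assign body = msg.body?replace("<[^>]*>", "", "r")>
```

Both patterns stop at the first > after a <, so the output should match the original for ordinary markup.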
