
KaelaC
Lithium Alumni (Retired)
14 years ago

Stripping HTML from the message body returned by a REST call

I often want to truncate the message body returned by a REST call, sort of like making my own teaser. This quick little FreeMarker snippet strips out any HTML (so I don't end up with unclosed tags) and truncates the text to 250 characters (247 plus a trailing "...") if the message body is longer than that:

 

<#assign msg=rest("/messages/id/${message_id}").message >
<#assign body = msg.body?replace("<(.|\n)*?>", "", "r")>

<#if body?string?length gt 250> ${body?substring(0, 247)}... <#else>${body}</#if>

 

8 Replies

  • Thanks for that snippet. Maybe I've missed something but how do you avoid truncating mid-word?

  • AdamN
    Khoros Oracle
    14 years ago

    Hi cblown,

     

    FreeMarker has two string built-ins that would probably come in handy for that.

     

    index_of - http://freemarker.sourceforge.net/docs/ref_builtins_string.html#ref_builtin_index_of

    last_index_of - http://freemarker.sourceforge.net/docs/ref_builtins_string.html#ref_builtin_last_index_of

     

    Both of these work kind of like a search function for a string, and they both also allow you to specify a starting index to search from.

     

    So let's say you know you want about 250 characters. You could use that as the starting point for your search to look for a space. index_of can be used to get the remainder of the word at 250 characters (so you'll likely wind up with a few more than 250). If you have a hard limit and want no more than 250 characters, you should probably use last_index_of, so that it will search backwards. Keep in mind that the string indexes actually start at 0 for the first character, so if you want no more than 250 characters, you should actually use a starting index of 249.

     

    index_of and last_index_of both return the index of the match, or -1 if no match exists. You'll probably want to assign this index to a variable. If the index is greater than 0, you can just use that variable in place of the hard-coded "247". If the index is less than 0 for some reason (i.e. a really long string of text with no breaks), your best bet would probably be to fall back to the "default" value (i.e. 247 or whatever you choose). That probably won't ever happen, but your code should handle it just in case.
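
    A minimal sketch of the approach described above, assuming body already holds the tag-stripped text from the original snippet. The names limit and cut are illustrative, and the backwards search starts at limit - 3 rather than 249 only so that the trailing "..." still fits inside the hard limit:

    ```freemarker
    <#-- Sketch: truncate at a word boundary with a hard cap of 250 characters.
         Assumes "body" was assigned earlier, as in the original snippet. -->
    <#assign limit = 250>
    <#if body?length gt limit>
      <#-- search backwards for a space, starting 3 characters short of the limit -->
      <#assign cut = body?last_index_of(" ", limit - 3)>
      <#-- no usable space found (-1 or 0): fall back to a plain hard cut -->
      <#if cut lte 0><#assign cut = limit - 3></#if>
      ${body?substring(0, cut)}...
    <#else>
      ${body}
    </#if>
    ```

    Swapping last_index_of for index_of(" ", limit) would instead finish the current word, at the cost of running a little past the limit.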

     

    I hope this helps!

  • Inactive User
    12 years ago

    Have you run into any errors with this snippet due to the string being parsed being too long? I'm getting StackOverflowError: null errors in that case.
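
    One possible cause, offered as an assumption rather than a confirmed diagnosis: the "r" flag hands the pattern to Java's regex engine, where a repeated alternation group such as (.|\n)*? can use a stack frame per matched character. A long run of text after a "<" (for example a stray "<" with no closing ">") can then exhaust the stack. A negated character class stops at the same first ">" without that recursion. A minimal sketch, reusing msg from the original snippet:

    ```freemarker
    <#-- Assumption: msg is the message node fetched by the rest() call in the original post -->
    <#-- "<[^>]*>" matches from "<" to the next ">", newlines included, with no alternation group -->
    <#assign body = msg.body?replace("<[^>]*>", "", "r")>
    ```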

  • Inactive User
    12 years ago

    Nice. That's going to be helpful. Does it strip out HTML though?

  • KaelaC
    Lithium Alumni (Retired)
    12 years ago

    For stripping out HTML you can use this new util:

     

    It strips HTML using the Google GData HTML stripper, which does the following:

    1. Converts <br> and <p> tags to new lines
    2. Converts <li> tags to new line and adds a dash
    3. Strips remaining tags
    4. Adds newlines after 72 characters (for readability and conformance with email standards)

     

    ${utils.html.stripper.from.gdata.strip("<p>hello <b>world</b><strong>!</strong>")}
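
    As a rough sketch of how the util might slot into the teaser snippet from the original post: because of step 4, the stripped text comes back wrapped at 72 characters, so the newlines are folded back into spaces before truncating. Passing msg.body straight to strip() is an untested assumption, and plain is just an illustrative variable name:

    ```freemarker
    <#-- Assumptions: msg comes from the rest() call in the original post,
         and strip() accepts the body value as-is (not verified) -->
    <#assign plain = utils.html.stripper.from.gdata.strip(msg.body)>
    <#-- undo the 72-character wrapping so the teaser reads as one line -->
    <#assign plain = plain?replace("\n", " ")>
    <#if plain?length gt 250>${plain?substring(0, 247)}...<#else>${plain}</#if>
    ```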

     

  • Here is a problem:

    What if I want to limit the subject title to 50 characters, and the subject title happens to contain HTML entities like &lt; &amp; &trade; &quot;

    When you truncate the string, you may accidentally cut an HTML entity in half, e.g. "&am"

     

    What is the FreeMarker function for converting HTML entities into normal text? The REST API returns subject titles as HTML, e.g. "2 > 1" is actually "2 &gt; 1" in the REST API response.

     

    ${utils.html.stripper.from.gdata.strip("<p>hello <b>world</b><strong>!</strong>")}

    The above freemarker snippet works.

    Do we have any docs of the full utils object?

  • peterlu
    Champion
    12 years ago

    I found a workaround. Say the subject title character limit is 50.

    After truncation, I can do another check, using string?last_index_of to locate the last &.

    If the & is too close to the end of the truncated string, then do another substring to remove everything from the & to the end.
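
    A minimal sketch of that workaround. The sample subject, and the names limit, short and amp, are illustrative only. Instead of measuring how close the & is to the end, this version treats an & with no ";" after it as a cut-off entity, which is a slightly different test from the one described above:

    ```freemarker
    <#-- Sample value only; in practice this would be the HTML-escaped subject from the REST response -->
    <#assign subject = "Is 2 &gt; 1 &amp; 1 &lt; 2? A long subject title that needs truncating">
    <#assign limit = 50>
    <#assign short = subject>
    <#if short?length gt limit>
      <#assign short = short?substring(0, limit)>
      <#assign amp = short?last_index_of("&")>
      <#-- an "&" with no ";" after it suggests the cut landed inside an entity -->
      <#if amp gte 0 && short?index_of(";", amp) lt 0>
        <#assign short = short?substring(0, amp)>
      </#if>
    </#if>
    ${short}
    ```

    A bare, unescaped & near the end would also be trimmed by this check, which is probably acceptable for a teaser.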