Forum Discussion

peterlu
Champion
5 years ago

html entity decode

Hi,

 

We all know that FreeMarker can encode special HTML characters (&, quotes, <, > and more),

e.g. & => &amp;

But I need a way to decode them,

e.g. &amp; => &

I have looked around, and it looks like there is no such built-in method in FreeMarker?

 

Peter

 

  • AndrewF
    Khoros Oracle

    Hi Peter, you are correct that FreeMarker did not implement HTML processing like decoding. This was deliberate: FreeMarker's primary use case is to output HTML, so to prevent common security vulnerabilities they designed it so that you work with raw data, and escaping for HTML is the final step before outputting. We're stretching FreeMarker past its design goals, doing much more complex things like API calls and data transformation -- so we did add a utility:

    utils.html.unescaper.unescape(...).

    The usual caveats apply:

    • FreeMarker is not a programming language; it is primarily designed for displaying data and is not great at converting or building data. If the message body is being sent somewhere else, it may be a better idea to implement all the HTML post-processing in that environment. This would also make it easier to tweak the algorithm, and it might make user requests faster by offloading some work from the community.
    • Unescaping should only be done on one continuous block of text with no HTML elements present. That is, all elements must be stripped before unescaping.
    • Even if you strip all HTML, the unescape may result in HTML (because the original escaped text may look like HTML). Once unescaped, make sure it is never accidentally placed directly in an HTML context without proper escaping.
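To make the last two caveats concrete, here is a minimal Java sketch of what an entity unescaper does. It is not the implementation behind utils.html.unescaper.unescape(...), which is not shown in this thread. The class name, the method names, and the five-entry entity table are all assumptions made for illustration. It decodes in a single pass, so "&amp;lt;" becomes "&lt;" rather than "<", and it shows that text which was escaped markup turns back into markup and must be re-escaped before it goes into an HTML context.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the Khoros utility: decodes a small set of entities.
class EntityDecodeSketch {
    // Assumed table: only five common named entities; real HTML defines many more.
    private static final Map<String, String> NAMED = Map.of(
        "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");

    // Matches decimal (&#39;), hexadecimal (&#x27;) and named (&amp;) references.
    private static final Pattern ENTITY =
        Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z]+);");

    // Single pass over the input, so decoded output is never decoded a second time.
    static String unescape(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = codePoint(body.substring(2), 16, m.group());
            } else if (body.startsWith("#")) {
                replacement = codePoint(body.substring(1), 10, m.group());
            } else {
                // Unknown names are left exactly as they appeared.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts a numeric reference to text, or returns the original on bad input.
    private static String codePoint(String digits, int radix, String fallback) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp)
                ? new String(Character.toChars(cp)) : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // Re-escapes text for an HTML context; '&' is replaced first to avoid double escaping.
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                   .replace("\"", "&quot;").replace("'", "&#39;");
    }

    public static void main(String[] args) {
        // Tag-free text whose escaped content happens to look like markup.
        String stripped = "Tom &amp; Jerry &lt;b&gt;bold&lt;/b&gt;";
        String plain = unescape(stripped);
        System.out.println(plain);             // Tom & Jerry <b>bold</b>
        System.out.println(escapeHtml(plain)); // Tom &amp; Jerry &lt;b&gt;bold&lt;/b&gt;
    }
}
```

The unescaped string is suitable for a plain-text or JSON payload (with JSON string escaping applied), but as the second println shows, it has to be escaped again before being written into a page.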
  • peterlu,

    Why do you need to decode this? If you are sending text to an endpoint, you can apply "?url" directly to the variable so that it is passed URL-encoded. The receiving API or endpoint can then URL-decode it, or read it directly from the GET request parameter.

    • peterlu
      Champion

      Parshant, I am not decoding a URL parameter. The message body I get back from the REST API query is in HTML format. I use the utils function to strip the HTML tags, and then I also need to decode the HTML entities before passing the text as JSON data to a third-party system. The third-party system needs to build an AI report around it, and they want plain characters instead of HTML entities like &amp;, &apos;, etc. I know string replacement would work, but that is a dirty approach. Java should have a way to do this, and the FreeMarker utils object should have it built in.

      The question is purely: is there a way to decode HTML entities using a FreeMarker object? Thanks.