What’s the right way to decode a string that has special HTML entities in it? [duplicate]

What’s the right way to decode a string that has special HTML entities in it? [duplicate]

This question already has an answer here:

Unescape HTML entities in Javascript?

11 answers

Say I get some JSON back from a service request that looks like this:
{
“message”: “We're unable to complete your request at this time.”
}

I’m not sure why that apostraphe is encoded like that ('); all I know is that I want to decode it.
Here’s one approach using jQuery that popped into my head:
function decodeHtml(html) {
return $(‘

‘).html(html).text();
}

That seems (very) hacky, though. What’s a better way? Is there a “right” way?

Solutions/Answers:

Solution 1:

This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.

function decodeHtml(html) {
    var txt = document.createElement("textarea");
    txt.innerHTML = html;
    return txt.value;
}

Example: http://jsfiddle.net/k65s3/

Input:

Entity:&nbsp;Bad attempt at XSS:<script>alert('new\nline?')</script><br>

Output:

Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>

Solution 2:

Don’t use the DOM to do this. Using the DOM to decode HTML entities (as suggested in the currently accepted answer) leads to differences in cross-browser results.

Related:  Click Entire Row (preserving middle click and ctrl+click)

For a robust & deterministic solution that decodes character references according to the algorithm in the HTML Standard, use the he library. From its README:

he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and — contrary to many other JavaScript solutions — he handles astral Unicode symbols just fine. An online demo is available.

Here’s how you’d use it:

he.decode("We&#39;re unable to complete your request at this time.");
→ "We're unable to complete your request at this time."

Disclaimer: I’m the author of the he library.

See this Stack Overflow answer for some more info.

Solution 3:

If you don’t want to use html/dom, you could use regex. I haven’t tested this; but something along the lines of:

function parseHtmlEntities(str) {
    return str.replace(/&#([0-9]{1,3});/gi, function(match, numStr) {
        var num = parseInt(numStr, 10); // read num as normal number
        return String.fromCharCode(num);
    });
}

[Edit]

Note: this would only work for numeric html-entities, and not stuff like &oring.

Related:  Detect fullscreen mode

[Edit 2]

Fixed the function (some typos), test here: http://jsfiddle.net/Be2Bd/1/

Solution 4:

jQuery will encode and decode for you.

function htmlDecode(value) {
  return $("<textarea/>").html(value).text();
}

function htmlEncode(value) {
  return $('<textarea/>').text(value).html();
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script>
$(document).ready(function() {
   $("#encoded")
  .text(htmlEncode("<img src onerror='alert(0)'>"));
   $("#decoded")
  .text(htmlDecode("&lt;img src onerror='alert(0)'&gt;"));
});
</script>

<span>htmlEncode() result:</span><br/>
<div id="encoded"></div>
<br/>
<span>htmlDecode() result:</span><br/>
<div id="decoded"></div>

Solution 5:

There’s JS function to deal with &#xxxx styled entities:
function at GitHub

// encode(decode) html text into html entity
var decodeHtmlEntity = function(str) {
  return str.replace(/&#(\d+);/g, function(match, dec) {
    return String.fromCharCode(dec);
  });
};

var encodeHtmlEntity = function(str) {
  var buf = [];
  for (var i=str.length-1;i>=0;i--) {
    buf.unshift(['&#', str[i].charCodeAt(), ';'].join(''));
  }
  return buf.join('');
};

var entity = '&#39640;&#32423;&#31243;&#24207;&#35774;&#35745;';
var str = '高级程序设计';
console.log(decodeHtmlEntity(entity) === str);
console.log(encodeHtmlEntity(str) === entity);
// output:
// true
// true

Solution 6:

_.unescape does what you’re looking for

https://lodash.com/docs/#unescape