HTML Entity Decode [duplicate]

HTML Entity Decode [duplicate]

This question already has an answer here:

Unescape HTML entities in Javascript?

11 answers

How do I encode and decode HTML entities using JavaScript or JQuery?
var varTitle = “Chris' corner”;

I want it to be:
var varTitle = “Chris’ corner”;

Solutions/Answers:

Solution 1:

You could try something like:

var Title = $('<textarea />').html("Chris&apos; corner").text();
console.log(Title);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

JS Fiddle.

A more interactive version:

$('form').submit(function() {
  var theString = $('#string').val();
  var varTitle = $('<textarea />').html(theString).text();
  $('#output').text(varTitle);
  return false;
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<form action="#" method="post">
  <fieldset>
    <label for="string">Enter a html-encoded string to decode</label>
    <input type="text" name="string" id="string" />
  </fieldset>
  <fieldset>
    <input type="submit" value="decode" />
  </fieldset>
</form>

<div id="output"></div>

JS Fiddle.

Solution 2:

I recommend against using the jQuery code that was accepted as the answer. While it does not insert the string to decode into the page, it does cause things such as scripts and HTML elements to get created. This is way more code than we need. Instead, I suggest using a safer, more optimized function.

var decodeEntities = (function() {
  // this prevents any overhead from creating the object each time
  var element = document.createElement('div');

  function decodeHTMLEntities (str) {
    if(str && typeof str === 'string') {
      // strip script/html tags
      str = str.replace(/<script[^>]*>([\S\s]*?)<\/script>/gmi, '');
      str = str.replace(/<\/?\w(?:[^"'>]|"[^"]*"|'[^']*')*>/gmi, '');
      element.innerHTML = str;
      str = element.textContent;
      element.textContent = '';
    }

    return str;
  }

  return decodeHTMLEntities;
})();

http://jsfiddle.net/LYteC/4/

Related:  Javascript: “Infinite” parameters for function?

To use this function, just call decodeEntities("&amp;") and it will use the same underlying techniques as the jQuery version will—but without jQuery’s overhead, and after sanitizing the HTML tags in the input. See Mike Samuel’s comment on the accepted answer for how to filter out HTML tags.

This function can be easily used as a jQuery plugin by adding the following line in your project.

jQuery.decodeEntities = decodeEntities;

Solution 3:

Like Robert K said, don’t use jQuery.html().text() to decode html entities as it’s unsafe because user input should never have access to the DOM. Read about XSS for why this is unsafe.

Instead try the Underscore.js utility-belt library which comes with escape and unescape methods:

_.escape(string)

Escapes a string for insertion into HTML, replacing &, <, >, ", `, and ' characters.

_.escape('Curly, Larry & Moe');
=> "Curly, Larry &amp; Moe"

_.unescape(string)

The opposite of escape, replaces &amp;, &lt;, &gt;, &quot;, &#96; and &#x27; with their unescaped counterparts.

_.unescape('Curly, Larry &amp; Moe');
=> "Curly, Larry & Moe"

To support decoding more characters, just copy the Underscore unescape method and add more characters to the map.

Related:  Remove querystring from URL

Solution 4:

Here’s a quick method that doesn’t require creating a div, and decodes the “most common” HTML escaped chars:

function decodeHTMLEntities(text) {
    var entities = [
        ['amp', '&'],
        ['apos', '\''],
        ['#x27', '\''],
        ['#x2F', '/'],
        ['#39', '\''],
        ['#47', '/'],
        ['lt', '<'],
        ['gt', '>'],
        ['nbsp', ' '],
        ['quot', '"']
    ];

    for (var i = 0, max = entities.length; i < max; ++i) 
        text = text.replace(new RegExp('&'+entities[i][0]+';', 'g'), entities[i][1]);

    return text;
}

Solution 5:

Inspired by Robert K’s solution, this version does not strip HTML tags, and is just as secure.

var decode_entities = (function() {
    // Remove HTML Entities
    var element = document.createElement('div');

    function decode_HTML_entities (str) {

        if(str && typeof str === 'string') {

            // Escape HTML before decoding for HTML Entities
            str = escape(str).replace(/%26/g,'&').replace(/%23/g,'#').replace(/%3B/g,';');

            element.innerHTML = str;
            if(element.innerText){
                str = element.innerText;
                element.innerText = '';
            }else{
                // Firefox support
                str = element.textContent;
                element.textContent = '';
            }
        }
        return unescape(str);
    }
    return decode_HTML_entities;
})();

Solution 6:

This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.

function decodeHtml(html) {
    var txt = document.createElement("textarea");
    txt.innerHTML = html;
    return txt.value;
}

Example: http://jsfiddle.net/k65s3/

Related:  Indirect eval call in strict mode

Input:

Entity:&nbsp;Bad attempt at XSS:<script>alert('new\nline?')</script><br>

Output:

Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>