Decode &amp; back to & in JavaScript

I have strings like
var str = 'One &amp; two &amp; three';

rendered into HTML by the web server. I need to transform those strings into
'One & two & three'

Currently, this is what I am doing (with the help of jQuery):
$(document.createElement('div')).html('{{ driver.person.name }}').text()

However I have an unsettling feeling that I am doing it wrong.
I have tried
unescape("&amp;")

but it doesn’t seem to work, and neither do decodeURI or decodeURIComponent.
Are there any other, more native and elegant ways of doing so?

Solutions/Answers:

Solution 1:

A more modern option for interpreting HTML (text and otherwise) from JavaScript is the HTML support in the DOMParser API (documented on MDN). This allows you to use the browser’s native HTML parser to convert a string to an HTML document. It has been supported in new versions of all major browsers since late 2014.

If we just want to decode some text content, we can put it as the sole content of a document body, parse the document, and read back its .body.textContent.

var encodedStr = 'hello &amp; world';

var parser = new DOMParser();
var dom = parser.parseFromString(
    '<!doctype html><body>' + encodedStr,
    'text/html');
var decodedString = dom.body.textContent;

console.log(decodedString); // "hello & world"

We can see in the draft specification for DOMParser that JavaScript is not enabled for the parsed document, so we can perform this text conversion without security concerns.

The parseFromString(str, type) method must run these steps, depending on type:

  • "text/html"

    Parse str with an HTML parser, and return the newly created Document.

    The scripting flag must be set to “disabled”.

    NOTE

    script elements get marked unexecutable and the contents of noscript get parsed as markup.

It’s beyond the scope of this question, but please note that if you’re taking the parsed DOM nodes themselves (not just their text content) and moving them to the live document DOM, it’s possible that their scripting would be reenabled, and there could be security concerns. I haven’t researched it, so please exercise caution.

Solution 2:

Do you need to decode all encoded HTML entities or just &amp; itself?

If you only need to handle &amp; then you can do this:

var decoded = encoded.replace(/&amp;/g, '&');
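
If a handful of other entities need the same treatment, it is tempting to chain several replace calls, but then the order matters: decoding &amp; first turns '&amp;lt;' into '&lt;', and a later replace would turn that into '<'. The following is a minimal sketch of a single-pass alternative. The names BASIC_ENTITIES and decodeBasicEntities are hypothetical, and the function covers only the five entities listed in the map.

```javascript
// Hypothetical lookup table: only these five entities are decoded.
var BASIC_ENTITIES = {
    '&amp;': '&',
    '&lt;': '<',
    '&gt;': '>',
    '&quot;': '"',
    '&#39;': "'"
};

// Hypothetical helper, not part of any library.
function decodeBasicEntities(encoded) {
    // A single pass over the input: each match is replaced exactly once,
    // so the '&' produced from '&amp;' is never scanned again.
    return String(encoded).replace(/&(?:amp|lt|gt|quot|#39);/g, function (match) {
        return BASIC_ENTITIES[match];
    });
}

console.log(decodeBasicEntities('One &amp; two &amp; three')); // One & two & three
console.log(decodeBasicEntities('&amp;lt;'));                  // &lt;
```

Anything outside the table (for example &nbsp; or &eacute;) passes through unchanged, so this only fits input whose encoder is known to emit just these entities.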

If you need to decode all HTML entities then you can do it without jQuery:

var elem = document.createElement('textarea');
elem.innerHTML = encoded;
var decoded = elem.value;

Please take note of Mark’s comments below which highlight security holes in an earlier version of this answer and recommend using textarea rather than div to mitigate against potential XSS vulnerabilities. These vulnerabilities exist whether you use jQuery or plain JavaScript.

Solution 3:

Mathias Bynens has a library for this: https://github.com/mathiasbynens/he

Example:

console.log(
    he.decode("J&#246;rg &amp J&#xFC;rgen rocked to &amp; fro ")
);
// Logs "Jörg & Jürgen rocked to & fro"

I suggest favouring it over hacks involving setting an element’s HTML content and then reading back its text content. Such approaches can work, but are deceptively dangerous and present XSS opportunities if used on untrusted user input.

If you really can’t bear to load in a library, you can use the textarea hack described in this answer to a near-duplicate question, which, unlike various similar approaches that have been suggested, has no security holes that I know of:

function decodeEntities(encodedString) {
    var textArea = document.createElement('textarea');
    textArea.innerHTML = encodedString;
    return textArea.value;
}

console.log(decodeEntities('1 &amp; 2')); // '1 & 2'

But take note of the security issues, affecting similar approaches to this one, that I list in the linked answer! This approach is a hack, and future changes to the permissible content of a textarea (or bugs in particular browsers) could lead to code that relies upon it suddenly having an XSS hole one day.

Solution 4:

var htmlEnDeCode = (function() {
    var charToEntityRegex,
        entityToCharRegex,
        charToEntity,
        entityToChar;

    function resetCharacterEntities() {
        charToEntity = {};
        entityToChar = {};
        // add the default set
        addCharacterEntities({
            '&amp;'     :   '&',
            '&gt;'      :   '>',
            '&lt;'      :   '<',
            '&quot;'    :   '"',
            '&#39;'     :   "'"
        });
    }

    function addCharacterEntities(newEntities) {
        var charKeys = [],
            entityKeys = [],
            key, echar;
        for (key in newEntities) {
            echar = newEntities[key];
            entityToChar[key] = echar;
            charToEntity[echar] = key;
            charKeys.push(echar);
            entityKeys.push(key);
        }
        charToEntityRegex = new RegExp('(' + charKeys.join('|') + ')', 'g');
        entityToCharRegex = new RegExp('(' + entityKeys.join('|') + '|&#[0-9]{1,5};' + ')', 'g');
    }

    function htmlEncode(value){
        var htmlEncodeReplaceFn = function(match, capture) {
            return charToEntity[capture];
        };

        return (!value) ? value : String(value).replace(charToEntityRegex, htmlEncodeReplaceFn);
    }

    function htmlDecode(value) {
        var htmlDecodeReplaceFn = function(match, capture) {
            return (capture in entityToChar) ? entityToChar[capture] : String.fromCharCode(parseInt(capture.substr(2), 10));
        };

        return (!value) ? value : String(value).replace(entityToCharRegex, htmlDecodeReplaceFn);
    }

    resetCharacterEntities();

    return {
        htmlEncode: htmlEncode,
        htmlDecode: htmlDecode
    };
})();

This is from ExtJS source code.
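
Note that the decoder above only recognises decimal references of up to five digits and builds characters with String.fromCharCode, so hexadecimal references such as &#xFC; and code points above U+FFFF are not handled. As a rough sketch of how that gap could be filled, the hypothetical helper below (decodeNumericReferences is not part of ExtJS) decodes decimal and hexadecimal references with String.fromCodePoint. It is a simplification: it leaves NUL, surrogates and out-of-range values untouched rather than applying the replacement rules a browser's HTML parser uses.

```javascript
// Hypothetical helper: decodes &#NNN; and &#xHHH; references only.
function decodeNumericReferences(encoded) {
    var pattern = /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g;
    return String(encoded).replace(pattern, function (match, hex, dec) {
        // Exactly one of the two capture groups is set for each match.
        var codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
        // Simplification: skip values that are not valid scalar values
        // instead of mapping them the way an HTML parser would.
        if (codePoint === 0 || codePoint > 0x10FFFF ||
                (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
            return match;
        }
        return String.fromCodePoint(codePoint);
    });
}

console.log(decodeNumericReferences('J&#246;rg &#x26; J&#xFC;rgen')); // Jörg & Jürgen
```

Named entities such as &amp; are deliberately left alone here, so this would be combined with a named-entity table like the one in the ExtJS code.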

Solution 5:

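Assigning the encoded string to an element’s innerHTML and then reading element.innerText also does the trick, although the XSS caveats raised in the other answers about non-textarea elements apply here as well.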

Solution 6:

In case you’re looking for it, as I was: there is now a nice and safe jQuery method, $.parseHTML.

https://api.jquery.com/jquery.parsehtml/

For example, you can type this in your console:

var x = "test &amp;";
> undefined
$.parseHTML(x)[0].textContent
> "test &"

So $.parseHTML(x) returns an array of DOM nodes, and if you have HTML markup within your text, the array’s length may be greater than 1.
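
Because the result is an array, reading only [0].textContent drops everything after the first node whenever the input contains markup. A small sketch of one way around that is to join the text of every node. The helper name textOfNodes is hypothetical, and the example feeds it plain objects standing in for DOM nodes so that it can run outside a browser; in a page you would pass it the result of $.parseHTML(x).

```javascript
// Hypothetical helper: concatenates the textContent of every node.
function textOfNodes(nodes) {
    // Guard against a null or undefined result, which is treated
    // here as "no nodes".
    return Array.prototype.map.call(nodes || [], function (node) {
        return node.textContent;
    }).join('');
}

// Plain objects standing in for a text node followed by an element,
// roughly the shape expected for input like "test &amp; <b>more</b>".
var fakeNodes = [{ textContent: 'test & ' }, { textContent: 'more' }];

console.log(textOfNodes(fakeNodes)); // test & more
console.log(textOfNodes(null));      // (empty string)

// In a browser with jQuery loaded (not run here):
// textOfNodes($.parseHTML(x));
```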