Escaping HTML strings with jQuery

Escaping HTML strings with jQuery

Does anyone know of an easy way to escape HTML from strings in jQuery? I need to be able to pass an arbitrary string and have it properly escaped for display in an HTML page (preventing JavaScript/HTML injection attacks). I’m sure it’s possible to extend jQuery to do this, but I don’t know enough about the framework at the moment to accomplish this.

Solutions/Answers:

Solution 1:

Since you’re using jQuery, you can just set the element’s text property:

// before:
// <div class="someClass">text</div>
var someHtmlString = "<script>alert('hi!');</script>";

// set a DIV's text:
$("div.someClass").text(someHtmlString);
// after: 
// <div class="someClass">&lt;script&gt;alert('hi!');&lt;/script&gt;</div>

// get the text in a string:
var escaped = $("<div>").text(someHtmlString).html();
// value: 
// &lt;script&gt;alert('hi!');&lt;/script&gt;

Solution 2:

There is also the solution from mustache.js

var entityMap = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
  '/': '&#x2F;',
  '`': '&#x60;',
  '=': '&#x3D;'
};

function escapeHtml (string) {
  return String(string).replace(/[&<>"'`=\/]/g, function (s) {
    return entityMap[s];
  });
}

Solution 3:

$('<div/>').text('This is fun & stuff').html(); // "This is fun &amp; stuff"

Source: http://debuggable.com/posts/encode-html-entities-with-jquery:480f4dd6-13cc-4ce9-8071-4710cbdd56cb

Solution 4:

If you’re escaping for HTML, there are only three that I can think of that would be really necessary:

html.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

Depending on your use case, you might also need to do things like " to &quot;. If the list got big enough, I’d just use an array:

var escaped = html;
var findReplace = [[/&/g, "&amp;"], [/</g, "&lt;"], [/>/g, "&gt;"], [/"/g, "&quot;"]]
for(var item in findReplace)
    escaped = escaped.replace(findReplace[item][0], findReplace[item][1]);

encodeURIComponent() will only escape it for URLs, not for HTML.

Solution 5:

I wrote a tiny little function which does this. It only escapes ", &, < and > (but usually that’s all you need anyway). It is slightly more elegant then the earlier proposed solutions in that it only uses one .replace() to do all the conversion. (EDIT 2: Reduced code complexity making the function even smaller and neater, if you’re curious about the original code see end of this answer.)

function escapeHtml(text) {
    'use strict';
    return text.replace(/[\"&<>]/g, function (a) {
        return { '"': '&quot;', '&': '&amp;', '<': '&lt;', '>': '&gt;' }[a];
    });
}

This is plain Javascript, no jQuery used.

Escaping / and ' too

Edit in response to mklement‘s comment.

The above function can easily be expanded to include any character. To specify more characters to escape, simply insert them both in the character class in the regular expression (i.e. inside the /[...]/g) and as an entry in the chr object. (EDIT 2: Shortened this function too, in the same way.)

function escapeHtml(text) {
    'use strict';
    return text.replace(/[\"&'\/<>]/g, function (a) {
        return {
            '"': '&quot;', '&': '&amp;', "'": '&#39;',
            '/': '&#47;',  '<': '&lt;',  '>': '&gt;'
        }[a];
    });
}

Note the above use of &#39; for apostrophe (the symbolic entity &apos; might have been used instead – it is defined in XML, but was originally not included in the HTML spec and might therefore not be supported by all browsers. See: Wikipedia article on HTML character encodings). I also recall reading somewhere that using decimal entities is more widely supported than using hexadecimal, but I can’t seem to find the source for that now though. (And there cannot be many browsers out there which does not support the hexadecimal entities.)

Note: Adding / and ' to the list of escaped characters isn’t all that useful, since they do not have any special meaning in HTML and do not need to be escaped.

Original escapeHtml Function

EDIT 2: The original function used a variable (chr) to store the object needed for the .replace() callback. This variable also needed an extra anonymous function to scope it, making the function (needlessly) a little bit bigger and more complex.

var escapeHtml = (function () {
    'use strict';
    var chr = { '"': '&quot;', '&': '&amp;', '<': '&lt;', '>': '&gt;' };
    return function (text) {
        return text.replace(/[\"&<>]/g, function (a) { return chr[a]; });
    };
}());

I haven’t tested which of the two versions are faster. If you do, feel free to add info and links about it here.

Solution 6:

Easy enough to use underscore:

_.escape(string) 

Underscore is a utility library that provides a lot of features that native js doesn’t provide. There’s also lodash which is the same API as underscore but was rewritten to be more performant.