Converting between strings and ArrayBuffers

Converting between strings and ArrayBuffers

Is there a commonly accepted technique for efficiently converting JavaScript strings to ArrayBuffers and vice-versa? Specifically, I’d like to be able to write the contents of an ArrayBuffer to localStorage and to read it back.

Solutions/Answers:

Solution 1:

Update 2016 – five years on there are now new methods in the specs (see support below) to convert between strings and typed arrays using proper encoding.

TextEncoder

The TextEncoder represents:

The TextEncoder interface represents an encoder for a specific method,
that is a specific character encoding, like utf-8, iso-8859-2, koi8,
cp1261, gbk, …
An encoder takes a stream of code points as input and
emits a stream of bytes.

Change note since the above was written: (ibid.)

Note: Firefox, Chrome and Opera used to have support for encoding
types other than utf-8 (such as utf-16, iso-8859-2, koi8, cp1261, and
gbk). As of Firefox 48 […], Chrome 54 […] and Opera 41, no
other encoding types are available other than utf-8, in order to match
the spec.*

*) Updated specs (W3) and here (whatwg).

After creating an instance of the TextEncoder it will take a string and encode it using a given encoding parameter:

if (!("TextEncoder" in window)) 
  alert("Sorry, this browser does not support TextEncoder...");

var enc = new TextEncoder(); // always utf-8
console.log(enc.encode("This is a string converted to a Uint8Array"));

You then of course use the .buffer parameter on the resulting Uint8Array to convert the underlaying ArrayBuffer to a different view if needed.

Just make sure that the characters in the string adhere to the encoding schema, for example, if you use characters outside the UTF-8 range in the example they will be encoded to two bytes instead of one.

For general use you would use UTF-16 encoding for things like localStorage.

TextDecoder

Likewise, the opposite process uses the TextDecoder:

The TextDecoder interface represents a decoder for a specific method,
that is a specific character encoding, like utf-8, iso-8859-2, koi8,
cp1261, gbk, … A decoder takes a stream of bytes as input and emits
a stream of code points.

All available decoding types can be found here.

if (!("TextDecoder" in window))
  alert("Sorry, this browser does not support TextDecoder...");

var enc = new TextDecoder("utf-8");
var arr = new Uint8Array([84,104,105,115,32,105,115,32,97,32,85,105,110,116,
                          56,65,114,114,97,121,32,99,111,110,118,101,114,116,
                          101,100,32,116,111,32,97,32,115,116,114,105,110,103]);
console.log(enc.decode(arr));

The MDN StringView library

An alternative to these is to use the StringView library (licensed as lgpl-3.0) which goal is:

  • to create a C-like interface for strings (i.e., an array of character codes — an ArrayBufferView in JavaScript) based upon the
    JavaScript ArrayBuffer interface
  • to create a highly extensible library that anyone can extend by adding methods to the object StringView.prototype
  • to create a collection of methods for such string-like objects (since now: stringViews) which work strictly on arrays of numbers
    rather than on creating new immutable JavaScript strings
  • to work with Unicode encodings other than JavaScript’s default UTF-16 DOMStrings

giving much more flexibility. However, it would require us to link to or embed this library while TextEncoder/TextDecoder is being built-in in modern browsers.

Related:  Can I pass parameters in computed properties in Vue.Js

Support

As of July/2018:

TextEncoder (Experimental, On Standard Track)

 Chrome    | Edge      | Firefox   | IE        | Opera     | Safari
 ----------|-----------|-----------|-----------|-----------|-----------
     38    |     ?     |    19°    |     -     |     25    |     -

 Chrome/A  | Edge/mob  | Firefox/A | Opera/A   |Safari/iOS | Webview/A
 ----------|-----------|-----------|-----------|-----------|-----------
     38    |     ?     |    19°    |     ?     |     -     |     38

°) 18: Firefox 18 implemented an earlier and slightly different version
of the specification.

WEB WORKER SUPPORT:

Experimental, On Standard Track

 Chrome    | Edge      | Firefox   | IE        | Opera     | Safari
 ----------|-----------|-----------|-----------|-----------|-----------
     38    |     ?     |     20    |     -     |     25    |     -

 Chrome/A  | Edge/mob  | Firefox/A | Opera/A   |Safari/iOS | Webview/A
 ----------|-----------|-----------|-----------|-----------|-----------
     38    |     ?     |     20    |     ?     |     -     |     38

Data from MDN - `npm i -g mdncomp` by epistemex

Solution 2:

Although Dennis and gengkev solutions of using Blob/FileReader work, I wouldn’t suggest taking that approach. It is an async approach to a simple problem, and it is much slower than a direct solution. I’ve made a post in html5rocks with a simpler and (much faster) solution:
http://updates.html5rocks.com/2012/06/How-to-convert-ArrayBuffer-to-and-from-String

And the solution is:

function ab2str(buf) {
  return String.fromCharCode.apply(null, new Uint16Array(buf));
}

function str2ab(str) {
  var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
  var bufView = new Uint16Array(buf);
  for (var i=0, strLen=str.length; i<strLen; i++) {
    bufView[i] = str.charCodeAt(i);
  }
  return buf;
}

EDIT:

The Encoding API helps solving the string conversion problem. Check out the response from Jeff Posnik on Html5Rocks.com to the above original article.

Excerpt:

The Encoding API makes it simple to translate between raw bytes and native JavaScript strings, regardless of which of the many standard encodings you need to work with.

<pre id="results"></pre>

<script>
  if ('TextDecoder' in window) {
    // The local files to be fetched, mapped to the encoding that they're using.
    var filesToEncoding = {
      'utf8.bin': 'utf-8',
      'utf16le.bin': 'utf-16le',
      'macintosh.bin': 'macintosh'
    };

    Object.keys(filesToEncoding).forEach(function(file) {
      fetchAndDecode(file, filesToEncoding[file]);
    });
  } else {
    document.querySelector('#results').textContent = 'Your browser does not support the Encoding API.'
  }

  // Use XHR to fetch `file` and interpret its contents as being encoded with `encoding`.
  function fetchAndDecode(file, encoding) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', file);
    // Using 'arraybuffer' as the responseType ensures that the raw data is returned,
    // rather than letting XMLHttpRequest decode the data first.
    xhr.responseType = 'arraybuffer';
    xhr.onload = function() {
      if (this.status == 200) {
        // The decode() method takes a DataView as a parameter, which is a wrapper on top of the ArrayBuffer.
        var dataView = new DataView(this.response);
        // The TextDecoder interface is documented at http://encoding.spec.whatwg.org/#interface-textdecoder
        var decoder = new TextDecoder(encoding);
        var decodedString = decoder.decode(dataView);
        // Add the decoded file's text to the <pre> element on the page.
        document.querySelector('#results').textContent += decodedString + '\n';
      } else {
        console.error('Error while requesting', file, this);
      }
    };
    xhr.send();
  }
</script>

Solution 3:

You can use TextEncoder and TextDecoder from the Encoding standard, which is polyfilled by the stringencoding library, to convert string to and from ArrayBuffers:

var uint8array = new TextEncoder().encode(string);
var string = new TextDecoder(encoding).decode(uint8array);

Solution 4:

Blob is much slower than String.fromCharCode(null,array);

but that fails if the array buffer gets too big. The best solution I have found is to use String.fromCharCode(null,array); and split it up into operations that won’t blow the stack, but are faster than a single char at a time.

Related:  I can't understand how this javascript function works

The best solution for large array buffer is:

function arrayBufferToString(buffer){

    var bufView = new Uint16Array(buffer);
    var length = bufView.length;
    var result = '';
    var addition = Math.pow(2,16)-1;

    for(var i = 0;i<length;i+=addition){

        if(i + addition > length){
            addition = length - i;
        }
        result += String.fromCharCode.apply(null, bufView.subarray(i,i+addition));
    }

    return result;

}

I found this to be about 20 times faster than using blob. It also works for large strings of over 100mb.

Solution 5:

Based on the answer of gengkev, I created functions for both ways, because BlobBuilder can handle String and ArrayBuffer:

function string2ArrayBuffer(string, callback) {
    var bb = new BlobBuilder();
    bb.append(string);
    var f = new FileReader();
    f.onload = function(e) {
        callback(e.target.result);
    }
    f.readAsArrayBuffer(bb.getBlob());
}

and

function arrayBuffer2String(buf, callback) {
    var bb = new BlobBuilder();
    bb.append(buf);
    var f = new FileReader();
    f.onload = function(e) {
        callback(e.target.result)
    }
    f.readAsText(bb.getBlob());
}

A simple test:

string2ArrayBuffer("abc",
    function (buf) {
        var uInt8 = new Uint8Array(buf);
        console.log(uInt8); // Returns `Uint8Array { 0=97, 1=98, 2=99}`

        arrayBuffer2String(buf, 
            function (string) {
                console.log(string); // returns "abc"
            }
        )
    }
)

Solution 6:

All the following is about getting binary strings from array buffers

I’d recommend not to use

var binaryString = String.fromCharCode.apply(null, new Uint8Array(arrayBuffer));

because it

  1. crashes on big buffers (somebody wrote about “magic” size of 246300 but I got Maximum call stack size exceeded error on 120000 bytes buffer (Chrome 29))
  2. it has really poor performance (see below)
Related:  How do I test my browser timezone dependent application againts different timezones? [duplicate]

If you exactly need synchronous solution use something like

var
  binaryString = '',
  bytes = new Uint8Array(arrayBuffer),
  length = bytes.length;
for (var i = 0; i < length; i++) {
  binaryString += String.fromCharCode(bytes[i]);
}

it is as slow as the previous one but works correctly. It seems that at the moment of writing this there is no quite fast synchronous solution for that problem (all libraries mentioned in this topic uses the same approach for their synchronous features).

But what I really recommend is using Blob + FileReader approach

function readBinaryStringFromArrayBuffer (arrayBuffer, onSuccess, onFail) {
  var reader = new FileReader();
  reader.onload = function (event) {
    onSuccess(event.target.result);
  };
  reader.onerror = function (event) {
    onFail(event.target.error);
  };
  reader.readAsBinaryString(new Blob([ arrayBuffer ],
    { type: 'application/octet-stream' }));
}

the only disadvantage (not for all) is that it is asynchronous. And it is about 8-10 times faster then previous solutions! (Some details: synchronous solution on my environment took 950-1050 ms for 2.4Mb buffer but solution with FileReader had times about 100-120 ms for the same amount of data. And I have tested both synchronous solutions on 100Kb buffer and they have taken almost the same time, so loop is not much slower the using ‘apply’.)

BTW here: How to convert ArrayBuffer to and from String author compares two approaches like me and get completely opposite results (his test code is here) Why so different results? Probably because of his test string that is 1Kb long (he called it “veryLongStr”). My buffer was a really big JPEG image of size 2.4Mb.