Performance of per-pixel image access for the JavaScript canvas

Update: As Marc Weber pointed out to me, you don't need to "attach the array back" to the image, since you're only manipulating a reference to the array!
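A minimal sketch of why no copy back is needed. It uses a plain object as a stand-in for the ImageData returned by getImageData (an assumption, so that the snippet runs without a canvas); the names fakeImage and data are purely illustrative.

```javascript
// Stand-in for the object returned by context.getImageData().
// Assumption: like a real ImageData, it exposes its pixels as a typed array in `data`.
var fakeImage = { width: 2, height: 1, data: new Uint8ClampedArray(2 * 1 * 4) };

var data = fakeImage.data;   // copies the reference, not the pixels
data[0] = 200;               // write through the local variable

// Both names point at the same array, so the write is visible through the image.
var sameArray = (data === fakeImage.data);
var seenThroughImage = fakeImage.data[0];
```

Since the local variable and image.data are the same array, there is nothing to assign back before calling putImageData.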

Last week I spent some time optimizing a small canvas demo of mine, with good results (~40% speedup in Google Chrome). During that hacking session I came up with a general hack that can lead to a big increase in performance, so I thought it was worth looking at in detail.

The problem

The HTML 5 canvas element is a formidable tool, and for a lot of things it's really fast (just look at Chrome Experiments or canvas demos for some nice examples). However, there is one domain where the speed you can achieve is less than satisfactory: per-pixel writing. Let's say you want to fill the canvas surface with pixels generated by some calculation: you will have to write new values to every pixel of the canvas. The resulting code would look like this:

    canvas = document.getElementById("canvas");
    context = canvas.getContext("2d");
    image = context.getImageData(0, 0, SCREEN_WIDTH, SCREEN_HEIGHT);

    var pixels = SCREEN_WIDTH*SCREEN_HEIGHT;

    while(--pixels){
        image.data[4*pixels+0] = r; // Red value
        image.data[4*pixels+1] = g; // Green value
        image.data[4*pixels+2] = b; // Blue value
        image.data[4*pixels+3] = a; // Alpha value
    }

    context.putImageData(image, 0, 0);

For each pixel of the image we have to write 4 values (one for each color channel plus the alpha channel), so if these operations turn out to be slow, this loop will be very inefficient.

The Hack

After listening to a YUI talk about JavaScript performance that insisted on how slow DOM access is and how it should be avoided, I wondered whether every write to an element of the data array was a DOM access. This would explain the slowness. So I came up with this idea: why not "detach" the image's data array, manipulate it, and copy it back before rendering? It turned out to provide a nice speed-up in Chrome, so much so that I made a simple benchmark to isolate the effect from the big mess of the application I was optimizing.


    canvas = document.getElementById("canvas");
    context = canvas.getContext("2d");
    image = context.getImageData(0, 0, SCREEN_WIDTH, SCREEN_HEIGHT);

    var pixels = SCREEN_WIDTH*SCREEN_HEIGHT;
    var imageData = image.data; // here we detach the pixels array from DOM

    while(--pixels){
        imageData[4*pixels+0] = r; // Red value
        imageData[4*pixels+1] = g; // Green value
        imageData[4*pixels+2] = b; // Blue value
        imageData[4*pixels+3] = a; // Alpha value
    }

    image.data = imageData; // And here we attach it back (not needed, cf. update)
    context.putImageData(image, 0, 0);

The advantage of this hack is that it is very easy to apply in any situation of this kind.
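One detail of the loop is worth spelling out: `while(--pixels)` decrements before testing, so the body never runs with `pixels` equal to 0 and the first pixel is left untouched. The sketch below is one possible variant that keeps the cached reference but also covers pixel 0. The function name fillImage and the plain-object stand-in for ImageData are assumptions for illustration, not part of the original demo.

```javascript
// Fill every pixel of an ImageData-like object with one RGBA color.
// `image` only needs `width`, `height` and a `data` array of length width*height*4.
function fillImage(image, r, g, b, a) {
  var data = image.data;                  // cache the reference once, outside the loop
  var pixels = image.width * image.height;
  while (pixels--) {                      // post-decrement: the body also runs for pixel 0
    var offset = 4 * pixels;
    data[offset]     = r;  // Red value
    data[offset + 1] = g;  // Green value
    data[offset + 2] = b;  // Blue value
    data[offset + 3] = a;  // Alpha value
  }
  return image;
}

// Stand-in for context.getImageData(...) so the sketch runs without a canvas (assumption).
var testImage = { width: 3, height: 2, data: new Uint8ClampedArray(3 * 2 * 4) };
fillImage(testImage, 10, 20, 30, 255);
```

With a real canvas, the same function could be called on the object returned by getImageData, followed by putImageData.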

The Benchmark

I added to this benchmark a test for something else that I noticed earlier in the development of the same application: going from globally scoped code to a more "Crockfordy" namespaced style may sometimes have a negative effect on performance. So the benchmark tests every combination of the following: global vs. namespaced methods, and standard vs. "detached" access to the image's data. To run the benchmark just go there and wait for the four tests to end. They are run one after the other, each in its own frame.
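For readers who want to try the standard vs. "detached" comparison themselves, here is a rough sketch of a timing harness. It is not the benchmark used for this article: the helper meanDuration, the run count and the plain-object stand-in for ImageData are all assumptions, and with a plain object the timings will not reflect the cost of a real canvas-backed ImageData. Swap in the result of getImageData to measure the real thing.

```javascript
// Hypothetical timing helper: runs `fn` `runs` times and returns the mean duration in ms.
function meanDuration(fn, runs) {
  var total = 0;
  for (var i = 0; i < runs; i++) {
    var start = Date.now();
    fn();
    total += Date.now() - start;
  }
  return total / runs;
}

var BENCH_WIDTH = 320, BENCH_HEIGHT = 240;
// Stand-in for context.getImageData(0, 0, BENCH_WIDTH, BENCH_HEIGHT) (assumption).
var benchImage = {
  width: BENCH_WIDTH,
  height: BENCH_HEIGHT,
  data: new Uint8ClampedArray(BENCH_WIDTH * BENCH_HEIGHT * 4)
};

// Standard access: looks up benchImage.data on every write.
function standardAccess() {
  var pixels = BENCH_WIDTH * BENCH_HEIGHT;
  while (pixels--) {
    benchImage.data[4 * pixels]     = 255;
    benchImage.data[4 * pixels + 1] = 255;
    benchImage.data[4 * pixels + 2] = 255;
    benchImage.data[4 * pixels + 3] = 255;
  }
}

// "Detached" access: the data array is cached in a local variable first.
function detachedAccess() {
  var pixels = BENCH_WIDTH * BENCH_HEIGHT;
  var data = benchImage.data;
  while (pixels--) {
    data[4 * pixels]     = 255;
    data[4 * pixels + 1] = 255;
    data[4 * pixels + 2] = 255;
    data[4 * pixels + 3] = 255;
  }
}

var benchResults = {
  standard: meanDuration(standardAccess, 20),
  detached: meanDuration(detachedAccess, 20)
};
```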

Results

There is more information here than it seems at first sight. The first thing is that caching the image's data array is always a good idea (red values are always below blue values, and green always below orange). I can only speculate as to why that is... it probably has to do with the cost of "crossing the DOM bridge", as described in the talk I mentioned earlier.

You may wonder "What if, instead of just writing the value of each pixel, I need to read it first?". Well, I did too, because intuitively it would seem that this should be an even better candidate for this kind of optimisation. It turned out that this gives the exact same result as when you're just writing the values. This would seem to indicate that accessing a value of the data array doesn't require a "flush" of the pending modifications to the page (which makes sense, since changes to this data are not directly reflected on the page).
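As an illustration of the read-then-write case, here is a small sketch that inverts the colors of an image through a cached reference to its data array. The function name invertColors and the one-pixel stand-in object are assumptions for the example; the benchmark itself may have used a different read-modify-write operation.

```javascript
// Invert the color channels of an ImageData-like object in place.
function invertColors(image) {
  var data = image.data;                 // cached once; every read and write goes through it
  for (var i = 0; i < data.length; i += 4) {
    data[i]     = 255 - data[i];         // Red: read, then write
    data[i + 1] = 255 - data[i + 1];     // Green
    data[i + 2] = 255 - data[i + 2];     // Blue
    // data[i + 3] (alpha) is left unchanged
  }
  return image;
}

// One-pixel stand-in for the result of context.getImageData(...) (assumption).
var invertSample = { width: 1, height: 1, data: new Uint8ClampedArray([10, 20, 30, 255]) };
invertColors(invertSample);
```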

The second interesting point is that using namespaces has a big impact on performance. This probably has to do with the way modern JavaScript engines use tracing to speed things up. It seems that namespacing can make it harder for the compiler to find a valid trace through the code. Indeed, if you look at the second result for Firefox without JIT (tracing disabled), you can see that globally scoped functions lose their advantage.

There may well be a way to use namespaces that doesn't break the tracing, but I didn't find one.
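To make the two styles concrete, here is a hypothetical sketch of what "global" and "namespaced" code can look like, plus a third variant that copies the namespaced values into local variables before the hot loop. All names (G_WIDTH, DEMO, fillGlobal and so on) are invented for the example and are not taken from the benchmark. Whether the third variant recovers any speed depends on the engine; it has not been measured here.

```javascript
// Globally scoped style: dimensions and function live in the global scope.
var G_WIDTH = 4, G_HEIGHT = 2;
function fillGlobal(data, value) {
  for (var i = 0; i < G_WIDTH * G_HEIGHT * 4; i++) {
    data[i] = value;
  }
}

// Namespaced style: everything hangs off a single object.
var DEMO = {
  width: 4,
  height: 2,
  fill: function (data, value) {
    // Each iteration looks up properties on the namespace object.
    for (var i = 0; i < DEMO.width * DEMO.height * 4; i++) {
      data[i] = value;
    }
  },
  // Variant: hoist the namespace lookups into locals before the loop.
  fillHoisted: function (data, value) {
    var length = DEMO.width * DEMO.height * 4;
    for (var i = 0; i < length; i++) {
      data[i] = value;
    }
  }
};

var outGlobal = new Uint8ClampedArray(32);
var outNamespaced = new Uint8ClampedArray(32);
var outHoisted = new Uint8ClampedArray(32);
fillGlobal(outGlobal, 128);
DEMO.fill(outNamespaced, 128);
DEMO.fillHoisted(outHoisted, 128);
```

All three produce the same pixels; only the way names are resolved inside the loop differs.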

Final Word

This is just a simple benchmark. Testing JavaScript is not a simple task, and many factors can come into play. Furthermore, these results are of no use for comparing performance between browsers. This small article is here to start a discussion, and I am open to any comments. Post your results if you use a browser not mentioned here!

As always, the code is under an MIT License, so don't hesitate to roll your own version of this benchmark!

Stoyan: Interesting findings, thanks for sharing! Happy my presentation helped :)

I wonder if you have any observations over something else I've been wondering. When you have a largish image which one will be faster:
- reading the whole image in a giant array
- reading the image line by line, I mean calling getImageData() once for each row
in other words - more image (DOM, apparently) access vs more memory required to keep the big array
Selim: Hi Stoyan,

Thank you for your presentation, it was very eye-opening!

I didn't try, but if your goal is to do something with the whole image (like a filter) then you're probably better off doing only one call to getImageData(). But if your image gets really big, there would probably be a sweet spot in between the two solutions you proposed, like reading the image in three blocks for example. (And then you could pass the image data array to some Web Workers to do the job in parallel :D )
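A rough sketch of the "in between" option, processing the image in horizontal bands of a few rows each. The function processInBands and the fake context used to run it without a browser are assumptions for illustration; with a real canvas you would pass the 2D context itself. Where the sweet spot lies has not been measured here.

```javascript
// Apply `fn(data)` to the image in horizontal bands of `bandRows` rows.
// Returns the number of getImageData() calls that were needed.
function processInBands(context, width, height, bandRows, fn) {
  var calls = 0;
  for (var y = 0; y < height; y += bandRows) {
    var rows = Math.min(bandRows, height - y);        // the last band may be shorter
    var band = context.getImageData(0, y, width, rows);
    fn(band.data);
    context.putImageData(band, 0, y);
    calls++;
  }
  return calls;
}

// Fake 2D context backed by a typed array, only so the sketch runs outside a browser.
// Assumption: it supports full-width bands only (x === 0, w === width).
function makeFakeContext(width, height) {
  var store = new Uint8ClampedArray(width * height * 4);
  return {
    store: store,
    getImageData: function (x, y, w, h) {
      var begin = y * width * 4;
      return { width: w, height: h, data: store.slice(begin, begin + w * h * 4) };
    },
    putImageData: function (img, x, y) {
      store.set(img.data, y * width * 4);
    }
  };
}

var fakeContext = makeFakeContext(4, 5);
var bandCalls = processInBands(fakeContext, 4, 5, 2, function (data) {
  for (var i = 0; i < data.length; i++) {
    data[i] = 255;                                    // paint the band opaque white
  }
});
```

Setting bandRows to 1 gives the line-by-line approach, and setting it to the image height gives the single big array.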
Stoyan: Thanks Selim!
Simon Charette: Something is wrong with your iterator variable. Misspelled 'i' or 'pixels'?
Sam: I realize this may be a fairly "noob" comment -- but how do / what tools are involved in producing your performance stats?

I appreciate the help :).
Sam: Aha! Answered my own question -- you used a benchmark application and then ran it yourself.
Selim: @Simon whoops, corrected it... Thank you!

@Sam Yes, I coded a simple benchmark that loads the tests in an iframe and starts them one after the other. Look at the source files for more information.
Ash: You actually do not do "detach and copy back", as imageData is a reference; you just access the data array directly, not through the image DOM element. Isn't "direct access" more appropriate than "detached access"? Interesting findings and speedup anyway!
Petr: And what about this:

[code]
var SCREEN_HEIGHT = 240;
var SCREEN_WIDTH = 320;

var canvas, context, image;
var grey = 0;
var accumulated = 0;

var loop = function()
{
    if (grey < 256) {
        grey += 1;
    } else {
        window.parent.done(accumulated/grey);
        return;
    }

    var a = (new Date()).getTime();

    var ptr = -1;
    var end = ptr + SCREEN_WIDTH * SCREEN_HEIGHT * 4;
    var imageData = image.data;

    do {
        imageData[++ptr] = grey;
        imageData[++ptr] = grey;
        imageData[++ptr] = grey;
        imageData[++ptr] = 255;
    } while (ptr != end);

    image.data = imageData;
    context.putImageData(image, 0, 0);
    var b = (new Date()).getTime();

    accumulated += (b-a);
    document.getElementById("diff").innerHTML = "mean duration: " + Math.round(accumulated/grey) + "ms";

    setTimeout(loop, 10);
}

var start = function()
{
    // retrieve the canvas
    canvas = document.getElementById("canvas");
    canvas.height = SCREEN_HEIGHT;
    canvas.width = SCREEN_WIDTH;
    context = canvas.getContext("2d");
    image = context.getImageData(0, 0, SCREEN_WIDTH, SCREEN_HEIGHT);

    setTimeout(loop, 10);
}
[/code]
Petr: In addition, your algorithm is a bit wrong, because you are not setting all pixels (the first pixel is not set).

Best regards
- Petr
ArtBIT: I am having a hard time with the comment parser...

You could also spare a few clock cycles by optimizing away the unnecessary multiplication in the loop.

Instead of:
[code]
while(--pixels) {
    imageData[4*pixels+0] = r; // Red value
    imageData[4*pixels+1] = g; // Green value
    imageData[4*pixels+2] = b; // Blue value
    imageData[4*pixels+3] = a; // Alpha value
}
[/code]
you could do:
[code]
var p;
while(--pixels) {
    p = pixels << 2; // this was breaking the comment I guess
    imageData[  p] = r; // Red value
    imageData[++p] = g; // Green value
    imageData[++p] = b; // Blue value
    imageData[++p] = a; // Alpha value
}
[/code]
Selim: @Petr and @ArtBIT: Yep, there are many other ways to optimize this code :) This is just an example to show how this particular optimisation works!

And yes, the first pixel is not painted... that wasn't intentional :/

p.s. @ArtBIT I removed your duplicated comment... sorry for the lame blog engine :)
Petr: Hi Selim,

in assembly language there are some tricks to fill a rectangle really fast, but we are in JavaScript, and if setting an array value means a function call then all these optimizations are only minor. I mean that the function call overhead is too large compared to one multiplication or addition ;)

On the other hand, the JS engine could compile the inner loop, and in that case the speed could be close to the speed of C. I think that currently no browser can do that :)

Best regards
victor: Actually, what you're doing is saving 4 operations per loop. The dot (.) is an operator, like * or +, only probably way more expensive, because your lvalue in the old code had to do TWO symbol lookups whereas now you only do one, with no operator involved. Making the variable local also helped (a lot).
scupper: Ah, wow. Great work!
Boris Smus: I was really confused by this post - I didn't understand why "detaching" and "attaching" should make any difference.

Are you sure that what you're seeing isn't just a result of saving a lot of image.data calls?

See also http://jsperf.com/pixel-pre-rendering
Björn Ali Göransson: Hi,

Try to cache the "4*pixels" in a variable and see what happens in the results... :-)

Björn Ali
David B.: Make use of the (new) image.data.buffer interface if supported. Map that into a 32-bit typed array. Create a 32-bit RGBA value using bit shifting and the binary OR operator. Avoid using extra temp variables in the loop. Loop over pixels, not over each byte of each pixel. Minimizing the number of operations per loop is the key. See the jsperf link above.
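A minimal sketch of the approach described in this last comment, assuming typed arrays are available. The stand-in image object, the helper packRGBA and the endianness check are assumptions added for illustration; with a real canvas, the buffer would come from the ImageData returned by getImageData.

```javascript
// Stand-in for context.getImageData(...) so the sketch runs without a canvas (assumption).
var wideImage = { width: 2, height: 2, data: new Uint8ClampedArray(2 * 2 * 4) };

// View the same bytes as one 32-bit integer per pixel.
var pixels32 = new Uint32Array(wideImage.data.buffer);

// Byte order matters when packing: detect whether the platform is little-endian.
var littleEndian = new Uint8Array(new Uint32Array([0x0a0b0c0d]).buffer)[0] === 0x0d;

// Pack four 0..255 channel values into one 32-bit value laid out as R, G, B, A in memory.
function packRGBA(r, g, b, a) {
  return littleEndian
    ? ((a << 24) | (b << 16) | (g << 8) | r) >>> 0
    : ((r << 24) | (g << 16) | (b << 8) | a) >>> 0;
}

var packedColor = packRGBA(10, 20, 30, 255);
for (var i = 0; i < pixels32.length; i++) {
  pixels32[i] = packedColor;   // one write per pixel instead of four
}
```

Because pixels32 and wideImage.data share one buffer, the byte view sees the packed writes immediately and can be handed to putImageData as usual.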