For those with short attention spans, here’s how you call the function:
map(enumerable, mapFunction, callback, numWorkers);
I wanted an easy way to divide up a parallelizable task with Web Workers, so I created a Worker-enabled map function for arrays and objects. It works just like the map function in your favorite functional languages, except that it executes asynchronously and hands the result to a callback. And, of course, it can run several times faster on multicore machines.
The function creates a pool of workers (32 by default) and divides the work up among them. It reassembles the results into a new object of the same type as the original. Array order is preserved.
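To pin down what the caller gets back, here is a rough single-threaded stand-in that takes the same first three arguments. The name mapSync is made up for this sketch, and it is not the Worker implementation; it only illustrates the contract: the mapper sees one value at a time, and the result has the same type and keys as the input.

```javascript
// Hypothetical single-threaded stand-in for map(); illustration only.
function mapSync(data, mapper, callback) {
  // Build an empty container of the same type as the input (Array or Object).
  var result = new data.constructor();
  for (var key in data) {
    if (Object.prototype.hasOwnProperty.call(data, key)) {
      // The mapper receives only the value, never the key.
      result[key] = mapper(data[key]);
    }
  }
  // Unlike the Worker version, the callback fires before mapSync returns.
  callback(result);
}

// Arrays come back as arrays, in the original order.
mapSync([1, 2, 3], function (x) { return x * x; }, function (squares) {
  console.log(squares); // [1, 4, 9]
});

// Objects come back as objects with the same keys.
mapSync({ a: "x", b: "y" }, function (s) { return s.toUpperCase(); }, function (upper) {
  console.log(upper.a, upper.b); // X Y
});
```

Something along these lines could also serve as a fallback in environments where Worker is not available.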
Here’s the function:
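function map(data, mapper, callback, numWorkers) {
  // Support arrays & objects: count only the data's own enumerable keys
  var length = 0;
  for (var k in data) {
    if (Object.prototype.hasOwnProperty.call(data, k)) { length++; }
  }
  var result = new data.constructor();
  // Nothing to map: report the empty result instead of waiting forever
  if (length === 0) { callback(result); return; }
  numWorkers = Math.min(numWorkers || 32, length);
  var workers = [];
  var messagesReceived = 0;
  // Serialize the mapper once; the worker evals this string with `value` in scope
  var mapperSource = "(" + String(mapper) + ")(value)";
  // Create the workers
  for (var i = 0; i < numWorkers; i++) {
    workers[i] = new Worker("mapper.js");
    workers[i].addEventListener('message', function(e) {
      result[e.data.key] = e.data.value;
      // Check if we have finished the job. This should probably be more robust.
      if (++messagesReceived === length) {
        // Shut the pool down so idle workers do not linger
        for (var w = 0; w < workers.length; w++) { workers[w].terminate(); }
        callback(result);
      }
    }, false);
  }
  // Just send out all the tasks. The messages get queued by the browser.
  // It would probably be better to queue up two or three tasks per worker (to minimize downtime)
  // and add tasks to the queues as results come back.
  var nextItem = 0;
  for (var d in data) {
    if (Object.prototype.hasOwnProperty.call(data, d)) {
      workers[nextItem++ % numWorkers].postMessage({key: d, value: data[d], mapper: mapperSource});
    }
  }
}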
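The comment inside map() suggests keeping only two or three tasks queued per worker and topping each worker up as its results come back. A minimal sketch of that idea might look like the following. The name mapQueued and the createWorker and tasksPerWorker parameters are hypothetical additions for illustration; createWorker exists mainly so the scheduling logic can be exercised without a browser.

```javascript
// Hypothetical variant of map(); mapQueued, createWorker and tasksPerWorker
// are illustrative names, not part of the original function.
function mapQueued(data, mapper, callback, numWorkers, createWorker, tasksPerWorker) {
  createWorker = createWorker || function () { return new Worker("mapper.js"); };
  tasksPerWorker = tasksPerWorker || 2;

  var keys = Object.keys(data);          // own enumerable keys, in order
  var result = new data.constructor();
  var mapperSource = "(" + String(mapper) + ")(value)";
  var nextKey = 0;                       // index of the next key not yet sent
  var received = 0;
  var workers = [];

  if (keys.length === 0) { callback(result); return; }
  numWorkers = Math.min(numWorkers || 32, keys.length);

  // Send the next pending task, if there is one, to the given worker.
  function feed(worker) {
    if (nextKey < keys.length) {
      var key = keys[nextKey++];
      worker.postMessage({ key: key, value: data[key], mapper: mapperSource });
    }
  }

  // Create a worker whose replies store a result and pull the next task.
  function addWorker() {
    var worker = createWorker();
    worker.onmessage = function (e) {
      result[e.data.key] = e.data.value;
      if (++received === keys.length) {
        workers.forEach(function (w) { w.terminate(); });
        callback(result);
      } else {
        feed(worker);
      }
    };
    workers.push(worker);
  }

  for (var i = 0; i < numWorkers; i++) { addWorker(); }

  // Prime every worker with a few tasks so none sits idle between replies.
  for (var round = 0; round < tasksPerWorker; round++) {
    workers.forEach(feed);
  }
}

// In a browser, with mapper.js in place, a call could look like:
// mapQueued(bigArray, function (x) { return x * x; }, console.log, 4);
```

Compared with sending everything up front, a slow item only delays the worker that drew it; the rest of the pool keeps pulling new tasks.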
And here is the worker code. The worker is pretty bare bones, as you might have assumed.
// Minion
onmessage = function(e) {
  // `value` must stay in scope: the mapper string refers to it by name
  var value = e.data.value;
  postMessage({key: e.data.key, value: eval(e.data.mapper)});
};
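One consequence of shipping the mapper as source text is that it arrives in the worker without its closure: only what is written inside the function body makes the trip. The sketch below reproduces that round trip in-process. simulateWorkerCall is a hypothetical helper, and new Function is used here as a stand-in for the worker's separate global scope.

```javascript
// Hypothetical helper: mimics what the worker does with a message, in-process.
function simulateWorkerCall(mapper, value) {
  // Same string that map() puts in the message.
  var source = "(" + String(mapper) + ")(value)";
  // new Function compiles against the global scope, so, as in a real worker,
  // variables from the caller's closure are not visible.
  return new Function("value", "return " + source + ";")(value);
}

// A self-contained mapper survives the trip.
console.log(simulateWorkerCall(function (x) { return x * 2; }, 21)); // 42

// A mapper that closes over an outer variable does not.
function makeScaler() {
  var factor = 10;
  return function (x) { return x * factor; };
}
try {
  simulateWorkerCall(makeScaler(), 2);
} catch (err) {
  console.log(err.name); // ReferenceError
}
```

So anything the mapper needs has to be inside its own body or passed in as part of the value.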
Next, I’m thinking I’ll do a WebWorker implementation of MapReduce. I was thinking that I would use the syntax from CouchDB in the interest of standardization (emit, in other words), but I am far from an expert on these things and would love to hear any feedback.
Comments
The improvements you made since I last saw this are definitely very cool.
Nice! I’m pretty excited to see what shiny new things we’ll be reaping from what appears to be a building momentum behind workers.
To add something useful: As you indicate in your code, right now it’s rather hard to write a good feedback mechanism to really tune performance. The UAs are all doing what’s optimal for them, and the spec is (intentionally, for now) underdefined wrt things like determining the maximum number of simultaneous workers. You might already follow the WHATWG mailing list and be well aware of this stuff, but, if not, you might find this collection of threads interesting:
http://lists.whatwg.org/htd…
Nice to see this post. Hopefully this also means you guys will stop teasing us with that Hive stuff soon, too =)
Looking over this, I see I forgot one vital piece of information: the map function takes a single argument, which is the value (not two arguments, which you might be expecting if you’re thinking of Google MapReduce).
Very cool!
This reminds me of Metaworker – “A javascript work parallelizer/distributor library for both HTML5 web workers and server-side nodejs” – http://github.com/Maciek416….
I’ve also played with workers and created the pmrpc library – a library for RPC-style communication with web workers (and iframes/windows) – http://bit.ly/JMtkm.
Metaworker looks pretty cool – similar to what I was planning to implement as far as map reduce goes, except that my fallback would be single-threaded execution, not server-side workers. Also, I’m kind of thinking that since you need the worker js files anyway, you may as well write the logic there instead of passing functions in. Still, maybe I’ll fork MW instead of rolling from scratch.
Thanks for the tip.
What do you guys think about using “eval” here?