JavaScript Enumerable.Map() with WebWorkers

For those with short attention spans, here’s how you call the function:

map(enumerable, mapFunction, callback, numWorkers);

I wanted an easy way to divide up a parallelizable task with Web Workers, so I created a Worker-enabled map function for arrays and objects. It works just like the map function in your favorite functional languages, except that it executes asynchronously and hands the result to a callback. And of course, for CPU-heavy mappers it can run several times faster on multicore machines.

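The function creates a pool of workers (32 by default, and never more than there are items to map) and deals the items out to them round-robin. It reassembles the results into a new object of the same type as the original. Array order is preserved, because each result is stored under the key it came from.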
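To pin down that contract before getting to the workers, here is a minimal single-threaded sketch of the same behavior. The name `mapLocal` is hypothetical and not part of the code in this post; it only illustrates what the result should look like (same container type, same keys, mapper called with the value alone).

```javascript
// Single-threaded reference for the contract described above.
// `mapLocal` is a hypothetical helper, not the Worker-based map() itself.
function mapLocal(data, mapper) {
  // [] for arrays, {} for plain objects
  var result = new data.constructor();
  for (var key in data) {
    // Skip anything inherited through the prototype chain.
    if (Object.prototype.hasOwnProperty.call(data, key)) {
      // The mapper receives only the value, never the key.
      result[key] = mapper(data[key]);
    }
  }
  return result;
}

var squares = mapLocal([1, 2, 3], function(x) { return x * x; });
var lengths = mapLocal({a: "one", b: "three"}, function(s) { return s.length; });
// squares is [1, 4, 9]; lengths is {a: 3, b: 5}
```

The Worker version should produce the same values, only delivered later through the callback.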

Here’s the function:

function map(data, mapper, callback, numWorkers) {

  // Support arrays & objects
  var length = 0;
  for(var d in data) { length++; }
  var result = new data.constructor;

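  // Nothing to map: no worker would ever reply, so report the empty result directly
  // (deferred, to keep the callback asynchronous).
  if (length === 0) { setTimeout(function() { callback(result); }, 0); return; }

  numWorkers = Math.min(numWorkers || 32, length);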
  var workers = [];
  var messagesReceived = 0;

  // Create the workers
  for (var i=0; i < numWorkers; i++) {
    workers[i] = new Worker("mapper.js");
    workers[i].addEventListener('message', function(e) {
      result[e.data.key] = e.data.value;
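      // Check if we have finished the job.  This should probably be more robust.
      if (++messagesReceived === length) {
        // Shut the pool down so idle workers don't linger, then hand back the result.
        workers.forEach(function(w) { w.terminate(); });
        callback(result);
      }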
    }, false);
  }

  // Just send out all the tasks.  The messages get queued by the browser.
  // It would probably be better to queue up two or three tasks per worker (to minimize downtime)
  // and add tasks to the queues as results come back.
  var nextItem = 0;
  for (var d in data) {
    workers[nextItem++ % numWorkers].postMessage({key: d, value: data[d], mapper: "(" + String(mapper) + ")(value)"});
  }

}
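The comments in the dispatch loop suggest keeping only two or three tasks queued per worker and adding more as results come back. Here is a rough sketch of that idea, not a tested replacement. The names `mapQueued`, `depth`, and `createWorker` are all hypothetical; `createWorker` is an assumed injection point that in a browser would be something like `function() { return new Worker("mapper.js"); }`.

```javascript
// Hypothetical variant of map(): keeps at most `depth` tasks in flight per
// worker and hands out the next task whenever a result comes back.
function mapQueued(data, mapper, callback, numWorkers, depth, createWorker) {
  var keys = Object.keys(data);
  var result = new data.constructor();
  if (keys.length === 0) { callback(result); return; }

  var source = "(" + String(mapper) + ")(value)";
  var workers = [];
  var next = 0;       // index of the next key to hand out
  var received = 0;   // number of results seen so far
  numWorkers = Math.min(numWorkers || 32, keys.length);
  depth = depth || 2;

  // Send one more task to this worker, if any are left.
  function feed(worker) {
    if (next >= keys.length) { return; }
    var key = keys[next++];
    worker.postMessage({key: key, value: data[key], mapper: source});
  }

  function attach(worker) {
    worker.onmessage = function(e) {
      result[e.data.key] = e.data.value;
      if (++received === keys.length) {
        workers.forEach(function(w) { w.terminate(); });
        callback(result);
      } else {
        feed(worker); // replace the task that just finished
      }
    };
    workers.push(worker);
  }

  for (var i = 0; i < numWorkers; i++) { attach(createWorker()); }

  // Prime each worker with up to `depth` tasks.
  workers.forEach(function(w) {
    for (var j = 0; j < depth; j++) { feed(w); }
  });
}
```

The point of the design is that a slow item only holds up the worker it landed on, while faster workers keep pulling new items instead of sitting idle behind a fixed round-robin share.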

And here is the worker code. The worker is pretty bare bones, as you might have assumed.

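// Minion: evaluates the mapper source it receives against the posted value.
onmessage = function(e) {
    // Keep this local named `value`: the eval'd string "(<mapper>)(value)" refers to it.
    var value = e.data.value;
    postMessage({key: e.data.key, value: eval(e.data.mapper)});
};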
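Since functions cannot be posted to a worker directly, the mapper travels as a string and gets rebuilt on the other side. The sketch below walks through that round trip without any workers involved. The helper names `serializeMapper` and `evaluateLikeWorker` are hypothetical and exist only for this illustration.

```javascript
// Build the string that map() posts to the worker (same form as in map()).
function serializeMapper(mapper) {
  return "(" + String(mapper) + ")(value)";
}

// What the worker effectively does: bind a local `value`, then eval the string.
function evaluateLikeWorker(source, value) {
  // Direct eval, so the `value` parameter is in scope for the evaluated code.
  return eval(source);
}

var source = serializeMapper(function(x) { return x * 2; });
var doubled = evaluateLikeWorker(source, 21);
// doubled is 42
```

One consequence worth keeping in mind: only the function's source text is sent, and the worker has its own global scope, so a mapper that relies on variables or helpers defined in the page will not find them there. Mappers need to be self-contained.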

Next, I’m thinking I’ll do a WebWorker implementation of MapReduce. I was thinking that I would use the syntax from CouchDB in the interest of standardization (emit, in other words) but I am far from an expert on these things and would love to hear any feedback.

Comments


  1. Nice! I’m pretty excited to see what shiny new things we’ll be reaping from what appears to be a building momentum behind workers.

    To add something useful: As you indicate in your code, right now it’s rather hard to write a good feedback mechanism to really tune performance. The UAs are all doing what’s optimal for them, and the spec is (intentionally, for now) underdefined wrt things like determining the maximum number of simultaneous workers. You might already follow the WHATWG mailing list and be well aware of this stuff, but, if not, you might find this collection of threads interesting:
    http://lists.whatwg.org/htd

    Nice to see this post. Hopefully this also means you guys will stop teasing us with that Hive stuff soon, too =)

  2. Looking over this I see I forgot one vital piece of information: the map function takes a single argument, which is the value (not two arguments, which you might be expecting if you’re thinking Google MapReduce).

  3. Metaworker looks pretty cool – similar to what I was planning to implement as far as map reduce goes. Except that my fallback would be single-threaded execution, not server-side workers. Also, I’m kind of thinking that since you need the worker js files anyway, you may as well write the logic there instead of passing functions in. Still, maybe I’ll fork MW instead of rolling from scratch.

    Thanks for the tip.
