This post is the second in a three-part series describing our investigations into scalability for a second screen application we built for PBS. You can read the introduction here. This guide assumes you have a production server up and running. If you need help getting there, check out the final post in the series for a guide to configuring a Node.js server for production.

If you read my harrowing tale on stress testing, you may be interested in conducting similar research for your own nefarious purposes. This post is intended to help others get a leg up stress testing Node.js applications–particularly those involving Socket.io.

I’ll cover what I consider to be the three major steps to getting up-and-running:

Building a client simulator
Distributing the simulation with Amazon EC2
Controlling the simulation remotely

The Client Simulator

The first step is building a simulation command-line interface. I’ll necessarily need to be a little hand-wavy here because there are so many ways you may have structured your app. My assumption is that you have some JavaScript file that fully defines a Connection constructor and exposes it to the global scope.

Simulating many clients in Node.js. Lucky for you, Socket.io separates its client-side logic into a separate module, and this can run independently in Node.js. This means that you are one ugly hack away from using the network logic you’ve already written for the browser in Node.js (assuming your code is nice and modular):

// Hack to get client code running on server (evaluate that code in the
// global scope).
var fs = require("fs");

var clientFilePath = __dirname + "/path/to/your/network/module.js";
var clientFileContents = fs.readFileSync(clientFilePath).toString();

this.io = require("socket.io-client");
eval(clientFileContents);

Now you can instantiate an arbitrary number of connections iteratively. Be sure to spread these initializations out over time, like so:

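// Interpreted from the command line; more on this below
var argv = require("optimist").argv;
var clientCount = argv.c;

var heartbeatInterval = 25 * 1000;
var idx = 0;
var intervalID;

// Timer callbacks run with a different `this`, so keep a reference to the
// context that the client code was evaluated in.
var context = this;

var makeConnection = function() {
  (new context.liveMap.Connection()).connect();

  idx++;
  if (idx >= clientCount) {
    clearInterval(intervalID);
  }
};

intervalID = setInterval(makeConnection, heartbeatInterval/clientCount);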

As I discussed in the previous post in this series, failing to take this precaution will result in heartbeat synchronization. I guess that sounds cool, but it can skew your measurements in unpredictable ways.
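To make the spacing arithmetic concrete, here is a minimal sketch of the calculation behind that setInterval call. The helper name connectionOffsets is hypothetical and exists only for illustration; the 25-second heartbeat matches the value used above, and offsets are measured relative to the first connection.

```javascript
// Hypothetical helper: compute when each simulated client should connect so
// that all of them are spread evenly across one heartbeat interval.
function connectionOffsets(clientCount, heartbeatInterval) {
  var spacing = heartbeatInterval / clientCount;
  var offsets = [];

  for (var idx = 0; idx < clientCount; idx++) {
    offsets.push(idx * spacing);
  }

  return offsets;
}

// 500 clients across a 25-second heartbeat: one new connection every 50 ms.
var offsets = connectionOffsets(500, 25 * 1000);
console.log(offsets[1] - offsets[0]); // 50
console.log(offsets[offsets.length - 1]); // 24950
```

Because the last client connects just before the first client's second heartbeat is due, the heartbeats stay evenly interleaved instead of arriving at the server in one burst.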

If your connection module is using Socket.io’s top-level io.connect method to initiate a connection, you’ll have to make an additional modification. By default, Socket.io will prevent the simulator from making multiple connections to the same server. You can disable this behavior with the "force new connection" flag, as in io.connect({ "force new connection": true });.

The above example makes use of a clientCount variable. This is just one simulation variable you will likely want to manipulate at run time. It’s a good idea to support command-line flags for changing these values on the fly. I recommend using the “optimist” library to help build a usable command-line interface.
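If you would rather not add a dependency, recent versions of Node.js ship a built-in parser, util.parseArgs, that covers the basics. The following is only a sketch under that assumption, not the interface of the original simulator: the long flag names (--clients, --url) and the default values are made up for illustration, with -c kept as the short form used above.

```javascript
var parseArgs = require("util").parseArgs;

// Hypothetical flags for a simulator CLI. parseArgs returns string values,
// so numeric options have to be converted by hand.
function readOptions(args) {
  var parsed = parseArgs({
    args: args,
    options: {
      clients: { type: "string", short: "c" },
      url: { type: "string", short: "u" }
    }
  });

  return {
    // Fall back to illustrative defaults when a flag is omitted
    clientCount: parseInt(parsed.values.clients || "100", 10),
    url: parsed.values.url || "http://localhost:8080"
  };
}

// In the simulator you would pass process.argv.slice(2) instead of a literal
console.log(readOptions(["-c", "500"]));
```

With this in place, an invocation such as node mySimulator.js -c 500 would change the client count without editing the script.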

Data collection. For another perspective on test results, you may decide to script your simulator to collect performance data. If so, take care in the method of data collection and reporting you choose. Any extraneous computation could potentially affect the simulator’s ability to simulate clients, and this will lead to unrealistic artifacts in your test results.

For example, you may want to store the time that each “client” receives a given message. A naive approach would look something like this:

connection.on("message", function() {
  // Append (rather than overwrite) so earlier measurements are not lost
  fs.appendFileSync("results.txt", connection.id + "," + new Date().getTime() + "\n");
});

Most Node.js developers maintain a healthy fear of any function call that starts with fs. and ends with Sync, but even the asynchronous version could conceivably impact your simulator’s ability to realistically model hundreds of clients:

connection.on("message", function() {
  fs.appendFile("results.txt", connection.id + "," + new Date().getTime() + "\n", function(err) {
    if (err) { console.error(err); }
  });
});

To be safe, I recommend storing test statistics in memory and limiting the number of disk writes. For instance:

var stats = [];
connection.on("message", function() {
  stats.push([connection.id, new Date().getTime()]);
});

Now, you could have the simulator periodically write the data to disk and flush its in-memory store:

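setInterval(function() {
  // Append each batch on its own line so earlier batches are not overwritten
  fs.appendFile("results.txt", JSON.stringify(stats) + "\n", function(err) {
    if (err) { console.error(err); }
  });
  stats.length = 0;
}, 5000);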

…but you still run the risk of simulation artifacts from conducting I/O operations in the middle of the procedure. Instead, consider extending your simulator to run a simple web server that responds to HTTP GET requests by “dumping” the data it has collected:

var app = require("express")();
app.listen(80, "127.0.0.1");
app.get("/dump", function(req, res) {
  res.send(JSON.stringify(stats));
  stats.length = 0;
});

This approach allows for a clean separation of data gathering and data reporting. You can run your simulation for some period of time, pause it, and then run the following command to get at your statistics:

$ curl localhost/dump > results.txt

Of course, this also introduces the overhead of running a web server. You will have to weigh this against the performance impact of immediate I/O operations, and this is largely dependent on the amount of data you intend to collect (both in terms of size and frequency).
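If the overhead of a full framework is a concern, one option is to serve the dump with Node’s built-in http module instead. The sketch below is an assumption-laden alternative to the Express version above, not a measured improvement: the helper names drainStats and createDumpServer are hypothetical, and the server is deliberately not started until you call listen yourself.

```javascript
var http = require("http");

var stats = [];

// Serialize everything collected so far, then empty the store in place so
// the next dump only contains new measurements.
function drainStats(store) {
  var body = JSON.stringify(store);
  store.length = 0;
  return body;
}

// Build (but do not start) a server that answers GET /dump with the data
// and responds with 404 to everything else.
function createDumpServer(store) {
  return http.createServer(function(req, res) {
    if (req.method === "GET" && req.url === "/dump") {
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(drainStats(store));
    } else {
      res.writeHead(404);
      res.end();
    }
  });
}

// To start it alongside the simulator (port 8080 is an arbitrary choice):
// createDumpServer(stats).listen(8080, "127.0.0.1");
```

Binding to 127.0.0.1 keeps the endpoint private to the machine, and choosing a port above 1024 avoids needing root privileges; you would then fetch the data with curl localhost:8080/dump.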

Scripting clients. In some cases, having “dumb” clients that merely connect and wait for data will be enough to test the system. Many applications include complex client-to-client communication, however, so you may need to script more nuanced behavior into the simulator. That logic is extremely application-specific, which means I get to “leave it as an exercise to the reader” to build a script that makes sense for your application.

Although you now have a mechanism to simulate a large number of clients on your local machine, this approach would be far too unrealistic. Any tests you run will likely be bottlenecked by your system’s limitations, whether in terms of CPU, memory, or network throughput. No, you’ll need to simulate clients across many physical computers.

Distributing with EC2

Taking a hint from Caustik’s work, I began researching details on using Amazon Web Services’ Elastic Compute Cloud (EC2) to distribute my new simulation program. As it turns out, Amazon offers a free usage tier for new users. The program lasts for a year after initial sign-up and includes (among other things) 750 hours per month on an EC2 “Micro” instance. This is enough to run one instance all day, every day, or (more importantly) 20 instances for one hour a day. All this means that you can run your simulation on 20 machines for free!
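The budget is easy to check before you launch anything. Here is a small sketch of the arithmetic, assuming the 750 free hours are counted per calendar month and that every running instance consumes them at the same rate; the function names are hypothetical.

```javascript
// Assumption: the free tier allowance is 750 instance-hours per month
var FREE_HOURS_PER_MONTH = 750;

// Total instance-hours consumed by a test schedule
function instanceHours(instances, hoursPerDay, days) {
  return instances * hoursPerDay * days;
}

function fitsFreeTier(instances, hoursPerDay, days) {
  return instanceHours(instances, hoursPerDay, days) <= FREE_HOURS_PER_MONTH;
}

console.log(instanceHours(1, 24, 31)); // 744: one instance, around the clock
console.log(instanceHours(20, 1, 31)); // 620: twenty instances, an hour a day
console.log(fitsFreeTier(20, 2, 31));  // false: 1240 hours exceeds the allowance
```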

The first thing you’ll need is an Amazon Machine Image (or “AMI” for short). This is a disk image from which each machine will boot. Amazon hosts an entire ecosystem of user-made AMIs, and you can likely find one that suits your needs. I recommend creating your own, since it is so easy and you can never be sure exactly what might be running on a home-made AMI. Just select a free AMI authored by Amazon to begin. We’ll be using this as a base to create our own custom AMI to be cloned as many times as we wish.

[Figure: AMI selection. You can use many of the AMIs offered by Amazon at no charge under the AWS free tier.]

As part of this process, you will need to create a key pair for starting this AMI. Hold on to this: you’ll need it to launch multiple instances in the next step.

By default, AMIs do not support remote connections through SSH. This is the underlying mechanism of our administration method (more on this below), so you’ll need to enable it. Use the AWS Management Console to create a custom security group that allows connections on port 22 (to make this easy, the interface has a pre-defined rule for SSH).

[Figure: Creating a security group. In this example, the security group named “bees” has been configured to support SSH access.]

Next, you’ll want to customize the AMI so that it can run your simulation immediately after being cloned. You can install Node.js from your operating system’s software repository, but please note that the version available may be well behind the latest release. This has historically been the case with Debian and Ubuntu, for instance, which currently supply version 0.6.19 despite the latest stable release being tagged at 0.8.16. Installing Node.js from source may be your best bet (if you need guidance on that, check out this page on the Node.js wiki).

While you could use a utility like scp to copy your simulation code to the AMI, I recommend installing your version control software (you are using a VCS, right?) and fetching from your repository. This will make it much easier to update the simulation later on.

Currently, your simulation script is likely tying up the terminal. When connected over SSH, this behavior would prevent you from issuing any other commands while the simulation runs. There are a number of ways you can “background” a process, thus freeing up the command line. Since you have already installed Node.js, I recommend simply using the “forever” utility from Nodejitsu. Install it globally with:

npm install -g forever

Now, when connected remotely, you can effectively “background” the following command:

node mySimulator.js

by instead running:

forever start mySimulator.js

When you’re done, just run:

forever stopall

to cancel the simulator.

Now that you’re able to run the simulation on this customized setup, it’s time to save it as your own personal AMI. With your EC2 instance running, visit the EC2 Dashboard and view the running instances. Select the instance you’ve been working with and hit “Create Image”. In a few minutes, your custom AMI will be listed in the dashboard, along with a unique ID. You’ll need to specify this ID when starting all your simulator instances, so make note of it.

[Figure: Listing of custom AMIs.]

Administration

The last step is building a means to administer all those instances remotely. You’ll want to make running a test as simple as possible, and it can certainly be easier than manually SSH’ing into 20 different computers.

There’s an old saying that goes, “There’s nothing new under the sun.” Besides sounding folksy and whimsical, it has particular relevance in the open-source world: it means that someone else has already done your job for you.

In this case, “someone else” is the fine folks over at the Chicago Tribune. Back in 2010, they released a tool named “Bees with Machine Guns”. This was designed to administer multiple EC2 instances remotely. There’s one catch, though: it was built specifically for the Apache Benchmark tool.

In order to use it to control your Node.js client simulator, you’ll need a more generalized tool. I made a quick-and-dirty fork of the project to allow for running arbitrary shell commands on all the active EC2 instances (or in beeswithmachineguns lingo, “all the bees in the swarm”). I’m no Pythonista, though, and I encourage you to fork it and help me clean it up!

There are a number of flags you will need to specify in order to get your bees off the ground. These include the key pair for your custom AMI and the name of the custom security group you created. Type bees --help for details.

Here’s an example of how you might start up 10 bees:

$ bees up --key bocoup-pbs \
    --group bees --zone us-east-1a \
    --instance ami-0123456 --login ubuntu \
    --servers 10

Bear in mind that you are on the clock for as long as these instances are running, regardless of whether or not you are actually conducting your tests. Even with the free usage tier, you run the risk of being billed for long-running instances. Once you are finished, do not forget to deactivate the EC2 instances!

$ bees down

To play it safe, I recommend keeping the EC2 Management Console open during this procedure. This lets you keep an eye on all currently-running instances and can help assure you that yes, they really are all off.

You can invoke your client simulator using the exec command along with the forever utility we built into the AMI:

$ bees exec - "forever start mySimulator.js"

And if you ever want to change your simulator logic on the fly, you can use the same API to pull the latest code:

$ bees exec - "git fetch origin && git checkout origin/master"

Just remember that the next time you run your tests, you will be re-initializing machines from your custom AMI, so these on-the-fly patches won’t stick.
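For a sense of what a tool like this does for you, here is a rough Node.js sketch of the same idea: run one shell command on every instance over SSH, in parallel. This is not how the fork is implemented (it is a Python project); the helper names sshArgs and execOnAll are hypothetical, and it assumes an ssh client is installed and that you already know each instance’s address.

```javascript
var spawn = require("child_process").spawn;

// Build the argument list for a single `ssh` invocation.
function sshArgs(keyPath, login, host, command) {
  return [
    "-i", keyPath,                    // private half of the EC2 key pair
    "-o", "StrictHostKeyChecking=no", // fresh instances have unknown host keys
    login + "@" + host,
    command
  ];
}

// Run the same command on every host at once. When all have finished, call
// done(failures) with the hosts whose ssh process failed or exited non-zero.
function execOnAll(hosts, keyPath, login, command, done) {
  var pending = hosts.length;
  var failures = [];

  if (pending === 0) {
    return done(failures);
  }

  hosts.forEach(function(host) {
    var finished = false;
    var child = spawn("ssh", sshArgs(keyPath, login, host, command), {
      stdio: "inherit"
    });

    // "error" and "close" can both fire for one child; count it only once
    function finish(ok) {
      if (finished) { return; }
      finished = true;
      if (!ok) { failures.push(host); }
      if (--pending === 0) { done(failures); }
    }

    child.on("error", function() { finish(false); });
    child.on("close", function(code) { finish(code === 0); });
  });
}
```

A call such as execOnAll(hosts, "bocoup-pbs.pem", "ubuntu", "forever start mySimulator.js", callback) would then mirror the exec example above, with hosts being a list of addresses you supply and the key file name being a placeholder.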

With Great Power…

Summing up, you now have a complete stress-testing framework ready to go:

A command-line tool for simulating any number of clients
A group of computers that will run this tool simultaneously
A method to control those computers from the comfort of your swivel chair/throne/park bench

Now (finally) you’re ready to begin your stress test in earnest. If all this is new to you, you might not have a production server to test. Sure, you could test your development machine, but we’re striving for realism here. Not to worry: in the final part of this series, I detail my approach to creating a production-ready Node.js server.

Realtime Node.js App: Stress Testing Procedure
