Open Source Infrastructure: Deploying jQuery with Ansible

Over the last ten years, the servers that power jQuery and its associated projects have evolved from a single shared webhost to a complex fleet of more than thirty virtual machines. Recently, I have been working with the jQuery Foundation to rein in this decade of organic growth by standardizing the configuration management of our infrastructure using Ansible. I’m writing this post to share some of the things I learned along the way.

Before I dive into the details, I’d like to take a minute or two to explain what configuration management is. To set the tone, see this quote from my colleague on the open web Greg Smith:

Configuration Management

It is pretty tough to reliably manage any web service without some kind of configuration management system. Even software as commonplace as WordPress requires a bevy of interlocking dependencies that can be dizzying to wrangle. Before you serve a single page you’ll need to install Nginx, MySQL, PHP, PHP-FPM, WordPress itself, and more. Once you have the right versions of each, they’ll need to be tweaked to work in concert using a sea of configuration files.

Years ago, we relied on shell scripts, random notes in text files, and even backups of entire /etc folders to handle this complexity. Even alone, keeping track of things for a single server was tricky. Trying to do the same on 30+ machines with multiple contributors… well, you can imagine.

The prospect of trying to replicate the state of one machine on another using the aforementioned methods was daunting, to say the least. In the case of something benign like a planned upgrade, it meant hours or even days of troubleshooting. In the event of a server failure, it could mean pulling a stressful all-nighter trying to repair a major outage.

Thankfully, there is an entire class of software built to handle this problem: configuration management tooling. While there are many open source solutions to choose from, they all have the same basic premise: you specify in code the state you desire a machine or group of machines to be in, and the tool will automate the process of bringing them into it.

Configuration Management at jQuery

When I joined the jQuery Infrastructure team about five years ago, there was no configuration management in place and hardly any documentation about how to set up or configure any of the services. So, I started writing Puppet scripts to help manage things. This allowed us to reliably rebuild our servers when moving to different hardware, handle regular maintenance such as upgrading to the latest Linux distributions, and easily recover from an unfortunate incident where jQuery.com was hacked.

The jQuery infrastructure scripts were originally written for Puppet 2, and a few new versions have been released since then. Upgrading to Puppet 4 would have required a significant rewrite, and staying on an old unsupported version didn’t seem like a very good idea. This gave us the opportunity to evaluate other tools such as Chef and Ansible.

jQuery Migrates to Ansible

After weighing the options, the jQuery Infrastructure team chose to migrate to Ansible, primarily because of how incredibly easy it is to use. Ansible’s main goals are simplicity and ease-of-use and they’ve succeeded wildly on both fronts. Ansible manages machines in an “agentless” manner, which means it can orchestrate remote servers without having to install special client software on them (something I don’t miss about Puppet!). All you need is SSH access and you’re ready to go.

Ansible ships with several hundred built-in modules that can be used to perform just about any infrastructure management task you can imagine. Having now spent significant time working with both Puppet and Ansible, I can say that the Ansible configurations are much easier to read and understand at a glance, especially when it comes to the order of operations that will be executed (which can be very important).

Ansible Concepts

Variables are typically defined in YAML, but they can also be loaded from other familiar formats like JSON or INI.

Inventory files are a listing of hosts to manage, written in an .ini style syntax. In complex infrastructures, an inventory file can be dynamically generated. An inventory can group hosts (e.g. app servers and database servers), making it possible to change how Ansible treats specific machines when orchestrating them en masse.

Tasks can be defined directly within a playbook, or broken out into .yml files stored in a tasks directory. Each task describes a single action to be executed. Tasks are where most of the configuration work in Ansible is actually done. Tasks can use the variables defined in your configuration to template and iterate over lists or other simple data structures.
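As a rough sketch of how tasks use variables for templating and iteration, consider the two tasks below. The file name, the base_packages and site_name variables, and the site.conf.j2 template are all hypothetical and are not taken from the jQuery configuration.

```yaml
# roles/example/tasks/main.yml (hypothetical file, for illustration only)

# Iterate over a list variable: one package is installed per item.
# Assumes base_packages is defined elsewhere, e.g. base_packages: [git, curl]
- name: Install base packages
  apt:
    name: "{{ item }}"
    state: present
  with_items: "{{ base_packages }}"

# Use a variable both in the destination path and inside the template itself.
# Assumes site_name is defined elsewhere and site.conf.j2 exists in templates/
- name: Render the site config from a template
  template:
    src: site.conf.j2
    dest: "/etc/nginx/sites-available/{{ site_name }}.conf"
```

Each task states the desired end result (package present, file rendered), so running it a second time should report ok rather than changed.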

Handlers can also be defined in a distinct .yml file or directly in a playbook. They are special tasks that only run when notified by other tasks. For example, if you perform multiple tasks that modify nginx, it would be ideal if you only restarted nginx when all of them were complete. Without the affordance of handlers, you’d be forced to individually check the status of each task before restarting nginx.
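The nginx scenario could be sketched roughly as follows. The host group, file names, and templates are assumptions made for illustration. Both tasks notify the same handler, and Ansible runs that handler once at the end of the play, and only if at least one of the tasks reported a change.

```yaml
# Hypothetical play: two config changes, at most one restart
- name: Configure nginx
  hosts: appservers          # assumed group name
  tasks:
    - name: Install the main nginx config
      template:
        src: nginx.conf.j2   # assumed template
        dest: /etc/nginx/nginx.conf
      notify: restart nginx

    - name: Install the site config
      template:
        src: site.conf.j2    # assumed template
        dest: /etc/nginx/sites-available/example.conf
      notify: restart nginx

  handlers:
    # Runs once after all tasks finish, and only if a notifying task changed something
    - name: restart nginx
      service:
        name: nginx
        state: restarted
```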

Roles are a collection of the previous concepts that can be grouped together for re-use.

Playbooks are the “entry point” for an Ansible run. Playbooks define “plays” which group together a list of hosts, variables, roles, tasks and handlers to run. You can define multiple plays in a playbook, and even include other playbooks.
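To make the relationship between playbooks and plays concrete, here is a minimal sketch of a playbook containing two plays. The file name, group names, variable file, and role names are hypothetical. Each play targets a different group of hosts from the inventory.

```yaml
# site.yml (hypothetical playbook with two plays)

# Play 1: applies the assumed "nginx" role to hosts in the "appservers" group
- name: Configure app servers
  hosts: appservers
  vars_files:
    - vars/app.yml           # assumed variable file
  roles:
    - nginx

# Play 2: applies the assumed "mysql" role to hosts in the "dbservers" group
- name: Configure database servers
  hosts: dbservers
  roles:
    - mysql
```

Plays run in the order they are listed, so the app servers would be configured before the database servers here.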

Where to Start?

When starting an infrastructure cleanup project like this, it’s best to begin by taking an inventory. For this project, I made a spreadsheet containing a list of all the hosts from our providers, Media Temple and DigitalOcean. That list, along with the DNS records in CloudFlare, helped me get a rough idea of what was still in use and what services needed attention.

All told, we had 33 hosts running. Once I’d accounted for all of our servers, it became obvious that several were no longer needed. In a matter of hours I confirmed it was safe to spin down six of them.

Then, I roughly grouped the remaining servers into “known” (managed by our current Puppet setup) and “mystery meat” (no configuration management) buckets, and decided to tackle the undocumented servers first. This meant figuring out how to SSH into the machines, then hunting around to determine how they were configured, what they hosted (Apache? Nginx?), and where files were stored. During this reconnaissance, I communicated regularly with the jQuery team to tease apart the mysteries we’d left for ourselves.

Configuring Vagrant for Testing

As I triaged each server, I created a local development environment for it using Vagrant. Vagrant is a command line tool that can manage virtual machines. It also has affordances for integrating with most major configuration management tools. This made it very easy to create and destroy machines to test our new configurations without waiting for “real” servers to spin up in the cloud.

Providing local test environments is really important, both for ensuring your configuration management is working properly, and to make it easy for folks to get up and running while developing.

Handling Users

Being a bit overwhelmed by the amount of work ahead of me, I thought it best to focus on the initial setup that was applicable for any server in our network. To start, I needed to make sure that each user on the jQuery Infrastructure Team had SSH access, and that we could easily update SSH keys and sudo passwords.

Being new to this problem space in Ansible, I decided to see what approach others had taken in solving this standard administrative task. I quickly found a few pre-built roles on GitHub, forked them, and added them to my configuration. ansible-sudoers helped me install the sudo package, and ansible-users helped me install the users, SSH keys, and password hashes.

Using these roles I created a setup.yml playbook that looked like this:

# setup.yml - Setup a new virtual server as root
- name: "Initial setup - users, sudoers"
  hosts: all

  vars_files:
    - vars/sudoers.yml
    - vars/sysadmin.yml

  roles:
    - sudoers
    - users

# vars/sysadmin.yml
users:

  - username: s5fs
    name: Adam Ulvi
    groups: ['sudo']
    uid: 1100
    password: SOME_SHADOW_PASSWORD
    ssh_key:
      - "ssh-rsa MY_PUBLIC_KEY"
      - "ssh-rsa MY_ALTERNATE_KEY"

  - username: gnarf
    name: Corey Frang
    groups: ['sudo']
    uid: 1101
    password: SOME_SHADOW_PASSWORD
    ssh_key:
      - "ssh-rsa MY_PUBLIC_KEY"
      - "ssh-rsa MY_ALTERNATE_KEY"

When provisioning a server on DigitalOcean, you can configure it to add your SSH keys to the root user automatically. This takes care of the “chicken-and-egg” access issues of trying to start managing the server. Now, any time we bring up a new box, we can run the following:

ansible-playbook -u root -i new-server.jquery.net, setup.yml

TIP: no_log: true

The users role worked great, but I quickly got annoyed with how verbose its output was: it logged every user’s config, long SSH keys included, three times. I then found the task parameter no_log: true, which limits the output to an ok, changed, or error status. I opened a pull request to make that change, so feel free to use my fork in the meantime.
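As an illustration of where the parameter goes, here is a minimal sketch of a user-creation task with no_log set. This is not the actual task from the ansible-users role. It is an assumed equivalent that loops over the users list from vars/sysadmin.yml using Ansible's built-in user module.

```yaml
# Hypothetical task, not copied from the ansible-users role
- name: Create user accounts
  user:
    name: "{{ item.username }}"
    comment: "{{ item.name }}"
    uid: "{{ item.uid }}"
    groups: "{{ item.groups }}"
    password: "{{ item.password }}"   # expects an already-hashed password
  with_items: "{{ users }}"
  # Suppress per-item details (password hashes, keys) in the run output
  no_log: true
```

One trade-off worth knowing: no_log also hides the details of a failure, so it can help to switch it off temporarily while debugging.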

Locking Things Down a Bit

The machine now had users, but the sshd configuration still allowed access to the machine as root, something I wanted to disable. I also wanted to be sure users could only login with their SSH keys—no passwords allowed here. To handle these settings I created a securessh role:

# roles/securessh/tasks/main.yml
- name: sshd_config lines
  lineinfile:
    # Single quotes keep \s literal; it is not a valid escape in a double-quoted YAML string
    regexp: '^#?\s*{{ item.key }}'
    line: "{{ item.key }} {{ item.value }}"
    dest: /etc/ssh/sshd_config
    state: present
    # Have sshd check the edited file before it replaces the original
    validate: "/usr/sbin/sshd -t -f %s"
  with_items:
    - { key: 'PermitRootLogin',        value: 'no'}
    - { key: 'PasswordAuthentication', value: 'no'}
    - { key: 'MaxAuthTries',           value: "5"}
    - { key: 'LoginGraceTime',         value: "60"}
    - { key: 'MaxSessions',            value: "5"}
  notify: restart sshd

# roles/securessh/handlers/main.yml
# The service is named "ssh" on Debian/Ubuntu; RHEL-family systems call it "sshd"
- name: restart sshd
  service: name=ssh state=restarted

This looks through the config file, finds any line that starts with one of the configuration keys we want to set (commented out or not), and replaces it with the provided value. With this role added to my setup.yml playbook, the machine was in its base state.
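For reference, the updated playbook might look something like the sketch below. The ordering of the roles and the revised play name are my assumptions. Running securessh last means the user accounts and their SSH keys are already in place before root login and password authentication are switched off.

```yaml
# setup.yml - Setup a new virtual server as root
- name: "Initial setup - users, sudoers, sshd lockdown"
  hosts: all

  vars_files:
    - vars/sudoers.yml
    - vars/sysadmin.yml

  roles:
    - sudoers
    - users
    # Assumed to run last so key-based logins exist before root access is disabled
    - securessh
```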

Conclusion

The configuration I’ve shared above only scratches the surface of the work we did for the jQuery Foundation—I hope to post more in-depth coverage of the work we’re doing in the coming months.

All of that said, with no background in Ansible, I was able to convert/combine 8 “mystery meat” and 4 “known” hosts into 8 well-documented servers and deploy them to production in three weeks! Now that I have most of the patterns prepared for handling the various configuration scenarios required for jQuery, I’m confident the remaining servers will be a breeze.

Eventually, we plan to open-source all of jQuery’s infrastructure management scripts. If you are interested in this sort of thing and would like to volunteer some time, come find us in #jquery-infrastructure on freenode IRC!
