That Juju that you do (Part I: Bring the pain)

juju bottle

Benji’s blog post earlier this week gave you all some insight into what the Launchpad Yellow Squad has been doing recently in its attempt to parallelise the Launchpad test suite. One of the side effects of this is that we’ve been making quite a lot of use of Juju, and we thought it’d be nice to actually spell out what we’ve been doing.

The problem

We’re working to parallelise Launchpad’s test suite so that it doesn’t take approximately one epoch to get a branch from being approved for merging until it lands. A lofty goal, sure, and one that presents some interesting problems from the perspective of building an environment to test our work in. You see, Launchpad’s build infrastructure is a pretty complicated beast. It’s come a long way since the time when submitting a branch for merging meant sending an email to our PQM bot, which would then run the test suite and kick the branch out if it failed, but now it’s something of a behemoth.

Time for some S&M

We use Buildbot as our continuous integration system. There are two parts to Buildbot: the master and the slave. Broadly put, the slave is the part of Buildbot that is responsible for doing the actual work of compilation and running tests and the master is responsible for telling the slave when to do things. Each master can be responsible for several slaves. When it became obvious that we were going to need to essentially replicate our existing setup in order to test our parallelisation work, we considered asking Canonicals system administrators, in our sweetest tones, to give us a box upon which to do our testing work, but we spotted two reasons that this would be problematic:

  1. We didn’t actually know at the outset what the best architecture was for our project.
  2. Asking for a machine without knowing what you actually need is likely to earn you a look so old it could have come from an ammonite, at least if you have sensible sysadmins.

So instead, the obvious solution: use Amazon EC2. After all, that would allow us to play with different architectures without there being any huge cost in terms of physical resources. Moreover, we’d be able to have root access on the instances on which we were testing, which makes debugging such a complicated process so much easier.

However…

There was still a problem. How to actually set up the test instances, given that there are five of us spread between three timezones, that it takes a significant amount of time to set up a machine for Launchpad development, and finally that we don’t really want to leave EC2 instances running overnight if we don’t have to (because it’s expensive).

The sequence of steps we’d have to take to up an instance tends to look something like this:

  1. Launch a new EC2 instance (this happens pretty quickly, thanks, Amazon)
  2. Make sure that everyone’s public SSH keys are usable on that instance
  3. Run our Launchpad setup script(s) (this takes about an hour, usually).
  4. Install buildbot.
  5. Configure buildbot correctly as   master or slave.
  6. Run buildbot (or buildslave, if this is a slave) and make sure it’s hooked up correctly to the other type of buildbot.
  7. Get some code into buildbot and make it run the test suite.
As you can see, this is pretty long-winded and rather fragile; it’s very easy for us to miss out a step or misconfigure something, get confused and then be left with a broken instance and a bit of a headache. Now, you’d be quite right to argue that we could just write a checklist – or better yet, a shell script – to do a lot of the setup work for us. A good idea, true. But there’s a better way…
To be continued (or some other phrase that doesn’t sound so hammy that it almost goes “oink”)…
(Image by http://www.samcatchesides.com/ under a Creative Commons license)

3 Responses to “That Juju that you do (Part I: Bring the pain)”

  1. Jonathan Lange Says:

    Why did you need to use buildbot at all to do this? I mean, I guess it’s kind of kinky if you’re into that sort of thing, but couldn’t you use, well, anything else?

  2. Graham Binns Says:

    Why did you need to use buildbot at all to do this? I mean, I guess it’s kind of kinky if you’re into that sort of thing, but couldn’t you use, well, anything else?

    I don’t have all the history behind the reasoning – I may have to refer to people higher up the management chain – but as I understand it the brief for the project was: “Like our existing test infrastructure, but parallelised.” That meant using Buildbot.

    We already have quite a bit of knowledge around using and configuring Buildbot, since we’ve been using it for a couple of years now. It made sense to re-use what we already knew in doing this parallelisation work; we might transition to something else later (I believe there’s a Jenkins setup for running LP tests out there somewhere, for example) but for now we wanted to change as few variables in our mental model as possible.

  3. Launchpad Blog Says:

    […] « That Juju that you do (Part I: Bring the pain) […]

Leave a Reply