Author Archive

Fixing a simple bug in Launchpad – A screencast

Wednesday, June 20th, 2012

When I ran the Launchpad Development Clinics at the last UDS, I was asked if I would give a presentation on how to fix a simple bug in Launchpad. This sounded like a great idea – after all, there are a lot of parts to the Launchpad development process that apply to every single branch you ever write, so it seemed well worth the effort. Rather than a presentation, though, I figured a screencast would be of more use, so here is just such a screencast. If you’ve got any questions or comments, don’t hesitate to get in touch.

A story of a clinic

Friday, June 1st, 2012

You’ll remember that a while back, dear reader, we announced that we’d be running a couple of Launchpad Development Clinics at UDS (we called them Launchpad Clinics at the time, and I lost count of how many people commented that that made it sound as though Launchpad was ill. It isn’t; it’s in the same rude health as ever). We’ll come up with a better name next year!

Anyway, we made the announcement, put up the wiki page, saw a few names and bugs added to it, and didn’t really expect to be hugely busy. Indeed, when Laura, Matthew, Huw, Raphaël and I convened in one of the UDS meeting rooms for the first clinic, it was mostly empty except for the stragglers from the previous session.

Then a person appeared. Actually, it was two people squeezed into the skin of one person: Tim Penhey, who looks like he’s been chiselled out of granite and has a tendency to loom well-meaningly, like an avuncular Greek pillar. We figured that since he’s an ex-Launchpadder, he didn’t count, and so went back to self-deprecatory joking whilst worrying that we’d massively miscalculated how many people would want to come to the clinics.

And then another person arrived. And another, and another. Soon, we’d gone from having a couple of attendees who really just needed to run ideas past a Launchpad core developer to having ten people who all needed questions answered, or a development instance spun up. After that, things get a bit fuzzy, because I was always answering someone’s question or being root on an EC2 instance for someone else. When Laura told us that our time was up I was, I have to confess, somewhat surprised.

Thursday’s session was a quieter affair, in part at least because we had to reschedule it at the last minute (UDS schedules are like quicksand, and if it weren’t for the amazing UDS admin team everyone would be thoroughly lost for much of the time), but there were still people there with bugs to be fixed and questions to be answered. I had preliminary discussions with Chris Johnston about adding API support to Blueprints, and worked with Ursula Junque on how to add activity logging to the same.

The upshot of the clinics is, I think, massively positive. There is a genuine development community out there for Launchpad, and people really are keen to make changes to the dear old Beast whilst the Launchpad core developers are working elsewhere or fixing things that are horribly complex (and usually not user-facing). For someone like me, who had been somewhat skeptical about the kind of response the clinics would receive (even though they were partly my idea), this is immensely gratifying news.

There are, of course, many things that we need to improve on, and many lessons that we can learn. People want to know how to fix a simple bug without having to come to a session at UDS, so I’m going to record a screencast of just such a procedure, right from finding the bug to working out where the fix lives, all the way through the coding and testing process, right up to the point of getting the branch reviewed and landed. Hopefully this will give everyone a great jumping-off point.

When we set out on this particular journey, one of the criteria I wrote down for considering the clinics a success was “we’ll want to do it again at the next UDS.” Well, I do. We did well; we can and will do better next time. Who’s with me?

Launchpad Clinic Attendees


That Juju that you do (Part II: A magical balm to soothe your ills)

Tuesday, April 3rd, 2012

In my previous post I talked about the pain of having to set up a testing environment for our parallelised test work, and how there were an awful lot of hoops to jump through in order to get something usable up and running. Now, dear reader, let me tell you a tale of strangeness and charms.

Enter Juju

If you’re not familiar with Juju, I’d urge you to pay a visit to the Juju website to learn more, but in brief, I’ll explain: Juju is an orchestration service for Ubuntu. Using Juju allows you to deploy services rapidly, scaling up or down as you need. Each service is contained within a Charm, which is at its simplest a set of scripts that ensure that a given Juju unit does what it’s supposed to do at the appointed time (for example: install and config-changed are two of the most common hooks for a charm to have). We realised that in order to make our life simpler when testing our parallelisation work we could develop a pair of Buildbot Charms (one for the master, one for the slave) which, when deployed through Juju and given the right set of configuration options, would give us a working Buildbot setup on which to test Launchpad.
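
To make that a little more concrete, here’s a minimal sketch of what an install hook might look like – an illustration only, not our actual charm code (which does rather more):

#!/usr/bin/env python
# hooks/install -- a minimal, hypothetical install hook. Juju runs
# this once, when the unit is first deployed.
import subprocess

# Install Buildbot from the Ubuntu archive; any failure makes the
# hook (and therefore the deployment step) fail visibly.
subprocess.check_call(['apt-get', 'install', '-y', 'buildbot'])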

More about the charms…

The charms need to be able to automatically configure themselves to talk to each other (this is usually managed in Buildbot by static configuration files). Luckily, Juju provides for exactly that situation with the notion of “relations”; one charm can declare that it provides a particular interface as part of a relation and another can say that it requires that interface in order to be able to be a part of that relation. For our Buildbot charms, we have the following in the master’s metadata.yaml:

provides:
  buildbot:
    interface: master

And in the slave:

provides:
  buildbot:
    interface: slave
requires:
  buildbot:
    interface: master

Each charm has a couple of hooks that deal with relations, named in the form buildbot-relation-* where * is joined, changed, or broken. These are run by Juju at the appropriate point in the process of connecting one instance to another. With all this in place, then, we can set up a working Buildbot environment by doing something like this:

$ juju bootstrap # create the Juju environment
$ juju deploy buildbot-master --config=/path/to/master/config.yaml # deploy the master charm
$ juju deploy buildbot-slave --config=/path/to/slave/config.yaml # deploy the slave charm
$ juju add-relation buildbot-slave buildbot-master

The last line – juju add-relation buildbot-slave buildbot-master – tells Juju to connect the buildbot slave node to the master node. The two then do a bit of a dance to configure each other properly (in fact, it’s mostly a case of the slave saying: “Hey, I’m here, what do you want me to do?” and the master passing back configuration instructions). Once this is all done, you have a working Buildbot master and slave, ready to accept work to build.
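
That dance is carried out by the relation hooks, using Juju’s relation-get and relation-set tools. Here’s a much-simplified sketch of what the slave’s side might look like – the key names here are illustrative assumptions, not the real protocol our charms speak (grab the branches below to see that):

#!/usr/bin/env python
# hooks/buildbot-relation-joined (slave side) -- a simplified,
# hypothetical sketch of the handshake with the master.
import subprocess

# Publish our details to the master's side of the relation.
subprocess.check_call(['relation-set', 'hostname=buildbot-slave-0'])

# Read back the connection details the master has published; inside
# a relation hook, relation-get defaults to the remote unit.
port = subprocess.check_output(['relation-get', 'port']).strip()

# A real charm would now regenerate the slave's configuration and
# restart the buildslave; here we just record what we learned.
with open('/tmp/master-port', 'w') as f:
    f.write(port)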

What have we discovered about Juju?

First and foremost, we’ve learned just how powerful Juju actually is. We’ve taken a fairly complex-to-configure build environment, for which we normally use dedicated machinery whose configuration is not to be touched without sysadmin blessing on pain of pain, and turned it into something that we can deploy with four or five commands and a couple of configuration files. Sure, Juju has its quirks and oddnesses, but when we’ve run across them the Juju development team has been amazingly helpful with workarounds or, more usually, bug fixes. The current version of Juju is implemented in Python, too, so we find it pretty easy to contribute fixes of our own if we need to.

Where can I find out more?

As I said above, if you want to know more about Juju, you can check out the Juju website. If you want to take a look at our Buildbot charms and how we’ve built our hooks (they’re written in Python because that happens to be our language of choice, but in fact they can be written in anything so long as they’re executable), you can grab our code from Launchpad:

  • For the master: bzr branch lp:~yellow/charms/oneiric/buildbot-master/trunk buildbot-master
  • For the slave: bzr branch lp:~yellow/charms/oneiric/buildbot-slave/trunk buildbot-slave

If you’ve got questions about Juju in general, the folks in #juju on Freenode are always tremendously helpful. If you’ve got any questions about our charms, ask them in the comments here and I’ll do my best to answer them.


(Image by http://www.samcatchesides.com/ under a Creative Commons license)

That Juju that you do (Part I: Bring the pain)

Friday, March 30th, 2012

(Image: a juju bottle)

Benji’s blog post earlier this week gave you all some insight into what the Launchpad Yellow Squad has been doing recently in its attempt to parallelise the Launchpad test suite. One of the side effects of this is that we’ve been making quite a lot of use of Juju, and we thought it’d be nice to actually spell out what we’ve been doing.

The problem

We’re working to parallelise Launchpad’s test suite so that a branch doesn’t take approximately one epoch to go from being approved for merging to actually landing. A lofty goal, sure, and one that presents some interesting problems from the perspective of building an environment to test our work in. You see, Launchpad’s build infrastructure is a pretty complicated beast. It’s come a long way since the time when submitting a branch for merging meant sending an email to our PQM bot, which would then run the test suite and kick the branch out if it failed – and these days it’s something of a behemoth.

Time for some S&M

We use Buildbot as our continuous integration system. There are two parts to Buildbot: the master and the slave. Broadly put, the slave is the part of Buildbot that is responsible for doing the actual work of compilation and running tests, and the master is responsible for telling the slave when to do things (there’s a sketch of how that relationship is configured just after the list below). Each master can be responsible for several slaves. When it became obvious that we were going to need to essentially replicate our existing setup in order to test our parallelisation work, we considered asking Canonical’s system administrators, in our sweetest tones, to give us a box upon which to do our testing work, but we spotted two reasons that this would be problematic:

  1. We didn’t actually know at the outset what the best architecture was for our project.
  2. Asking for a machine without knowing what you actually need is likely to earn you a look so old it could have come from an ammonite, at least if you have sensible sysadmins.
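
For the unfamiliar, the master/slave relationship mentioned above is spelled out in the master’s master.cfg file, which is plain Python. Here’s a stripped-down sketch of the general shape for the Buildbot of that era – an illustration, not our actual configuration:

# master.cfg -- a stripped-down, illustrative sketch; not our real
# configuration. Schedulers and status targets are omitted.
from buildbot.buildslave import BuildSlave
from buildbot.config import BuilderConfig
from buildbot.process.factory import BuildFactory
from buildbot.steps.shell import ShellCommand

c = BuildmasterConfig = {}

# The slaves this master is responsible for, and the port on which
# they connect back to the master.
c['slaves'] = [BuildSlave('launchpad-slave', 'a-shared-secret')]
c['slavePortnum'] = 9989

# The work the master hands out: here, just run the test suite.
factory = BuildFactory()
factory.addStep(ShellCommand(command=['make', 'check']))

c['builders'] = [
    BuilderConfig(name='launchpad-tests',
                  slavenames=['launchpad-slave'],
                  factory=factory),
]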

So instead, the obvious solution: use Amazon EC2. After all, that would allow us to play with different architectures without there being any huge cost in terms of physical resources. Moreover, we’d be able to have root access on the instances on which we were testing, which makes debugging such a complicated process so much easier.

However…

There was still a problem: how would we actually set up the test instances, given that there are five of us spread across three timezones, that it takes a significant amount of time to set up a machine for Launchpad development, and finally that we don’t really want to leave EC2 instances running overnight if we don’t have to (because it’s expensive)?

The sequence of steps we’d have to take to bring up an instance tends to look something like this:

  1. Launch a new EC2 instance (this happens pretty quickly – thanks, Amazon).
  2. Make sure that everyone’s public SSH keys are usable on that instance.
  3. Run our Launchpad setup script(s) (this usually takes about an hour).
  4. Install buildbot.
  5. Configure buildbot correctly as a master or a slave.
  6. Run buildbot (or buildslave, if this is a slave) and make sure it’s hooked up correctly to its counterpart.
  7. Get some code into buildbot and make it run the test suite.

As you can see, this is pretty long-winded and rather fragile; it’s very easy for us to miss out a step or misconfigure something, get confused and then be left with a broken instance and a bit of a headache. Now, you’d be quite right to argue that we could just write a checklist – or better yet, a shell script – to do a lot of the setup work for us. A good idea, true. But there’s a better way…

To be continued (or some other phrase that doesn’t sound so hammy that it almost goes “oink”)…

You can’t be too helpful: Better handling of large blobs by +filebug

Thursday, March 4th, 2010

Imagine the scene, dear reader. You’re running the latest Ubuntu pre-release, diligently testing that everything works the way it ought. An application crashes horribly; Apport pops up to tell you that it spotted the crash – would you like to report a bug?

“Of course I would,” you cry, “I am a desktop testing hero!” And so Apport does its thing and takes you to Launchpad to file the bug report. And then Launchpad times out.

At this point, if you’re like me, you might shout a bit. You refresh and refresh with all your might but still Launchpad will only give you an error page. Finally, defeated, broken, you close your browser and shuffle off into the corner of your room, where you bury yourself under the mountain of discarded CD-Rs that contain daily ISOs of Ubuntus past and sob into your coffee.

Okay, so maybe I’ve exaggerated it a bit, but that doesn’t change the basic fact that timeouts on the +filebug page when you’re filing a bug via Apport are intensely, soul-destroyingly annoying. They’re annoying to us in the Launchpad Bugs team too, because we see them in our error reports. They’re so annoying, dear reader, that we’re moved to hyperbole when writing about them for the official Launchpad blog.

The problem with processing data supplied by Apport has always been one of scale. Originally the extra data that Apport would upload to Launchpad wouldn’t be massive, maybe hitting 10MB if the application was particularly busy. Because of this, Launchpad simply processed those data synchronously whilst loading the bug-filing form for you, so that the data you’d uploaded would be included with the bug.

Of course, that approach doesn’t scale very well, and recently we’ve been seeing the data blobs that Apport uploads hit ~100MB in size. That’s far too big for Launchpad to handle in a timely manner whilst doing all the additional work of rendering the bug filing form for you, so in those circumstances it invariably times out and you get that ever-annoying error page.

This was an interesting problem to solve, and we came up with a number of different possible solutions. One viable solution, which we eventually decided not to implement, involved having a separate request for processing the extra bug data and then loading the data into the filebug form with AJAX. We discarded that idea because there was always the chance that that request would time out, leaving us in an only slightly better position than the one we were already in.

There was some discussion on the Launchpad developers mailing list about whether we could just defer loading the extra data into the bug until after it had been filed, but we quickly realised that not only could the extra data carry an initial subject for the bug, but it could also indicate that the bug should be filed as private, something which currently can’t be done via the Launchpad web UI.

The solution we chose to implement, which we’ve now rolled out to the Launchpad edge and production servers, is to have a queue of blobs waiting to be processed. We already have quite a robust job-processing system built into the Launchpad codebase, which we use for creating the diffs in merge proposals and calculating bug heat, amongst other things. Adding support for processing uploaded blobs was quite simple, since the existing blob parsing code was well documented. The blob processing jobs are picked up and run by a cron script that runs every minute or so, and the data retrieved from the blobs are stored with the original processing job as a JSON-formatted string. When you hit the +filebug page it checks to see if the relevant blob has been processed. If not, it waits until processing is complete. Once processing is complete the serialized data are loaded into the +filebug page for use later on.
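
In outline – and with purely illustrative names, not Launchpad’s actual classes – the shape of that machinery is something like this:

# A simplified, hypothetical sketch of the blob-processing queue;
# class and function names are illustrative, not Launchpad's own.
import json

def parse_blob(raw_blob):
    # Stand-in for the real blob parser: extract the fields that
    # +filebug cares about (initial subject, privacy flag,
    # attachments, and so on).
    return {'subject': 'Example crash', 'private': False}

class ProcessApportBlobJob:
    """Parse one uploaded blob and store the result with the job."""

    def __init__(self, raw_blob):
        self.raw_blob = raw_blob
        self.metadata = None  # filled in once the job has run

    def run(self):
        # The parsed data are stored as a JSON-formatted string,
        # ready for the +filebug page to load once it sees that
        # processing is complete.
        self.metadata = json.dumps(parse_blob(self.raw_blob))

def process_pending(jobs):
    # The cron script runs every minute or so and works through
    # whatever jobs are waiting in the queue.
    for job in jobs:
        if job.metadata is None:
            job.run()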

The advantage to you as a user, then, is that you should never again see a timeout on the +filebug page due to the size of the blob that Apport has uploaded. With the upcoming release of Ubuntu Lucid, we’re hopeful that this will make a real difference to those people testing the pre-release versions of the distro.

Inline dupe-finding: an exercise in pain reduction

Thursday, December 10th, 2009

For the last million years¹ or so I’ve been working on a cool new feature for Launchpad Bugs: an inline, AJAXified, asynchronous dupe finder.

For quite some time now, people have encountered timeouts or long response times when trying to file bugs, particularly when they enter a long bug summary or when the project they’re filing against has a lot of bugs for Launchpad to search through when looking for possible duplicates. The upshot was that whenever a timeout occurred people were unable to file a bug and would have to back up and start again. Needless to say, this was frustrating for all involved.

The new inline dupefinder, which you’ll now find on the “Report a bug” page of any project in Launchpad (when viewed on edge.launchpad.net), is designed to stop this from being a problem, or at least to reduce the problem to a more manageable level and stop it from getting in people’s way. It does this in two ways:

  1. The inline list of duplicates is much quicker to render than a full Launchpad page.
  2. If the search for duplicates times out for some reason you’ll still be able to file a bug.

Here’s the catch: we need your help. Launchpad’s development cycle this month is very short due to the approaching year-end holiday period, so we need to get as much testing done on this as possible. Check out the dupe finder, see if it works for you and, most importantly, report a bug if it doesn’t.

One last thing: at the time of writing, the inline dupe finder only works for projects (like Launchpad Bugs), not for packages or project groups. We’ll hopefully be enabling it for project groups today and with a bit of luck for packages, too. We started off with projects only because it’s the simplest implementation of the concept and it gives us a good base to test from.

Thanks in advance for your help. Let’s make Launchpad awesome together!

¹ This might be an exaggeration.