Archive for the ‘General’ Category

Faster deployments

Friday, February 3rd, 2012

Cheetah

Back in September, we announced our first fastdowntime deployment. That was a new way of doing deployments that involve DB changes. This meant less downtime for you, the user, but we were also hoping that it would speed up our development by allowing us to deliver changes more often.

How can we evaluate whether this change sped up development? The most important metric we look at when making this evaluation is cycle time: the time it takes to go from starting work on a change to having that change live in production. Before fastdowntime, our cycle time was about 10 days; it is now about 9 days. So along with the introduction of this new deployment process, we cut 1 day off the average, a 10% improvement. That’s not bad.

But comparing the cumulative frequency distribution of the cycle time with the old process and the new will give us a better idea of the improvement.

Cycle time chart

On this chart, the gap between the orange (fastdowntime deployment) and blue (original process) lines shows the improvement. We can see that more changes were completed sooner. For example, under the old process about 60% of the changes were completed in less than 9 days, whereas about 70% were completed in the same time under the new process. It’s interesting to note that for changes that took less than 4 days or more than 3 weeks to complete, there is no practical difference between the two distributions. We can explain that by the fact that things that were fast before are still fast, and things that take more than 3 weeks would usually have encountered a deployment point under the old process anyway.
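As a rough illustration of how a point on such a cumulative frequency chart is read, here is a minimal sketch. The cycle times are invented for the example (they are not Launchpad's data) and were chosen only so that they reproduce the 60% and 70% figures quoted above; the function name is likewise hypothetical.

```python
from bisect import bisect_left


def fraction_completed_within(cycle_times, days):
    """Return the fraction of changes whose cycle time was under `days`."""
    ordered = sorted(cycle_times)
    # bisect_left counts the entries strictly smaller than `days`.
    return bisect_left(ordered, days) / len(ordered)


# Hypothetical cycle times in days, invented for illustration only.
old_process = [2, 3, 5, 6, 7, 8, 10, 12, 15, 25]
new_process = [2, 3, 4, 5, 6, 7, 8, 11, 14, 25]

for threshold in (4, 9, 21):
    old = fraction_completed_within(old_process, threshold)
    new = fraction_completed_within(new_process, threshold)
    print(f"under {threshold} days: old {old:.0%}, new {new:.0%}")
```

With this made-up sample the two processes agree at the 4-day and 21-day thresholds and differ at 9 days, which is the shape described for the real chart.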

That’s looking at the big picture. Looking at the overall cycle time is what gives us confidence that the process as a whole was improved. For example, the gain in deployment could have been lost by increased development time. But the closer picture is more telling.

Deployment cycle time chart

The cycle time charted in this case is from the time a change is ready to be deployed until it’s actually live. It basically excludes the time to code, review, merge and test the changes. In this case, we can see that 95% of the changes had to wait less than 9 days to go live under the new process, whereas it previously took 19 days to reach the same ratio. So that’s an improvement of 10 days! That’s much nicer.

Our next step in improving our cycle time is to parallelize our test suite. This is another major bottleneck in our process. In the best case, it usually takes about half a day between the time a developer submits their branch for merging and the time it is ready for QA on qastaging. The time in between is spent waiting for and running the test suite. It takes our buildbot about 6 hours to validate a set of revisions. We have a project underway to run the tests in parallel, and we hope it will reduce the test suite time to under an hour. That would make it possible for a developer to merge and QA a change on the same day! With this we expect to shave another day, maybe two, from the global cycle time.

Unfortunately, there are no easy silver bullets to make a dent in the time it takes to code a change. The only way to be faster there would be to make the Launchpad code base simpler. That’s also under way with the services oriented architecture project. But that will take some time to complete.

Photo by Martin Heigan. Licence: CC BY-NC-ND 2.0.

How to do Juju – Charming oops-tools

Thursday, February 2nd, 2012

Recently the Launchpad Red Squad and Product Team started working on a new cloud project. As part of that project we’ll be using Juju, a tool that helps you easily deploy services on the cloud.

As an opportunity to learn more about how Juju works, I wrote a charm to deploy oops-tools, an open source Django application that helps visualize and aggregate error reports from Launchpad, on Amazon’s EC2 and Canonical’s private Openstack cloud.

You might be asking, what’s a charm? Charms are basically instructions on how to deploy services, that can be shared and re-used.

Assuming you already have a bootstrapped juju environment, deploying oops-tools using this charm is as easy as:

$ juju deploy --repository=. local:oops-tools
$ juju deploy --repository=. local:postgresql
$ juju add-relation postgresql:db oops-tools
$ juju expose oops-tools

That’s it! With just a few commands, I have an instance of oops-tools up and running in a few minutes.

Under the hood, the oops-tools charm:

  • starts two Ubuntu instances in the chosen cloud provider, one for the webserver and another for the database server
  • downloads the latest trunk version of oops-tools and its dependencies from Launchpad
  • configures oops-tools to run under Apache’s mod_wsgi
  • configures oops-tools to use the database server

There’s still work to do, like adding support for RabbitMQ (oops-tools uses RabbitMQ to provide real-time error reports), but this initial iteration proved useful for learning about Juju and how to write a charm. As it is, it can be used by developers who want to hack on oops-tools, and it can easily be changed to deploy oops-tools in a production environment.

If you’d like to give it a try, you can get the charm here: https://code.launchpad.net/~charmers/charms/oneiric/oops-tools/trunk

Enjoy!

(“Harry Potter’s The Standard Book of Spells” by Craig Grobler, licensed under a CC BY-NC-ND license)

Fighting fire with fire – Changes to bug heat

Monday, January 30th, 2012


 bug heat storm trooper candle

We’re making changes to the way that bug heat is calculated and displayed in Launchpad.  From 6th February, bug heat will no longer age/degrade, and the active flame counter symbol will be replaced by a number, next to a static flame.  Here’s why.

Bug heat ageing is the cause of a wide range of timeouts in Launchpad. Every bug’s heat has to be recalculated and updated every day using a series of complex calculations, and when there are around 1 million bug reports to track, that puts a lot of pressure on the system and consumes a significant chunk of resources. Turning off bug heat ageing is the simplest way to solve this issue.

new bug heat image

Display

The flame counter symbol, although adding some visual flair (and flare), also needs to update every time the bug heat ageing recalculations are made. The continual stream of updates to the bug rows also results in poor search index performance.

We’ll still have a flame symbol, however it’ll be static, with the bug heat number next to it. Although not as visually dynamic, it’ll be easier to work out bug heat scores more exactly, at a glance.

Although I’m sure some of us will miss this little Launchpad feature, fewer timeouts is good news for everyone.

(“Happy and safe birthday” by Stéfan, licensed under a CC BY-NC-SA license)

New feature – Customise your bug listings

Monday, January 23rd, 2012

Custom Bug Listings

Over the past few months the Launchpad Orange Squad has been working to make it easier to get the information that matters to you from bug listings.

A lot of you have said in the past that you’d like to be able to filter bugs in a way that works best for you. Hopefully this new feature, with its customisable functionality, should help with this goal, filling your screen with more of what you want to see.

Custom bug listings green bug

Features

You can now sort bugs by criteria such as name, importance, status and age. You can switch on the criteria that you use most and turn off criteria that you don’t use. So if you always like to see bug age, but aren’t interested in heat, you can switch on age and switch off heat, and so on.

bug column screen shot

Display

We’ve also redesigned how bug listings are displayed – fitting more information into each bug listing, and adding sort options such as bug age, assignee, reporter, and tags.

You can put your results into ascending or descending order without having to reload the page, and you’ll be able to save your preferred layout for the next time you need to look over your bugs.

User research

This was my first main project since joining the Launchpad team back in November as the new Usability & Communications Specialist. User research has played an important part in how we’ve defined the feature and the decisions the team has made to improve the display, wording and functionality.

A number of you took part in one-to-one interviews, in group sessions at UDS-P and in an online survey. Thanks to everyone involved: what you told us has really helped to make this feature a more user-friendly experience. Some of our user research results are already available online, with more being added soon. We’ll be carrying out some further tests in the weeks ahead, so please get in touch if you’d like to get involved.

Bugs

Every new feature has teething problems, and custom bug listings is no different. We still have a number of bugs that need fixing, so please bear with us, and file a bug if you spot anything that’s still out there.

New approaches to new bug listings

Thursday, December 15th, 2011

The new bug listings were the first chance my squad, the Orange Squad, had to work on some really nice in-depth client-side UI since our squad was formed. Not only were we implementing the feature, we also wanted to lay groundwork for future features. Here are some of the new things we’ve done.

Synchronized client-side and server-side rendering

Early on, we decided to try out the Mustache template language, because it has client and server implementations. Although we wanted to make a really responsive client-side UI, we also wanted to have server-side rendering, so that we’re not broken for web crawlers and those with JavaScript disabled. Being able to use the same template on the server and the client seemed ideal, since it would ensure identical rendering, regardless of which environment the rendering was done in.

It’s been a mixed bag. We did accomplish the goal of using a single template across client and server, but there are significant bugs on both sides.

The JavaScript implementation, mustache.js, is slow on Firefox. Rendering 75 rows of data takes a noticeable length of time. If you’re a member of our beta team, you can see what I mean. Go to the bugs page for Launchpad itself in Firefox. Click Next. Now click Previous. This will load the data from cache, but it still takes a visible length of time before the listings are updated (and the Previous link goes grey).

mustache.js also has bugs that cause it to eat newlines and whitespace, but those can be worked around by using the appropriate entity references, e.g. replacing “\n” with “&#10;”.

The Python implementation, Pystache, does not implement scoping correctly. It is supposed to be possible to access outer variables from within a loop. On the client, we use this to control the visibility of fields while looping over rows. On the server, we have to inject these control variables into every row in order for them to be in scope.
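The server-side workaround described above can be sketched as follows. This is only an illustration of the idea of copying outer-scope control variables into every row before rendering; it is not Launchpad's actual code, it does not call Pystache, and the field names (`show_importance`, `show_heat`, `items`) and the helper name are hypothetical.

```python
def inject_outer_context(rows, outer):
    """Return copies of `rows` with the outer-scope values merged in.

    Each row's own keys take precedence over the outer values, which
    mirrors how an inner scope would shadow an outer one.
    """
    return [{**outer, **row} for row in rows]


# Hypothetical control variables that the template checks inside its loop.
visibility = {"show_importance": True, "show_heat": False}

# Hypothetical bug rows, invented for illustration.
rows = [
    {"id": 1, "title": "First bug"},
    {"id": 2, "title": "Second bug"},
]

# The context handed to the template renderer: the control variables are
# present at the top level and repeated inside every row.
context = {**visibility, "items": inject_outer_context(rows, visibility)}
```

The original `rows` list is left untouched, so the same data can still be sent to the client, where the outer variables are in scope without this duplication.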

We needed a way to load new batches of data. Mustache can use JSON data as its input. Launchpad’s web pages have long had the ability to provide JSON data to JavaScript, but Brad Crittenden and I recently added support for retrieving the same data without the page, via a special ++model++ URL. This seemed like the perfect fit to me, and it’s turned out pretty well. Using the ++model++ URL rather than the Launchpad web service means the server-side rendering can tightly parallel the client-side rendering. Each uses the same data to render the same template. It also means we don’t have to develop a new API, which would probably be too page-specific.

Client-side Feature Flags

While in development, the feature was hidden behind a Feature Flag. But at one point, we found we wanted access to feature flags on the client side, so we’ve now implemented that.

History as Model

We wanted users to be able to use their browser’s Next and Back buttons in a sensible way, especially if they wanted to return to previous settings. We also wanted all our widgets to have a consistent understanding of the page state.

We were able to address both of these desires by using YUI’s History object as a common model object for all widgets. History provides a key/value mapping, so it can be used as a model. That mapping gets updated when the user clicks their browser’s Next and Back buttons. And when we update History programmatically, we can update the URL to reflect the page state, so that the user can bookmark the page (or reload it) and get the same results. Any update to History, whether from the user or from code, causes an event to be fired.

We’re able to boil the page state down to a batch identifier and a list of which fields are currently visible. The actual batches are stored elsewhere, because History isn’t a great place to store large amounts of data.  For one thing, there are limits on the amount of data that can be stored.  For another, the implementation that works with old browsers, HistoryHash, can’t store anything more complex than a string as a value.

All our widgets then listen for events indicating History has changed, and update themselves according to the new values in the model.
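The pattern itself, a shared key/value model that notifies every widget when it changes, can be sketched independently of YUI. The following is a minimal Python illustration of that pattern under stated assumptions; it is not YUI's History API, and the class, method and key names are all hypothetical. Values are kept as plain strings, in line with the HistoryHash limitation mentioned above.

```python
_MISSING = object()


class PageStateModel:
    """A tiny key/value model that notifies listeners about changes."""

    def __init__(self, initial=None):
        self._state = dict(initial or {})
        self._listeners = []

    def subscribe(self, listener):
        """Register a callable taking (changed, full_state)."""
        self._listeners.append(listener)

    def get(self, key, default=None):
        return self._state.get(key, default)

    def update(self, changes):
        """Apply `changes` and fire one event if anything really changed."""
        changed = {
            key: value
            for key, value in changes.items()
            if self._state.get(key, _MISSING) != value
        }
        if not changed:
            return
        self._state.update(changed)
        for listener in list(self._listeners):
            listener(changed, dict(self._state))


# Hypothetical page state: a batch identifier and the visible fields.
model = PageStateModel({"batch": "0", "visible_fields": "importance,status"})

seen = []
model.subscribe(lambda changed, state: seen.append(changed))

model.update({"batch": "75"})   # fires an event
model.update({"batch": "75"})   # nothing changed, so no event
```

Every widget subscribes to the same model, so none of them needs to know whether a change came from the user's navigation or from code.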

Summing up

It’s been an interesting feature to work on, both because of the new techniques we’ve been able to try out, and because we’ve been closely involved with the Product team, trying to bring their designs to life.  We haven’t quite finalized it yet, but I’m going on leave today, so I wanted to let you know what we’ve all been up to.  Happy holidays!

Legacy, performance, testing: 6 months of new critical bugs analyzed

Friday, December 9th, 2011

Bugs

The Launchpad maintenance teams have been working since the beginning of the year on reducing our Critical bug count to 0, without much success thus far. The long-term trend keeps the backlog at around 300. And it’s not because we haven’t been fixing these. Since the beginning of the year, more than 800 Critical bugs were fixed, but more than 900 were reported 🙁

So I investigated the source of all these new critical bugs we were finding. A random sample of 50 of the critical bugs filed was analyzed to see where and why they were introduced. The full analysis is available as a published Google document.

Here are the most interesting findings from the report:

  • Most of the new bugs (68%) are actually legacy issues lurking in our code base.
  • Performance and spotty test coverage together account for more than 50% of the causes of our new bugs. We should refocus maintenance on tackling performance problems; that’s what is going to bring us the most bang for the buck (even though it’s not cheap).
  • As a team, we should increase our awareness of testing techniques and test coverage. Always do TDD, and maybe investigate ATDD to increase the coverage and documentation of the business rules we should be supporting.
  • We also need to pay more attention to how code is deployed: it’s now very common for scripts to be interrupted, and for the new and old versions of the code to operate in parallel.

Another way of looking at this is that Launchpad represents a very deep mine of technical debt. We don’t know exactly how deep the mine is, but we are going to keep finding existing Critical issues until we hit the bottom of the mine. (Or until we finish refactoring enough of Launchpad for better testing and performance. That’s what the SOA project is all about.)

In the meantime, we should pay attention to regressions and fallout (those are really the new criticals) to make sure that we aren’t extending the mine!

Photo by Brian W. Tobin. Licence: CC BY-NC-ND 2.0.

Custom bug listings – have your say

Tuesday, December 6th, 2011

Our custom bug listings beta has been up and running for just over a week now. Thanks to everyone in the Launchpad beta testers group who has tried it out, and thank you for all your valuable feedback and comments. If you haven’t tried it yet, you can get access by joining our beta team here: https://launchpad.net/~launchpad-beta-testers

We want to improve how the default information is displayed to make this tool work better, so we’ve put together a super-quick survey to find out:

– What information about a bug you most want to see in bug listings

– What the default ‘order by’ options should be

– If you’d like to see any other ‘order by’ options.

These three questions should only take a few minutes to complete, but they’ll add real value to our work redesigning how bug listings appear and function. Here’s the link if you’d like to take part

Improved performance for personal code pages

Thursday, November 10th, 2011

Edit 2011-11-15 08:18 UTC: The problem is now fixed and we’ve re-enabled the new menu.

Edit 2011-11-11 13:42 UTC: We’ve temporarily disabled the new menu while we fix an unfortunate side effect.

We’ve just deployed a new, simplified version of the branch menu displayed on the right hand side of personal code pages (e.g. personal page for the Launchpad team). It looks like this:

Old menu

New menu

Calculating the number of branches took far too much time for people and teams with a huge number of branches (e.g. https://code.launchpad.net/~ubuntu-branches), to the point that they were getting timeouts.

The new design, along with optimisations we’ve made to the database queries, should improve performance for everyone.

Daily builds of huge trees

Thursday, November 10th, 2011

We’ve just upgraded Launchpad’s builder machines to Bazaar 2.4. Most importantly, this means that recipe builds of very large trees will work reliably, such as the daily builds of the Linaro ARM-optimized gcc. (This was bug 746822 in Launchpad).

We are going to do some further rollouts over the next week to improve supportability of recipe builds, support building non-native packages, handle multiarch package dependencies, improve the buildd deployment story, etc.

Welcome to BerliOS projects

Monday, October 10th, 2011

It’s sad to read that BerliOS will close in December, after nearly twelve years of serving open source projects. One fewer project hosting site means that the open source world is that bit poorer.

If you’ve been hosting your project on the BerliOS Developer platform and you’re looking for a new home, you’ve got plenty of choice.

We’d love to welcome you to Launchpad and here are a few reasons why you should consider Launchpad:

If you have questions, you’re very welcome to join us in #launchpad on FreeNode and the launchpad-users mailing list.