Archive for the ‘General’ Category

New approaches to new bug listings

Thursday, December 15th, 2011

The new bug listings listings were the first time my squad, the Orange Squad, had a chance to work on some really nice in-depth client-side UI since our squad was formed. Not only were we implementing the feature, we wanted to lay groundwork for future features.  Here are some of the new things we’ve done.

Synchronized client-side and server-side rendering

Early on, we decided to try out the Mustache template language, because it has client and server implementations. Although we wanted to make a really responsive client-side UI, we also wanted to have server-side rendering, so that we’re not broken for web crawlers and those with JavaScript disabled. Being able to use the same template on the server and the client seemed ideal, since it would ensure identical rendering, regardless what environment the rendering was done in.

It’s been a mixed bag. We did accomplish the goal of using a single template across client and server, but there are significant bugs on both sides.

The JavaScript implementation, mustache.js, is slow on Firefox. Rendering 75 rows of data takes a noticeable length of time. If you’re a member of our beta team, you can see what I mean. Go to the bugs page for Launchpad itself in Firefox. Click Next. Now click Previous. This will load the data from cache, but it still takes a visible length of time before the listings are updated (and the Previous link goes grey).

mustache.js also has bugs that cause it to eat newlines and whitespace, but those can be worked around by using the appropriate entity references, e.g. replacing “\n” with “
”

The Python implementation, Pystache, does not implement scoping correctly. It is supposed to be possible to access outer variables from within a loop. On the client, we use this to control the visibility of fields while looping over rows. On the server, we have to inject these control variables into every row in order for them to be in scope.

We needed a way to load new batches of data. Mustache can use JSON data as its input. Launchpad’s web pages have long had the ability to provide JSON data to JavaScript, but Brad Crittenden and I recently added support for retrieving the same data without the page, via a special ++model++ URL. This seemed like the perfect fit to me, and it’s turned out pretty well. Using the ++model++ URL rather than a the Launchpad web service means the server-side rendering can tightly parallel the client-side rendering.  Each uses the same data to render the same template.  It also means we don’t have to develop a new API, which would probably be too page-specific.

Client-side Feature Flags

While in development, the feature was hidden behind a Feature Flag. But at one point, we found we wanted access to feature flags on the client side, so we’ve now implemented that.

History as Model

We wanted users to be able to use their browser’s Next and Back buttons in a sensible way, especially if they wanted to return to previous settings. We also wanted all our widgets to have a consistent understanding of the page state.

We were able to address both of these desires by using YUI’s History object as a common model object for all widgets.  History provides a key/value mapping, so it can be used as a model.  That mapping gets updated when the user clicks their browser next and back buttons.  And when we update History programmatically, we can update the URL to reflect the page state, so that the user can bookmark the page (or reload it) and get the same results.  Any update to History, whether from the user or from code, causes an event to be fired.

We’re able to boil the page state down to a batch identifier and a list of which fields are currently visible. The actual batches are stored elsewhere, because History isn’t a great place to store large amounts of data.  For one thing, there are limits on the amount of data that can be stored.  For another, the implementation that works with old browsers, HistoryHash, can’t store anything more complex than a string as a value.

All our widgets then listen for events indicating History has changed, and update themselves according to the new values in the model.

Summing up

It’s been an interesting feature to work on, both because of the new techniques we’ve been able to try out, and because we’ve been closely involved with the Product team, trying to bring their designs to life.  We haven’t quite finalized it yet, but I’m going on leave today, so I wanted to let you know what we’ve all been up to.  Happy holidays!

Legacy, performance, testing: 6 months of new critical bugs analyzed

Friday, December 9th, 2011

Bugs

The Launchpad maintenance teams have been working since the beginning of the year at reducing our Critical bugs count to 0. Without much success this far. The long term trend keeps the backlog at around 300.  And it’s not because we haven’t been fixing these. Since the beginning of the year, more than 800 Critical bugs were fixed, but more than 900 were reported 🙁

So I investigated what was the source of all these new critical bugs we were finding. A random sample of 50 critical bugs filed were analyzed to see where and why they were introduced. The full analysis is available as a published Google document.

Here are the most interesting findings from the report:

  • Most of the new bugs (68%) are actually legacy issues lurking in our code base.
  • Performance and spotty test coverage represents together more than 50% of the cause of our new bugs. We should refocus maintenance on tackling performance problems, that’s what is going to bring us the most bang for the bucks (even though it’s not cheap).
  • As a team, we should increase our awareness of testing techniques and testing coverage. Always do TDD, maybe investigate ATDD to increase the coverage and documentation our the business rules we should be supporting.
  • We also need to pay more attention to how code is deployed, it’s now very usual for scripts to be interrupted, and for the new and ancient version of the code to operate in parallel.

Another way of looking at this is that Launchpad represents a very deep mine of technical debt. We don’t know how exactly deep the mine is, but we are going to find existing Critical issues until we hit the bottom of the mine. (Or until we finish refactoring enough of Launchpad for better testing and performance. That’s what the SOA project is all about.)

In the mean time, we should pay attention to regressions and fallouts, (those are really the new criticals) to  make sure that we aren’t extending the mine!

Photo by Brian W. Tobin. Licence: CC BY-NC-ND 2.0.

Custom bug listings – have your say

Tuesday, December 6th, 2011

Our custom bug listings beta has been up and running for just over a week now – thanks to everyone in the Launchpad beta testers group that have tried it out, and thank you for all your valuable feedback and comments. If you haven’t tried it yet, you can get access by joining our beta team here:  https://launchpad.net/~launchpad-beta-testers

We want to improve how the default information is displayed to make this tool work better, so we’ve put together a super-quick survey to find out:

– What information about a bug you most want to see in bug listings

– What the default ‘order by’ options should be

– If you’d like to see any other ‘order by’ options.

These three questions should only take a few minutes to complete, but they’ll add real value to our work redesigning how bug listings appear and function. Here’s the link if you’d like to take part

Improved performance for personal code pages

Thursday, November 10th, 2011

Edit 2011-11-15 08:18 UTC: The problem is now fixed and we’ve re-enabled the new menu.

Edit 2011-11-11 13:42 UTC: We’ve temporarily disabled the new menu while we fix some unfortunate side effect.

We’ve just deployed a new, simplified version of the branch menu displayed on the right hand side of personal code pages (e.g. personal page for the Launchpad team). It looks like this:

Old menu

New menu

Calculating the number of branches took way too much time for people/teams with a huge number of branches (e.g. https://code.launchpad.net/~ubuntu-branches), up to the point that they were getting timeouts.

The new design, along with optimisations we’ve made to the database queries, should improve performance for everyone.

Daily builds of huge trees

Thursday, November 10th, 2011

We’ve just upgraded Launchpad’s builder machines to Bazaar 2.4. Most importantly, this means that recipe builds of very large trees will work reliably, such as the daily builds of the Linaro ARM-optimized gcc. (This was bug 746822 in Launchpad).

We are going to do some further rollouts over the next week to improve supportability of recipe builds, support building non-native packages, handle muiltiarch package dependencies, improve the buildd deployment story etc.

Welcome to BerliOS projects

Monday, October 10th, 2011

It’s sad to read that BerliOS will close in December, after nearly twelve years of serving open source projects. One fewer project hosting site means that the open source world is that bit poorer.

If you’ve been hosting your project on the BerliOS Developer platform and you’re looking for a new home, you’ve got plenty of choice.

We’d love to welcome you to Launchpad and here are a few reasons why you should consider Launchpad:

If you have questions, you’re very welcome to join us in #launchpad on FreeNode and the launchpad-users mailing list.

Finding bugs that affect you

Tuesday, October 4th, 2011

We’ve recently deployed two features that make it easier to find bugs that you’re previously said affect you:

1: On your personal bugs page, there’s now an Affecting bugs that shows all these bugs.

2: On a project, distribution or source package bug listing page, there’s now a “Bugs affecting me” filter on the right (for example, bugs affecting you in the Launchpad product).

Counts of the number of affected users already help developers know which bugs are most urgent to fix, both directly and by feeding into Launchpad’s bug heat heuristic. With these changes, the “affects me” feature will also make it easier for you to keep an eye on these bugs, without having to subscribe to all mail from them.

screenshot of "This bug affects me" control

Launchpad now accepts mail commands from gmail

Saturday, October 1st, 2011

If you use gmail, you should now be able to send commands to Launchpad without gpg-signing.

gmail puts a DKIM cryptographic signature on outgoing mail, which is a cryptographic signature that proves that the mail was sent by gmail and that it was sent by the purported user. We verify the signature on Launchpad and treat that mail as trusted which means, for example, that you can triage bugs over mail or vote on merge proposals. Previously you needed to GPG-sign the mail which is a bit of a hassle for gmail.

(DKIM is signed by the sending domain, not by the user, so it doesn’t inherently prove that the purported sender is the actual one. People could intentionally or unintentionally set up a server that allows intra-domain impersonation, and it’s reported to be easy to misconfigure DKIM signers so that this happens. (Consider a simple SMTP server that accepts, signs and forwards everything from 192.168/16 with no authentication.) However, in cases like gmail we can reasonably assume Google don’t allow one user to impersonate another. We can add other trusted domains on request.)

If you have gmail configured to use some other address as your From address it will still work, as long as you verify both your gmail address and your other address.

You can use email commands to interact with both bugs and code merge proposals. For instance when Launchpad sends you mail about a new bug, you can just reply

  status confirmed
  importance medium

Thanks for letting us know!

We do this using the pydkim library.

Note that you do need at least one leading space before the commands.

If you hit any bugs, let us know.

Deployment reports are now public

Friday, September 30th, 2011

Steve Kowalik writes:

For a while now Launchpad has been using deployment reports that tell us what state our QA is in, which revisions are safe to deploy to production, or which revisions are not safe to deploy since they failed QA.

Some time ago, I started the process to make these reports public, and I’m proud to announce that today, they are!

If you’re waiting to see when your code in Launchpad is likely to go live, take a look at the new public deployment report.

Speeding up development

Friday, September 9th, 2011

SpeedometerToday we reached a significant milestone, we completed our first fast down-time deployment. Two obvious reasons for doing this were already mentioned in the announcement and our technical architect”s post describing the change:

  1. We’ll have less downtime per month (at the cost of more frequent but short interruption).
  2. We’ll be able to deploy fixes and changes involving DB schema more frequently.

But from my perspective, the most important benefit I think we’ll get from this is a speed up in our rate of development, particularly, in terms of completing feature projects. It’s not a secret, our feature squads spends a lot of time to complete their projects. There are multiple reasons for this, but in the end, there usually fall under two broad cateries: the time it takes to actually make the change, and the delays in getting feedback on the change itself.

To help with the first category, you’ll want better and more powerful libraries, better architecture, developers’ training, etc. Think the time difference between developping a database web application in Django vs as a CGI C application using only the standard C library. Launchpad isn’t using the most modern libraries and toolkits, and we could still make a lot of improvements there.  But the costs of making changes in this space are compounded by the problems of the other category.

Once you wrote the change, you are far from done. There are lots of hoops you still have to jump through before saying “done-done“: you’ll need to make sure the tests pass, to get your changes reviewed, merged, QAed and then deployed. And finally, you’ll probably want to make sure that it matches the user’s expectations, but until it’s in production, this is hard to assess reliably. All of these steps takes time and introduce delays, some bigger than others. The Launchpad team is always on the look-out to cut in these delays and the new “fast down-time deployments” cut on of the biggest one we had.

Cycle time distribution

Since a picture is worth at least a thousand words, have a look at the chart above to have a better idea of what I’m talking about. It shows the distribution of the time it takes to complete a “change”. (What this plots is the cycle time from coding to deployment of our Kanban cards which roughly map to one logical change.) You’ll see that 50% of our changes are deployed to production in about a week. And the next 45% takes between 1 and 5 weeks. Now, our feature projects are composed of many many of these smaller changes. If those are all  relatively small changes, why do they take so long?

One of the big bottleneck was the batch size of our DB deployment. If a change required a DB schema it waited until the next downtime deployment which happened once every month. In theory, that means that on average a change involving the database would wait 2 weeks in the queue before deployment. In practice, it’s more complex than that, because squad leads would often plan around these. So a database change would be hold off onto because it was deemed that it couldn’t be safely completely to be part of the next downtime deployment. So it might be put on hold in favour of other work, and delayed to the next downtime deployment. It’s also frequent to have other changes building on the first also queued up waiting for the next deployment window. Add on top of that, that it’s common for the completion of a feature to require several iterations of DB change based on feedback and you quickly understand how you can be working months on a feature project!

But this major bottleneck is now gone! We’ll be able to land and deploy DB changes reliably within days, giving us much more rapid feedback. I’m looking forward to the change in the cycle time distribution in the coming months. The whole distribution should move toward the left. I’ll write a follow-up in two months to see if this prediction comes true.
Photo by Nathan E Photography. Licence: CC BY 2.0.