Author Archive

New fastdowntime schedule

Tuesday, August 14th, 2012

For the last year, Launchpad has been doing schema patches using a process we call ‘FDT’, short for Fast Down Time. We have applied 60 such patches, typically taking between 60 and 90 seconds each time, at 1000 UTC, our scheduled daily 5 minute downtime window for DB patching.

Recently, we eliminated Slony from our environment, which has dropped the overhead of schema patches to ~6 seconds, and this gives us <10 second downtimes to apply schema patches. We’re taking advantage of this to add two new downtime windows at 0200 UTC and 1800 UTC. All three windows will be for 10 seconds. Hopefully you will never notice that we’re doing schema patches. But if Launchpad is offline for a few seconds at one of these times, you’ll know why – we’re busy rolling out a schema change to bring a new feature to life.
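For scripts that would rather avoid these windows, the schedule above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and it assumes the three windows start at exactly 0200, 1000 and 1800 UTC and last 10 seconds each.

```python
from datetime import datetime, time, timedelta, timezone

# Window start times in UTC, taken from the schedule above.
WINDOW_STARTS = (time(2, 0), time(10, 0), time(18, 0))
# Each window is scheduled to last 10 seconds.
WINDOW_LENGTH = timedelta(seconds=10)


def next_window(now):
    """Return the start of the first window that has not yet ended at `now`.

    `now` must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    # Look at today's windows first, then tomorrow's.
    for day_offset in (0, 1):
        day = now.date() + timedelta(days=day_offset)
        for start in WINDOW_STARTS:
            candidate = datetime.combine(day, start, tzinfo=timezone.utc)
            if candidate + WINDOW_LENGTH > now:
                return candidate
    raise AssertionError("unreachable: tomorrow always has a window")


def in_window(now):
    """Return True if `now` falls inside a scheduled window."""
    start = next_window(now)
    return start <= now < start + WINDOW_LENGTH
```

A script could call `in_window(datetime.now(timezone.utc))` before starting a long batch of requests and simply wait a few seconds if it returns True.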

Less mail for mailing list admins

Wednesday, July 27th, 2011

Many mailing lists in Launchpad are open teams – that is, anyone is welcome to join, or leave, as they choose.

Until today, every time someone joined or left such a team, all the list admins were mailed, even though there is no action to take: in an open team, you cannot kick someone out.

We’ve fixed this – now for open teams (and only open teams) when someone joins or leaves the team, the team admins will not be notified.

In future we will have a subscription facility for team admins that do want these emails, and at that point we will make them optional for all team types.

No more monthly 90 minute downtime

Tuesday, July 26th, 2011

I’m thrilled to be writing this blog post just over a year after starting as Launchpad’s technical architect. During that year we have been steadily improving our ability to deploy changes to Launchpad without causing downtime (of any or all services). Our ability to do this directly impacts our ability to deliver bug fixes and new functionality – our users are very sensitive to downtime.

There has been one particularly tricky holdout though – our monthly 90 minute downtime window where we apply schema changes, do DB server maintenance and so forth.

Starting very soon we will instead have very short windows – approximately 60 seconds long – where we perform schema changes, database server failover (in order to permit DB maintenance on the master server) and so forth.

We expect to do these about 6 times a month based on our historical rate of schema patches, and we are – for now – planning on doing these at 0800 UTC consistently.

This will deliver much less total downtime – 6 minutes a month rather than 90 – at the cost of more frequent interruptions.

If you have API scripts running against Launchpad, you may want to build in a retry mechanism to deal with up to a few minutes of downtime.
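One possible shape for such a retry mechanism is sketched below. It is a generic helper, not part of launchpadlib: the name `call_with_retry` and its parameters are invented for this example, and catching every `Exception` is a simplification (a real script would probably retry only on connection or server errors). With the defaults it waits 5, 10, 20, 40 and 80 seconds between attempts, about two and a half minutes in total.

```python
import time


def call_with_retry(operation, attempts=6, first_delay=5.0, sleep=time.sleep):
    """Call operation() and return its result, retrying if it raises.

    The delay doubles after each failed attempt. The last failure is
    re-raised so the caller still sees the original error.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
```

Usage is a matter of wrapping the call that talks to Launchpad, for example `call_with_retry(lambda: fetch_bug(1))`, where `fetch_bug` stands in for whatever function your script already uses.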

We cannot remove downtime entirely for purely technical reasons: our primary database (PostgreSQL) blocks new readers (or writers) while a schema change is being executed, and the schema change itself waits for existing readers (or writers) to complete – it needs an exclusive lock on each relation being altered.

What we can do is automate the process of disconnecting and interrupting existing database connections to let the schema change execute rapidly, and make our schema changes as minimal as possible. Previously, we shut down all the application servers (via a script, but shutting down gracefully takes time), and then ran schema changes which did data migration and so forth. In this new process we will leave the appservers running and just interrupt their connections for the time it takes to apply the schema change. That, combined with moving data migration to a background job rather than doing it during the schema change, gives us the short downtimes we’re about to start doing.

More information is available in the LEP and my mailing list post about the project starting.

team owner no longer implies team member

Tuesday, May 31st, 2011

A short heads-up about an upcoming change.

A very long time ago the team owner was always a team member. This was changed to make team owners optionally members (sometime before 2008!). However the change was incomplete – there has been an inconsistency in the codebase ever since. For the details see bug 227494.

I wanted to let everyone know about us actually finishing this change though, because for a small number of teams (about 400) their administrators may be surprised when they cannot do things.

The inconsistency was this: if a team owner leaves the team, so they just own it, then they are not listed as a team member. But if they tried to exercise a privilege the team grants – e.g. if the team is a bug supervisor – they were still able to do so. This setup made it impossible for users to accurately determine who can carry out the responsibilities of a team: the Launchpad web UI incorrectly reported team members.

The fix, which will be deployed in the next day or so, corrects this inconsistency: team ownership will no longer grant access to anything that team membership grants.

For clarity, these are the rules around team owners:

  1. When a team owner is assigned (or a team made) the owner defaults to being an administrator-member.
  2. If a team owner deactivates their team membership then they are not considered a team member anymore: resources and access that team membership grants will not be available to the owner at this point.
  3. Team owners can always perform administrative tasks on the team: creating new administrators, editing the team description, renaming the team etc.
  4. Point 3 allows an owner to add themselves to the team they own even if they deactivated their membership previously.
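The four rules can be illustrated with a toy model. This is not Launchpad's code: the `Team` class and its method names are hypothetical and exist only to show how ownership and membership are kept separate.

```python
class Team:
    """Toy model of the owner/member rules; not Launchpad's actual code."""

    def __init__(self, owner):
        self.owner = owner
        # Rule 1: a newly assigned owner starts as an administrator-member.
        self.members = {owner}
        self.admins = {owner}

    def deactivate_membership(self, person):
        # Rule 2: leaving removes everything that membership grants.
        self.members.discard(person)
        self.admins.discard(person)

    def can_administer(self, person):
        # Rule 3: the owner can always administer, member or not.
        return person == self.owner or person in self.admins

    def has_team_privileges(self, person):
        # After the fix, only membership counts; ownership alone does not.
        return person in self.members

    def add_member(self, actor, person):
        # Rule 4: an owner may re-add themselves because of rule 3.
        if not self.can_administer(actor):
            raise PermissionError(f"{actor} cannot administer this team")
        self.members.add(person)
```

In this model an owner who calls `deactivate_membership` on themselves loses `has_team_privileges` but keeps `can_administer`, and can get the privileges back with `add_member`.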

Bug search no longer does substring matching of source package names

Wednesday, February 16th, 2011

As part of improving performance we have disabled the substring matching of source package names. This fixes bug 268508 and bug 607960. However, it’s a slightly contentious issue – opinions vary about whether bug 268508 is a valid bug or not.

So we have only disabled it – the code is still present and when we have more leeway on the performance of bug searching we’ll revisit this and look into some design and UI analysis to decide whether substring matching of this sort should be done or not.

For now though, there should be fewer timeouts in bug searches.

Should bug search match target names?

Monday, February 7th, 2011

We have a small quandary on the Launchpad development team at the moment. As bug 268508 discusses, when one searches for a bug on Launchpad we do a substring search on the names of bug targets.

For instance, searching in Ubuntu for ‘gcc’ will return all bugs on the packages ‘gcc’, ‘gcc-4.4’, ‘gcc-4.3’, ‘gcc-3.3’ and so forth. Likewise, searching for bugs in a project group will do a similar substring search on each of the individual projects in the project group.
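The difference between the two behaviours can be shown with a small sketch. It works on an in-memory list rather than Launchpad's database, the function names are invented for this example, and ‘binutils’ is added to the package list only to show a name that does not match.

```python
# Package names from the example above, plus one that should never match.
PACKAGES = ["gcc", "gcc-4.4", "gcc-4.3", "gcc-3.3", "binutils"]


def substring_match(term, names):
    """Current behaviour: any name containing the term matches."""
    return [name for name in names if term in name]


def exact_match(term, names):
    """Proposed behaviour: only a name equal to the term matches."""
    return [name for name in names if name == term]
```

Here `substring_match("gcc", PACKAGES)` returns the four gcc packages, while `exact_match("gcc", PACKAGES)` returns only `gcc`.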

It turns out that doing this search is itself expensive. I asked on the Ubuntu devel list about turning it off. We would close bug 268508 and also significantly improve search performance.

However this is a possibly contentious change – there was one mail strongly in favour of the current behaviour – so I’d like to get this change proposed to a wider community.

If you’ve got a strong opinion – that the current behaviour is good or, as bug 268508 describes, that it’s a poor behaviour and we would be better off without it – then I’d love to hear from you. Just leave a comment on this post, drop me an email – robert at canonical.com – or post to the launchpad-users mailing list.

Thanks,
Rob (LP technical architect)

Launchpad edge site deprecated

Wednesday, November 24th, 2010

I previously posted about our continuous deployment efforts in Launchpad. Since then the project has come a long way. We can deploy to nearly all our services without downtime. The remaining services are a bit trickier – but we are working on them.

As part of the project we are consolidating the ‘edge’ domain – https://edge.launchpad.net/, https://bugs.edge.launchpad.net/ and other similar domains – into the main launchpad UI. These domains are now deprecated.

The most important thing this means for you is that for members of our beta test program, we will no longer redirect you to https://edge.launchpad.net/ – instead we are serving our beta UI directly from the main website. The edge site is now running exactly the same code as the main Launchpad cluster and is updated at exactly the same time.

We have done this to deliver new features to our users more efficiently and at the same time simplify our production environment. So far the project has been very successful from our perspective – as I write this we have 5 days of inventory – code we’ve written but not deployed. This is down from an average of 2 weeks prior to this initiative starting, and we often sit lower – 1 to 2 days worth.

In the coming months as we refine this process and project we want to remove the edge cluster. As part of this we will start redirecting browser requests to ‘edge’ domains to the main Launchpad domain.

API clients cannot be redirected in this way, so we also ask that anyone writing or using Launchpad API scripts update them to use the primary cluster. The main cluster is currently 3 times the size and should perform better for nearly any API script. We will slowly decrease the size of the edge cluster and disable it completely once we see no traffic on it. To switch a script over, use LPNET_SERVICE_ROOT rather than EDGE_SERVICE_ROOT. To get the LPNET_SERVICE_ROOT symbol, import it from launchpadlib.uris:

from launchpadlib.uris import LPNET_SERVICE_ROOT
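Scripts that store hard-coded ‘edge’ URLs, rather than using the service root constant, could translate them with a small helper like the one below. This is a sketch only: `to_main_domain` is a made-up name, it uses nothing from launchpadlib, and it assumes every ‘edge’ host maps to the main host simply by dropping the `edge` label, as in the two domains mentioned above. It also drops any user name or password embedded in the URL.

```python
from urllib.parse import urlsplit, urlunsplit

EDGE_SUFFIX = ".edge.launchpad.net"


def to_main_domain(url):
    """Rewrite an 'edge' Launchpad URL to the matching main-site URL.

    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host == "edge.launchpad.net":
        host = "launchpad.net"
    elif host.endswith(EDGE_SUFFIX):
        # e.g. bugs.edge.launchpad.net -> bugs.launchpad.net
        host = host[: -len(EDGE_SUFFIX)] + ".launchpad.net"
    else:
        return url
    # Keep an explicit port if the original URL had one.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

For example, `to_main_domain("https://bugs.edge.launchpad.net/ubuntu")` gives `https://bugs.launchpad.net/ubuntu`.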

If you have any questions about any of this we’d be delighted to hear from you – here, on IRC or the launchpad-users mailing list.

Rob Collins
Technical Architect

Continuous deployment in Launchpad

Tuesday, October 5th, 2010

It currently takes an average of two weeks for new changes that have been developed for Launchpad to become live on the Launchpad site.

We’re working on changing this and making the way we deploy Launchpad simpler and more reliable at the same time.

In the first generation of this, we are targeting changes that do not alter the data model, and we’re aiming for a delay of 12-16 hours. Longer term we’ll be aiming for a few hours.

If you are a ‘beta’ user of Launchpad, this brings one primary, and very important, change: the ‘edge’ site is going to be removed. We now have a process for validating changes that would previously have been validated on edge using a new staging site. The edge site previously received unvalidated updates and would from time to time have issues as a result. If you are not a ‘beta’ user, then nothing should change for you at all, except that you will notice site changes more often, with no downtime, rather than once a month after downtime.

Sometime in the next few weeks the redirect to ‘edge’ will be removed (it only affected beta users). Instead of a redirect to ‘edge’, the main website will offer you any unreleased functionality, and the ‘disable edge redirect’ link will turn off that functionality for a moderate time period. Following that we will put in place a redirect from ‘edge’ to the normal ‘launchpad.net’ across all of the ‘edge’ servers, and move the servers to the main site server farm.