Better dupe finding
One of my favourite things about Launchpad’s bug tracker is the dupe finder: when you report a new bug, it’ll search to see if there’s already a similar bug report. It’s the same for questions in Launchpad Answers, too.
Getting to see possible dupes before you file a bug or question is a great time saver for you and the people on the other end. However, the dupe finder has been timing out a lot lately.
Rob Collins, Launchpad’s new Technical Architect, has introduced some changes that should make the dupe finder more reliable.
Other than fewer timeouts, here’s what you might notice:
- the dupe finder now returns fewer matches — three or four rather than ten or more
- the results should be more relevant.
We want to know how this works in practice. Let us know how you get on with the new dupe finder. Either leave a comment here, mail feedback@launchpad.net or join us on the launchpad-users mailing list.
How Rob did it
The previous dupe finder had a number of problems, not least that the search engine it’s built on is less efficient than we need. We’re planning to replace the search engine but not straight away, so Rob looked for a temporary solution that would work for the next five or six months.
I’ll hand over to Rob to explain what he actually did:
The old search did a pre-pass over every possible hit, which means 400,000 items for Ubuntu bugs and is very slow. It then ran a search matching any document that contained a rare search term.
By “rare” we mean that the term showed up in fewer than half of the possible hits.
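To make the "rare term" idea concrete, here is a minimal Python sketch of the approach described above. It is an illustration only, not Launchpad's actual code: the function names `rare_terms` and `match_any`, the sample bug titles, and the plain whitespace tokenising are all assumptions made for the example.

```python
from collections import Counter


def rare_terms(query_terms, documents, threshold=0.5):
    """Return the query terms found in fewer than `threshold` of the documents.

    This is the expensive pre-pass: every document is inspected once.
    """
    doc_words = [set(doc.lower().split()) for doc in documents]
    doc_frequency = Counter()
    for words in doc_words:
        for term in set(query_terms):
            if term in words:
                doc_frequency[term] += 1
    limit = threshold * len(documents)
    return [term for term in query_terms if doc_frequency[term] < limit]


def match_any(terms, documents):
    """Return the indices of documents containing at least one of `terms`."""
    wanted = set(terms)
    return [
        index
        for index, doc in enumerate(documents)
        if wanted & set(doc.lower().split())
    ]


# Hypothetical bug titles, invented for this example.
bugs = [
    "firefox crash on startup",
    "crash in the file manager",
    "the update manager hangs",
    "sound stops in the video player",
    "the panel clock shows the wrong time",
]
query = ["firefox", "crash", "the"]

rare = rare_terms(query, bugs)   # "the" is in 4 of 5 titles, so it is dropped
hits = match_any(rare, bugs)     # any title with "firefox" or "crash"
print(rare, hits)
```

Because the match is an OR over the rare terms, a single shared word is enough to produce a hit, which helps explain why the old finder returned so many loosely related results.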
For example, if you searched for “firefox crashes on <website> in flash” on /ubuntu/+filebug, it would search for any bug with any of “firefox” (<50% of bugs are on firefox), “crash” (<50% of bugs say “crash”), “<
can switch this off easily if we have to, so we do want feedback about how people find this. 