Sikuli — scripting your use of GUIs

The Sikuli project recently switched to using Launchpad. I asked Tsung-Hsiang Chang to tell me more about the project.

Matthew: Your website says, “Sikuli is a visual technology to search and automate graphical user interfaces (GUI) using images (screenshots).” That sounds like it’s a general purpose screen-scraping solution. Is that right?

Tsung-Hsiang: Not exactly. Sikuli is not about extracting data from the screen.

The current release of Sikuli is called Sikuli Script, and it focuses only on automation using screenshots of GUI widgets.

We have another project called Sikuli Search, which queries a search engine using screenshots instead of keywords.

Although Sikuli Script is supposed to be able to “search” for buttons or text on the screen, it isn’t yet good at scraping or analyzing information from screenshots.

Matthew: In your YouTube demo video, you show how you can write a Sikuli script that will open an app by name. Then, you can take a screen shot of one of the icons in that app and have Sikuli click it as part of the script. You even show how you just have to take a screen shot of the option you want from a drop-down list and Sikuli will select that option. That’s really cool, but you say that Sikuli will even tolerate small changes in the icons you ask it to click. How does that work?

Tsung-Hsiang: Matching between a target image and the screen image is done by computing the normalized cross-correlation between the two images.

This is a standard technique in computer vision for finding patterns when variations are known to be small. This technique works incredibly well for matching desktop GUI patterns.
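A minimal pure-Python sketch of the matching step described above. It uses the zero-mean form of normalized cross-correlation (the formula OpenCV calls TM_CCOEFF_NORMED) and tries every position by brute force. The function names, the list-of-lists grayscale image format, and the choice to click the center of the best match are illustrative assumptions, not Sikuli's actual code; a real tool would use an optimized library rather than nested Python loops.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-sized grayscale images.

    Returns a score in [-1, 1]. A score of 1 means the patch equals the template
    up to a uniform brightness shift and a positive contrast scaling, which is
    why small appearance changes still produce high scores.
    """
    a = [p for row in patch for p in row]
    b = [t for row in template for t in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A perfectly flat patch or template has no structure to correlate with.
    return num / den if den else 0.0


def find_best_match(screen, template):
    """Slide the template over the screen; return (score, x, y) of the best top-left corner."""
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    best = (-2.0, 0, 0)  # below the lowest possible NCC score
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            patch = [row[x:x + tw] for row in screen[y:y + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, x, y)
    return best


def match_center(x, y, template):
    """Point a click would target (an assumption): the center of the matched region."""
    return x + len(template[0]) // 2, y + len(template) // 2
```

For example, pasting a copy of a 2×2 template with every pixel brightened by 5 into an otherwise black 5×5 screen still yields a score of 1.0 at the copy's true position. A script would then accept a match only if its score clears some similarity threshold, and click at `match_center`.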

Matthew: Where do you think Sikuli will be most useful?

Tsung-Hsiang: Whenever an application does not expose an internal API. Many people have written scripts to automate applications that used to be very difficult to automate, such as Facebook (Flash) games, and to test Android systems.

Matthew: Particularly on Mac OS X and Linux-based systems, where does Sikuli become a better option than standard shell scripting?

Tsung-Hsiang: If a command-line interface is available, Sikuli may not be a good choice for a shell scripting guru. But sometimes you just can’t find a command-line tool, or you don’t want to learn complicated commands and parameters. In fact, the core of Sikuli is just a Jython library, so it can easily be mixed with other Python scripts or command-line tools. Therefore, Sikuli can be an additional handy tool for command-line gurus.
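Because the core is an ordinary library, a script can run a command-line tool and hand its output to a GUI action. The sketch below is plain Python 3, not Jython, and `run_cli` and `automate` are hypothetical helpers. The `gui_type` argument stands in for a Sikuli call such as `type()` so that the example runs outside Sikuli.

```python
import subprocess
import sys


def run_cli(args):
    """Run a command-line tool and return its stdout with surrounding whitespace removed."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def automate(args, gui_type):
    """Feed the output of a CLI tool into a GUI action.

    gui_type is any callable taking a string; in a real Sikuli script this
    role would be played by a GUI call such as type() (assumption for this sketch).
    """
    text = run_cli(args)
    gui_type(text)
    return text


if __name__ == "__main__":
    # Use the current Python interpreter as a stand-in command-line tool,
    # and print() as a stand-in for the GUI action.
    automate([sys.executable, "-c", "print('hello from the shell')"], print)
```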

Besides, the primary goal of Sikuli is to help ordinary users who know nothing about command-line tools and shell scripting automate their tasks. We hope everyone can enjoy using their computers efficiently.

Matthew: What’s next for Sikuli?

Tsung-Hsiang: Better, faster, more accurate.

We have a long list of planned improvements for Sikuli. At the top of the list are:

  1. Social programming: the ability to share scripts and search scripts by visual patterns. For example, when a user takes a screen shot of a recycle bin, the user can search a database for all the scripts written by other users that involve the image of a recycle bin.
  2. Event-driven programming: the ability to register event handlers for visual events. For example, a user can define a function that pops up a warning message whenever a “low battery” image appears on the screen.
  3. Face detection: the ability to find faces on the screen.
  4. Recorder: the ability to record a sequence of clicking and typing operations and generate a visual script automatically.
  5. Tutorial converter: the ability to convert existing step-by-step instructions with screenshots into executable scripts.
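The event-driven item can be pictured as a polling loop that re-checks registered images on each new screen frame and fires a handler only when an image goes from absent to present. This is a hypothetical sketch: `VisualObserver`, `on_appear`, `poll`, and `exact_match` are illustrative names not taken from any Sikuli release, and the exact-equality matcher is a simplification that a real tool would replace with a similarity score such as normalized cross-correlation.

```python
class VisualObserver:
    """Hypothetical visual-event dispatcher (illustrative, not Sikuli's API).

    grab_screen: callable returning the current screen as a 2-D grayscale list.
    matcher: callable(screen, template) -> (score, x, y).
    """

    def __init__(self, grab_screen, matcher, min_similarity=0.9):
        self.grab_screen = grab_screen
        self.matcher = matcher
        self.min_similarity = min_similarity
        self.handlers = []  # entries are [template, handler, was_visible]

    def on_appear(self, template, handler):
        """Register handler(x, y) to run when the template becomes visible."""
        self.handlers.append([template, handler, False])

    def poll(self):
        """Grab one frame and fire handlers whose pattern has just appeared."""
        screen = self.grab_screen()
        for entry in self.handlers:
            template, handler, was_visible = entry
            score, x, y = self.matcher(screen, template)
            visible = score >= self.min_similarity
            if visible and not was_visible:
                handler(x, y)  # fire once per appearance, not on every frame
            entry[2] = visible


def exact_match(screen, template):
    """Simplified matcher: score 1.0 at the first exact occurrence, else 0.0."""
    th, tw = len(template), len(template[0])
    for y in range(len(screen) - th + 1):
        for x in range(len(screen[0]) - tw + 1):
            if [row[x:x + tw] for row in screen[y:y + th]] == template:
                return (1.0, x, y)
    return (0.0, 0, 0)
```

In use, a script would register something like a warning pop-up with `on_appear` for a “low battery” screenshot and call `poll` periodically. Tracking `was_visible` keeps the warning from firing again on every frame while the image stays on screen.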

Matthew: Why did you choose Launchpad?

Tsung-Hsiang: We were using Trac before moving to Launchpad. Trac is more developer-oriented, just like GitHub.

But what we really want is a user-oriented project hosting site. We want a place to report and discuss bugs, ask and answer questions, and also download Sikuli and track its development.

We compared GitHub and Launchpad and in the end chose Launchpad, because it has almost everything we need except a wiki for writing documentation. But we already had a wiki in our Trac system, so this was not a big problem.

Matthew: Do you have any requests for the Launchpad community?

Tsung-Hsiang: It would be great if we could write documentation on Launchpad. A wiki or something like EtherPad would be fantastic.

Matthew: Thanks very much and good luck with Sikuli!
