Wednesday, April 30, 2014

Usability in action

At the University of Minnesota, we are working on an upgrade to our Enterprise systems. I was glad to see this recent item about the "Portal" team applying usability practices to the upgrade: Portal Team Holds Usability Evaluations on Pages for Faculty, Staff and Students.

You really can learn a lot just by sitting down with a few users, and watching them use the software. And the Portal team did learn a lot. From the article:
There are often insights uncovered in usability testing that designers wouldn’t think of, especially when tailoring each experience to different audiences. When one faculty member took a look at the list of courses she taught, something was off. The list was sorted by title, but “faculty look at the course designator, not the title,” she explained. An easy fix, but one the designer might not have noticed. Getting these small details right can mean the difference between a good experience and a frustrating one.
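As an aside, that kind of fix really can be a one-line change. Here's a hypothetical sketch in Python, with made-up course records and field names, just to show how small the difference is:

    # Hypothetical course records; the field names are invented for illustration.
    courses = [
        {"designator": "BIOL 1009", "title": "General Biology"},
        {"designator": "CSCI 5115", "title": "User Interface Design"},
    ]

    # Sorting the course list by title, as originally designed ...
    by_title = sorted(courses, key=lambda c: c["title"])

    # ... versus sorting by the course designator, the way faculty scan the list.
    by_designator = sorted(courses, key=lambda c: c["designator"])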
I'm really glad to see usability in action during this important upgrade!

Monday, April 7, 2014

Comparing my predictions

Before starting the usability test, I made some predictions for how things would go, what themes I would uncover, and how users would respond to the tasks. Here is a quick comparison of my predictions to what actually happened:

The 4 (now 5) themes will continue
I definitely saw a continuation of the five themes: Familiarity, Consistency, Menus, Obviousness, and Unexpected Behavior. The "hot corner" remained an issue for several testers, although it wasn't as prevalent as in my previous usability test. Fortunately, when participants hit the hot corner, most were able to quickly get back to the program before I stepped in to do it for them. But in general, this unexpected behavior was confusing and startling.
Consistency was an important theme. Testers frequently said that because all the programs looked more or less the same, they were able to apply what they learned in one program to the others. Familiarity was more of a challenge: GNOME just isn't very much like Mac or Windows, so everyone had to re-learn how to use it.
Obviousness was also important. In fact, some testers thought they hadn't completed an action because GNOME didn't give an obvious indication that their action had an effect; a specific example is find and replace in Gedit. Menus were also an issue, but I'll talk more about that next.
Users will be confused about menus
While testers were confused about having menus split across two different "areas" (the Gear menu and the Application menu), they weren't as confused as I thought they would be. Several participants commented that the Gear menu (which they called "Options" or some similar name) seemed to hold "program actions," while the Application menu had more "top level" actions. But overall, I'd still say the split menus caused some confusion.
Users will think the test didn't go well, even if they were able to complete all the tasks
I thought this would be the case, and I made a big deal out of it in my predictions, citing principles of rhetoric. But it turns out that testers thought the overall test went well if the last program went well. (This is an example of primacy-recency, a different principle of rhetoric.)
Specific problems with applications and tasks
I predicted testers would hit specific problems with certain applications and tasks, such as: changing the default font in Gedit and Notes, creating a shortcut to a folder in Nautilus, searching for a file that's not under the current folder tree, bookmarking a website in Web, increasing the font size in Web, and deleting all notes in Notes. And for the most part, I was right. Look at the heat map to see how testers fared in these tasks.
"Hot" areas include: changing the default font in Gedit and Notes (as predicted) and creating a shortcut to a folder in Nautilus (as predicted). Other less critical (but still important) usability issues include: find and replace in Gedit, bookmarking a website in Web (as predicted), increasing the font size in Web, and installing programs in Software. 
While I thought users would have a problem with Selection Mode in Notes, testers didn't experience problems here. But importantly, neither did they use Selection Mode. When asked to delete all notes, testers invariably right-clicked a note, clicked the Delete button, then repeated the process for the second note. Testers didn't seem to see a need for Selection Mode to delete multiple notes.
Problems caused by possible bugs
Fortunately, I didn't experience any unexpected bugs. One problem I knew about beforehand: if you install a program (Robots) using Software, then immediately try to remove the same program, you get an error. You seem to have to exit Software after installing a program, then re-launch it before you can remove the program. That seems like a bug. As a workaround, I asked testers to exit Software before they tried to delete the Robots game, so we avoided the bug entirely.

Outline of my paper

This has been a very busy week, doing the usability tests and the analysis to identify trends. I'm using a heat map like last time, which I think shows the trends more clearly. This heat map also shows the matching tasks from the previous usability test, so I can start a comparative analysis.

I've also shared this with the folks from GNOME. We're going to try for a Hangout in the next week or so to go over the results, discuss themes, and talk more generally about how things went.

From here, my next step is to finish the next draft of my paper. I think the next draft will be much closer to the final form (which is good, given where we are in the semester). The previous draft reused some text from my blog, so it had a very informal tone. The next draft will step that up.

Since this is for my M.S. capstone, this iteration will be a much longer and more detailed document than the paper I wrote for a previous class on usability (later turned into an article for Linux Journal). My outline is currently shaping up this way:

  1. Introduction
  2. What is usability
  3. Usability needs to be an integral part of the development process
    1. Context
    2. Development methods in closed source software vs. open source software
  4. Why is usability overlooked in open source software
  5. My usability test
  6. Usability test results
  7. Themes and conclusions

Venues?

After I have something more concrete to share, I'd love to get my results out there somehow. Academic journals are a good place to share. And I'm already talking to Linux Journal about a followup to my article in the December 2013 issue. I think I'll also submit it to Slashdot and Reddit. I wonder if the FSF would be interested in something for their newsletter? Maybe if I write a version that is more general, rather than specific to GNOME. I'd also love to write a version of my article for OpenHatch, and for other organizations that are interested in open source software.

I'm also starting to think about conferences where I could talk more about this. I missed LibrePlanet, but maybe next year. Other conferences to consider are Penguicon and GUADEC.

What other venues would you suggest?

Friday, April 4, 2014

The usability of GNOME 3.10 & 3.12

This is only a preview of my results, just to share some immediate analysis. I'll provide more details later.

I scheduled 15 people for the usability test of GNOME 3.10 & 3.12. Three people were unable to make it, so I had 12 test participants. And from Nielsen's research, we know you only need about five testers to get useful results; after about five testers, you have enough information to identify the themes and trends. I definitely saw that here: after the first five or six testers, the trends were clear. With 12 participants, that's more than twice the minimum needed to expect good results.
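(If you're wondering where "about five" comes from: Nielsen and Landauer model the proportion of problems found by n testers as 1 - (1 - L)^n, where L is the probability that a single tester uncovers a given problem. Using their commonly cited average of L ≈ 0.31, a figure from their research rather than from my test, five testers find about 1 - 0.69^5 ≈ 84% of the problems, and each additional tester adds less and less.)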

So, what were the trends in this test? As before, I prefer to use a "heat map" to display the results. During the usability test, I typed notes on a laptop while I observed the test participants doing the scenario tasks. I captured quotes as the participants used the "speak aloud" method to describe their process and thoughts, but more generally I noted the paths the testers took as they attempted each task. What menus did they click into? What buttons caught their attention? At the end of each task, I indicated whether the tester had completed the task, and how difficult it was for them.

In the heat map below, a green box indicates a tester completed a task with little or no difficulty, while a yellow box represents a task that had moderate difficulty. A red box indicates a task that was extremely difficult to do, or was completed incorrectly or by accident. Where the tester was unable to complete the task, or chose to abort it (usually believing it to be impossible to complete), I use a dark red (maroon) box. A white box indicates a task that was skipped.

This is just a quick screenshot from a spreadsheet:

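As an aside: you don't need a spreadsheet to build a heat map like this. Here's a minimal sketch in Python using matplotlib; the task names and ratings are hypothetical placeholders for illustration, not my actual test data.

    # Minimal heat map sketch (hypothetical data, not my real results).
    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.colors import ListedColormap

    # Rating scale: 0 = skipped (white), 1 = little or no difficulty (green),
    # 2 = moderate difficulty (yellow), 3 = extreme difficulty or completed
    # incorrectly (red), 4 = unable to complete or aborted (maroon).
    cmap = ListedColormap(["white", "green", "yellow", "red", "maroon"])

    tasks = ["Change default font", "Find and replace", "Bookmark a site"]
    ratings = np.array([
        [1, 2, 1, 3, 1],   # rows are tasks ...
        [2, 4, 2, 2, 3],   # ... columns are testers
        [1, 1, 0, 1, 2],
    ])

    fig, ax = plt.subplots()
    ax.imshow(ratings, cmap=cmap, vmin=0, vmax=4)
    ax.set_yticks(range(len(tasks)), labels=tasks)
    ax.set_xticks(range(ratings.shape[1]),
                  labels=["T%d" % (i + 1) for i in range(ratings.shape[1])])
    plt.tight_layout()
    plt.show()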

You may remember that I performed a similar usability test of an earlier release of GNOME, about a year and a half ago. That test involved Gedit, Firefox, and Nautilus, and included seven test participants. I've represented those results to the right, matched against this test of Gedit, Web, and Nautilus. This shows where programs fared better, worse, or about the same between the earlier version (on the right) and versions 3.10 & 3.12 (on the left).

I'll provide a deeper analysis in a future post. Comparing these results to my notes, there's a lot to be said about how testers actually use GNOME, and how they expect to use GNOME. But I'll have to save that for next time.