Monday, November 12, 2018

Usability Testing in Open Source Software (SeaGL)

I recently attended the 2018 Seattle GNU/Linux Conference, where I gave a presentation about usability testing in open source software. I promised to share my presentation deck. Here are my slides in PNG format, with notes added:
Also, my entire presentation is available under the CC-BY license:
I've been involved in Free/open source software since 1993, but more recently I developed an interest in usability testing in open source software. During a usability testing class in my Master's program in Scientific and Technical Communication (MS), I studied the usability of GNOME and Firefox. Later, I did a deeper examination of the usability of open source software, focusing on GNOME, as part of my Master's capstone. (“Usability Themes in Open Source Software,” 2014.)

Since then, I've joined the GNOME Design Team where I help with usability testing.

I also (sometimes) teach usability at the University of Minnesota. (CSCI 4609 Processes, Programming, and Languages: Usability of Open Source Software.)
I’ve worked with others on usability testing since then. I have mentored in Outreachy (formerly the Outreach Program for Women). Sanskriti, Gina, Renata, Ciarrai, and Diana were all interns in usability testing. Allan and Jakub from the GNOME Design Team co-mentored as advisers.
What do we mean when we talk about “usability”? You can find formal definitions of usability that talk about Learnability, Efficiency, Memorability, Errors, and Satisfaction. But I find it helps to have a “walking around” definition of usability.

A great way to summarize usability is to remember that real people are busy people, and they just need to get their stuff done. So a program will have good usability if real people can do real tasks in a realistic amount of time.

User eXperience (UX) is technically not the same as usability. Where usability is about real people doing real tasks in a reasonable amount of time, UX is more about the emotional connection or emotional response the user has when using the software.

You can test usability in different ways. I find that formal usability tests and prototype tests work well. You can also examine usability indirectly, such as having an expert do a heuristic evaluation, or using questionnaires. But really, nothing can replace watching a real person try to use your software; you will learn a lot just by observing others.
People think it's hard to do usability testing, but it's actually easy to do a usability test on your own. You don’t need a fancy usability lab or any professional experience. You just need to want to make your program easier for other people to use.

If you’re starting from scratch, there are really three steps to a formal usability test:

1. Consider who your users are. Write this down as a short paragraph for each kind of user of your software. Make it a realistic fiction. These are your Personas. With personas, you can make design decisions that always benefit the user. “If we change __ then that will make it easier for users like Jane.” “If we add __ then that will help people like Steve.”

2. For each persona, write a brief statement about why that user might use the software to do their tasks. There are different ways that a user might use the software, but just jot down one way. This is a Use Scenario. With scenarios, you can better understand the circumstances when people use the software.

3. Now take a step back and think about the personas and scenarios. Write down some realistic tasks that real people would do with the software. Make each one stand on its own. These are scenario tasks, and they make up your actual usability test. Where you should write personas and scenarios in the third person (“__ does this...”), you should write scenario tasks in the second person (“you do this...”). Each scenario task should set up a brief context, then ask the tester to do something specific. For example:

You don’t have your glasses with you, so it’s hard to see the text on the screen. Make the text bigger so you can read it more easily.

The challenge in scenario tasks is not to accidentally give hints about what the tester should do. Avoid using the same words and phrases that appear in the menus. Don’t be too exact about what the tester should do; instead, describe the goal, and let the tester find their own path. Remember that there may be more than one way to do something.

The key in doing a usability test is to make it iterative. Do a usability test, analyze your results, then make changes to the design based on what you learned in the test. Then do another test. But how many testers do you need?
You don’t need many testers to do a usability test if you do it iteratively. Doing a usability test with five testers is enough to learn about the usability problems and make tweaks to the interface. At five testers, you’ve uncovered more than 80% of the usability problems, assuming each tester can uncover about 31% of the issues (a typical rate).

But you may need more testers for other kinds of usability tests. “Only five” works well for traditional/formal usability tests. For a prototype test, you might need more testers.

But five is enough for most tests.
If every tester can uncover about 31% of usability problems, then note what happens when you have one, five, and ten testers in a usability test. You can cover 31% with one tester. With more testers, you have overlap in some areas, but you cover more ground with each tester. At five testers, that’s pretty good coverage. At ten testers, you don’t have considerably better coverage, just more overlap.

I made this sample graphic to demonstrate. The single red square covers 31% of the grey square's area (in the same way a tester can usually uncover about 31% of the usability problems, if you've designed your test well). Compare five and ten testers. You don't get significantly more coverage at ten testers than at five testers. You get some extra coverage, and more overlap, but that's a lot of extra effort for not a lot of extra value. Five is really all you need.
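If you want to check the math behind the “five is enough” rule of thumb, the usual model (often attributed to Nielsen and Landauer) says that n testers uncover about 1 - (1 - L)^n of the problems, where L is the share each tester finds on their own. Here is a minimal Python sketch of that arithmetic; the 31% rate is the assumption from the slide above, not a number measured in my own tests:

```python
# Expected share of usability problems found by n testers,
# assuming each tester independently uncovers about 31% of them.
L = 0.31  # share of problems one tester finds (assumed typical rate)

def coverage(n, rate=L):
    """Share of problems uncovered by n testers: 1 - (1 - rate)^n."""
    return 1 - (1 - rate) ** n

for n in (1, 5, 10):
    print(f"{n:2d} testers: {coverage(n):.0%} of problems found")

# Roughly: 1 tester -> 31%, 5 testers -> 84%, 10 testers -> 98%.
```

So going from five to ten testers buys you maybe another 14 percentage points of coverage for twice the effort, which matches the overlap you see in the graphic.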
Let me show you a usability test that I did. Actually, I did two of them, as part of my work on my Master’s degree. My capstone was: Hall, James. (2014). Usability Themes in Open Source Software. University of Minnesota.

I wrote up the results for each test as separate articles for Linux Journal: “The Usability of GNOME” (December, 2014) and “It’s about the user: Usability in open source software” (December, 2013).
I like to show results in a “heat map.” A heat map is just a convenient way to show test results. Scenario tasks are in rows and each tester is a separate column.

For each cell (a tester doing a task) I use a color to show how easy or how difficult that task was for the tester. I use this scale:

Green if the tester easily completed the task. For example, if the tester seemed to know exactly what to do, what menu item to activate or which icon to click, you would code the task in green for that tester.

Yellow if the tester experienced some (but not too much) difficulty in the task.

Orange if the tester had some trouble in the task. For example, if the tester had to poke around the menus for a while to find the right option, or had to hunt through toolbars and selection lists to locate the appropriate icon, you would code the task in orange for that tester.

Red if the tester experienced severe difficulty in completing the task.

Black if the tester was unable to figure out how to complete the task, and gave up.
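If you want to build this kind of heat map yourself, here is a minimal sketch using matplotlib (my choice of tool here; a spreadsheet works just as well). The tasks, the number of testers, and the 0–4 difficulty codes are made-up example data, not results from any of the tests in this post:

```python
# Minimal heat-map sketch: scenario tasks as rows, testers as columns.
# Each cell is coded 0 (green, easy) through 4 (black, gave up).
# The tasks and codes below are made-up example data.
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

tasks = ["Change the font", "Set a bookmark", "Search for a file"]
results = [
    [0, 1, 0, 2, 0],   # one row per task, one column per tester
    [3, 2, 4, 3, 2],
    [1, 0, 2, 1, 0],
]

cmap = ListedColormap(["green", "yellow", "orange", "red", "black"])
fig, ax = plt.subplots()
ax.imshow(results, cmap=cmap, vmin=0, vmax=4, aspect="auto")
ax.set_xticks(range(len(results[0])))
ax.set_xticklabels([f"Tester {i + 1}" for i in range(len(results[0]))])
ax.set_yticks(range(len(tasks)))
ax.set_yticklabels(tasks)
plt.tight_layout()
plt.show()
```

The point is just to make the “hot” rows jump out: a row with a lot of orange, red, or black is a task that needs design attention.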

There are some “hot” rows here, which show tasks that were difficult for testers: setting the font and colors in gedit, and setting a bookmark in Nautilus. Searching for a file in Nautilus was also a bit challenging. So my test recommended that the GNOME Design Team focus on these four tasks to make them easier to do.
This next one is the heat map from my capstone project.

Note that I tried to do a lot here. You need to be realistic about your time. Aim for about an hour (that’s what I did), but make sure your testers have enough time. The gray “o” in each cell is where we didn’t have enough time to do that task.

You can see some “hot rows” here too: setting the font in gedit, and renaming a folder in Nautilus. And changing all instances of some words in gedit, and installing a program in Software, and maybe creating two notes in Notes.
Most of the interns did a traditional usability test. So that’s what Sanskriti did here:
Sanskriti did a usability test that was similar to mine, so we could measure changes. She used a slightly different color map here, with two tones of green. But you can see a few hot rows: changing the default colors in gedit, adding photos to an album in Photos, and setting a photo as the desktop wallpaper from Photos. Also some warm rows in creating notes in Notes, and creating a new album in Photos.
Gina was from my second cycle in Outreachy, and she did another traditional usability test:
You can see some hot rows in Gina's test: bookmarking a location in Nautilus, adding a special character (checkmark) using Characters and Evince, and saving the location (bookmark) in Evince. Also some warm rows: changing years in Calendar, and saving changes in Evince. Maybe searching for a file in Nautilus.

Gina did such great work that we co-authored an article in FOSS Force: "A Usability Study of GNOME" (March, 2016).
In the next cycle of Outreachy, we had three interns: Renata, Ciarrai, and Diana. Renata did a traditional usability test:
In Renata’s heat map, you can see some hot rows: creating an album in Photos, adding a new calendar in Calendar, and connecting to an online account in Calendar. And maybe deleting a photo in Photos and setting a photo as a wallpaper image in Photos. Some issues in searching for a date in Calendar, and creating an event in Calendar.

See also our article in Linux Voice Magazine: "GNOME Usability Testing" (November, 2016, Issue 32).
Ciarrai did a prototype test for a future design change to the GNOME Settings application:
In the future Settings, the Design Team thought they’d have a list of categories down the side. Clicking on a category shows you the settings for that category. Here’s a mock-up for Wi-Fi in the new Settings. You can see the list of other categories down the left side:
Remember the “only five” slide from a while back? That’s only for traditional/formal usability tests. For a prototype test, we didn’t think five was enough, so Ciarrai worked with ten testers.

For Ciarrai’s heat map, we used slightly different colors because the testers weren’t actually using the software; they were pointing to a paper printout. Here, green indicates the tester knew exactly what to point to, and red indicates they pointed to the wrong thing. For some tasks that had sub-panels, orange indicates they got to the first panel but failed to get to the second setting.

You can see some hot rows, indicating where people didn’t know what category would have the Settings option they were looking for: Monitor colors, and Screen lock time. Also Time zone, Default email client, and maybe Bluetooth and Mute notifications.
Other open source projects have adopted the same usability test methods. Debian did a usability test of GNOME. Here’s their test:
They used more general “goals” for testers, called “missions.” A mission is broader than a scenario task and gives the tester some flexibility in how to approach it, but in practice it’s not very different from a scenario task.

You can see some hot rows here: temporary files and change default video program in Settings, and installing/removing packages in Package Management. Also some issues in creating a bookmark in Nautilus, and adding/removing other clocks in Settings.
If you want more information, please visit my blog or email me.
I hope this helps you to do usability testing on your own programs. Usability is not hard! Anyone can do it!
