I scheduled 15 people for the usability test of GNOME 3.10 & 3.12. Three were unable to make it, so I had 12 test participants. From Nielsen, we know you only need about five testers to get useful results; beyond that, you already have enough information to see the themes and trends. I definitely saw that here: after the first five or six testers, the trends were clear. With 12 participants, I had more than twice the minimum needed to expect good results.
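To put a rough number on the "about five testers" rule, here is a minimal sketch of the model Nielsen and Landauer published, where the share of usability problems found by n testers is 1 - (1 - L)^n. The value L = 0.31 is the average per-tester discovery rate Nielsen reports across his projects; it is an assumption here, not something I measured in this test, and the function name is just for illustration.

```python
def share_of_problems_found(testers: int, per_tester_rate: float = 0.31) -> float:
    """Expected share of usability problems uncovered by `testers` participants.

    Uses the Nielsen/Landauer model 1 - (1 - L)^n. The default L = 0.31 is
    Nielsen's reported average, assumed here rather than measured in this test.
    """
    return 1.0 - (1.0 - per_tester_rate) ** testers


# Compare a single tester, the "about five" rule of thumb, and this test's 12.
for n in (1, 5, 12):
    print(f"{n:2d} testers: {share_of_problems_found(n):.0%}")
```

Under that assumption the curve flattens quickly after the first handful of testers, which matches what I saw: the later sessions mostly confirmed the trends from the first five or six.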
So, what were the trends in this test? As before, I prefer to use a "heat map" to display the results. During the usability test, I typed up notes on a laptop while I observed the test participants doing the scenario tasks. I captured quotes as the test participants used the "speak aloud" method to describe their process and thoughts, but more generally I noted the paths the testers took as they attempted to complete the tasks. What menus did they click into? What buttons caught their attention? At the end of each task, I indicated whether the tester had completed the task, and how difficult it was for them to do.
In the heat map below, a green box indicates a tester completed a task with little or no difficulty, while a yellow box represents a task that had moderate difficulty. A red box indicates a task that was extremely difficult to do, or was completed incorrectly or by accident. Where the tester was unable to complete the task, or chose to abort the task (usually believing it to be impossible to complete), I use a dark red (maroon) box. A white box indicates a test that was skipped.
This is just a quick screenshot from a spreadsheet:
[Image: heat map of the usability test results]
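The color coding described above can also be written down as a small lookup, which is handy if you want to build the same kind of heat map outside a spreadsheet. This is only a sketch: I made the actual heat map in a spreadsheet, the names `Outcome` and `heat_map_row` are hypothetical, and the sample row is made-up input to show the mapping, not data from this test.

```python
from enum import Enum


class Outcome(Enum):
    """How one tester fared on one task, mapped to its heat map color."""
    EASY = "green"        # completed with little or no difficulty
    MODERATE = "yellow"   # completed with moderate difficulty
    HARD = "red"          # extremely difficult, or done incorrectly or by accident
    FAILED = "maroon"     # unable to complete, or gave up on the task
    SKIPPED = "white"     # skipped


def heat_map_row(outcomes):
    """Turn one task's outcomes (one per tester) into a row of color names."""
    return [outcome.value for outcome in outcomes]


# Illustrative input only; these are NOT results from the GNOME test.
example = [Outcome.EASY, Outcome.MODERATE, Outcome.FAILED, Outcome.SKIPPED]
print(heat_map_row(example))
```

Each row is one task and each cell is one tester, so hard tasks show up as a band of red and maroon across the row.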
You may remember that I performed a similar usability test of an earlier release of GNOME, about a year and a half ago. That test involved Gedit, Firefox, and Nautilus, and included seven test participants. I've represented those results to the right, matched against this test of Gedit, Web, and Nautilus. This shows where programs fared better, worse, or about the same between the earlier version (on the right) and versions 3.10 & 3.12 (on the left).
Comparing these results to my notes, there's lots to be said about how testers actually use GNOME, and how they expect to use GNOME. But I'll have to save that analysis for a future post.