Monday, March 27, 2017

Testing LibreOffice 5.3 Notebookbar

I teach an online CSCI class about usability. The course, "The Usability of Open Source Software," provides a background on free and open source software, and uses that as a basis to teach usability. The rest of the class is a pretty standard CSCI usability class. We explore a few interesting cases in open source software as part of our discussion. And using open source software makes it really easy for the students to pick a program to study for their usability test final project.

I structured the class so that we learn about usability in the first half of the semester, then we practice usability in the second half. And now we are just past the halfway point.

Last week, my students worked on a usability test "mini-project." This is a usability test with one tester. By itself, that's not very useful. But the intention is for the students to experience what it's like to moderate their own usability test before they work on their usability test final project. In this way, the one-person usability test is intended to be a "dry run."

For the one-person usability test, every student moderates the same usability test on the same program. We are using LibreOffice 5.3 in Notebookbar View in Contextual Groups mode. (LibreOffice released version 5.3.1 just before we started the usability test, but fortunately the user interface didn't change, at least in the Notebookbar Contextual Groups mode.) Students worked together to write scenario tasks for the usability test, and I selected eight of those scenario tasks.

By using the same scenario tasks on the same program, with one tester each, we can combine results to build an overall picture of LibreOffice's usability with the new user interface. Because the test was run by different moderators, this isn't statistically useful if you are writing an academic paper, and it's of questionable value as a qualitative measure. But I thought it would be interesting to share the results.

First, let's look at the scenario tasks. We started with one persona: an undergraduate student at a liberal arts university. Each student in my class contributed two use scenarios for LibreOffice 5.3, and three scenario tasks for each scenario. That gave a wide field of scenario tasks. There was quite a bit of overlap. And there was some variation in quality, with some great scenario tasks and some not-so-great scenario tasks.

I grouped the scenario tasks into themes, and selected eight scenario tasks that suited a "story" of a student working on a paper: a simple lab write-up for an Introduction to Physics class. I did minimal editing of the scenario tasks; I tried to leave them as-is. Most of the scenario tasks were of high quality. I included a few not-great scenario tasks so students could see how the quality of the scenario task can impact the quality of your results. So keep that in mind.

These are the scenario tasks we used. In addition to these tasks, students provided a sample lab report (every tester started with the same document) and a sample image. Every test was run in LibreOffice 5.3 or 5.3.1, which was already set to use Notebookbar View in Contextual Groups mode:
1. You’re writing a lab report for your Introduction to Physics class, but you need to change it to meet your professor's formatting requirements. Change your text to use Times New Roman 12 pt. and center your title.

2. There is a requirement of double spaced lines in MLA. The paper defaults to single spaced and needs to be adjusted. Change paper to double spaced.

3. After going through the paragraphs, you would like to add your drawn image at the top of your paper. Add the image stored at velocitydiagram.jpg to the top of the paper.

4. Proper header in the Document. Name, class, and date are needed to receive a grade for the week.

5. You've just finished a physics lab and have all of your data written out in a table in your notebook. The data measures the final velocity of a car going down a 1 meter ramp at 5, 10, 15, 20, and 25 degrees. Your professor wants your lab report to consist of a table of this data rather than hand-written notes. There’s a note in the document that says where to add the table.

[task also provided a 2×5 table of sample lab data]

6. You are reviewing your paper one last time before turning it in to your professor. You notice some spelling errors which should not be in a professional paper. Correct the multiple spelling errors.

7. You want to save your notes so that you can look back on them when studying for the upcoming test. Save the document.

8. The report is all done! It is time to turn it in. However, the professor won’t accept Word documents and requires a PDF. Export the document as a PDF.

If those don't seem very groundbreaking, remember that the point of the usability test "mini-project" was for the students to experience moderating their own usability test. I'd rather they make mistakes here, so they can learn from them before their final project.

Since each usability test was run with one tester, and we all used the same scenario tasks on the same version of LibreOffice, we can collate the results. I prefer to use a heat map to display the results of a usability test. The heat map doesn't replace the prose description of the usability test (what worked versus what was challenging), but it does provide a quick overview that allows focused discussion of the results.

In a heat map, each scenario task is on a separate row, and each tester is in a separate column. At each cell, if the tester was able to complete the task with little or no difficulty, you add a green block. Use yellow for some difficulty, and orange for greater difficulty. If the tester really struggled to complete the task, use a red block. Use black if the task was so difficult the tester was unable to complete the task.
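
If you wanted to generate a heat map like this programmatically, here's a minimal sketch in Python using matplotlib. The ratings are made-up sample data, not our actual results; each difficulty level maps to a number, and each number maps to one of the five colors:

    import matplotlib.pyplot as plt
    from matplotlib.colors import ListedColormap

    # Hypothetical ratings: one row per scenario task, one column per tester.
    # 0 = green (little or no difficulty), 1 = yellow, 2 = orange,
    # 3 = red (really struggled), 4 = black (unable to complete).
    ratings = [
        [0, 0, 1, 0],
        [2, 3, 2, 4],
        [0, 1, 0, 0],
    ]

    cmap = ListedColormap(["green", "yellow", "orange", "red", "black"])
    fig, ax = plt.subplots()
    ax.imshow(ratings, cmap=cmap, vmin=0, vmax=4)
    ax.set_xticks(range(len(ratings[0])))
    ax.set_xticklabels(["T1", "T2", "T3", "T4"])
    ax.set_yticks(range(len(ratings)))
    ax.set_yticklabels(["Task 1", "Task 2", "Task 3"])
    ax.set_xlabel("Tester")
    ax.set_ylabel("Scenario task")
    plt.show()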

Here's our heat map, based on fourteen students each moderating a one-person usability test (a "dry run" test) using the same scenario tasks for LibreOffice 5.3 or 5.3.1:

[Heat map image: the eight scenario tasks in rows, the fourteen testers in columns]


A few things about this heat map:

Hot rows show you where to focus

Since scenario tasks are on rows, and testers are on columns, you read a heat map by scanning across each row for "hot" items: lots of black, red, or orange. Those are your "hot" rows. Rows that are mostly green, with maybe a little yellow, are "cool" rows.
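
One informal way to find the "hot" rows is to map the colors to numbers and sum across each row. Here's a quick sketch in Python, again with made-up ratings rather than our actual results:

    # Map each color to a difficulty score (0 = green ... 4 = black)
    # and rank the tasks from hottest to coolest. Sample data only.
    ratings = {
        "Change font and center title": [0, 0, 1, 0],
        "Set double spacing": [2, 3, 2, 4],
        "Insert image": [0, 1, 0, 0],
    }

    for task, scores in sorted(ratings.items(), key=lambda kv: sum(kv[1]), reverse=True):
        print(f"{sum(scores):2d}  {task}")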

In this heat map, I'm seeing the most "hot" items in setting double space (#2), adding a table (#5), and checking spelling (#6). Maybe there's something in adding a header (#4), but that scenario task wasn't worded very well, so the problems here might be due to the scenario task itself.

So if I were a LibreOffice developer, and I did this usability test to examine the usability of MUFFIN, I would probably focus on making it easier to set double spacing, add tables, and check spelling. I wouldn't worry too much about adding an image, since that's mostly green. Same for saving, and exporting to PDF.

The heat map doesn't replace prose description of themes

What's behind the "hot" rows? What were the testers trying to do when they were working on these tasks? The heat map doesn't tell you that; it isn't a replacement for prose. Most usability results need to include a section about "What worked well" and "What needs improvement," and the heat map doesn't replace that prose section. But it does help you identify the areas that worked well versus the areas that need further refinement.

That discussion of themes is where you would identify that task 4 (Add a header) wasn't really a "hot" row. It looks interesting on the heat map, but this wasn't a problem area for LibreOffice. Instead, testers had problems understanding the scenario task. "Did the task want me to just put the text at the start of the document, or at the top of each page?" So results were inconsistent here. (That was expected, as this "dry run" test was a learning experience for my students. I intentionally included some scenario tasks that weren't great, so they would see for themselves how the quality of their scenario tasks can influence their test.)

Different versions are grouped together

LibreOffice released version 5.3.1 right before we started our usability test. Some students had already downloaded 5.3, and some ended up with 5.3.1. I didn't notice any user interface changes for the UI paths exercised by our scenario tasks, but did the new version have an impact?

I've sorted the results so the LibreOffice 5.3.1 columns are grouped off to the right; the column headers show which testers used 5.3 and which used 5.3.1. I don't see any substantial difference between them. The "hot" rows from 5.3 are still "hot" in 5.3.1, and the "cool" rows are still "cool."

You might use a similar method to compare different iterations of a user interface. As your program progresses from 1.0 to 1.1 to 1.2, and so on, you can compare the same scenario tasks across versions by organizing your data this way.
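
If you record which tester used which version, reordering the columns so each version's testers sit together is straightforward. A small sketch, with hypothetical data:

    # Hypothetical version labels, one per tester (column),
    # and difficulty ratings with one row per scenario task.
    tester_versions = ["5.3", "5.3.1", "5.3", "5.3.1"]
    ratings = [
        [0, 0, 1, 0],
        [2, 3, 2, 4],
    ]

    # Sort the column indexes by version, then rebuild each row in that order.
    order = sorted(range(len(tester_versions)), key=lambda i: tester_versions[i])
    grouped = [[row[i] for i in order] for row in ratings]
    print([tester_versions[i] for i in order])  # ['5.3', '5.3', '5.3.1', '5.3.1']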

You could also group different testers together

The heat map also lets you discuss testers. What happened with tester #7? There's a lot of orange and yellow in that column, even for tasks (rows) that fared well with other testers. In this case, the interview revealed that the tester was having a bad day, came into the test feeling "grumpy," and was likely impatient with any problems encountered during the test.

You can use these columns to your advantage. In this test, all testers were drawn from the same demographic: university students around 18 to 22 years old, with some to moderate experience with Word or Google Docs, but none with LibreOffice.

But if your usability test intentionally included a variety of experience levels (a group of "beginner" users, "moderate" users, and "experienced" users), you might group the columns accordingly. Rather than grouping by version (as above), you could have one set of columns for "beginner" testers, another for "moderate" testers, and a third for "experienced" testers.
