Usability Testing | CS4760, HU4628 & CS5760: Human-Computer Interactions & Usability

Usability testing and user testing are both testing techniques in HCI, but they differ slightly. User testing is for categorizing users or tasks and improving HCI theory. Usability testing is frequently done by usability experts to discover usability problems in specific UIs. User testing is a more formal process, and the tasks performed by the participants are specific. Usability testing tends to be more informal, and the tasks performed by the participants are more general, but the tasks are chosen to address specific usability concerns in the UI. The analysis for user testing is very quantitative and uses advanced statistics, while the analysis for usability testing tends to be qualitative and uses simple statistics, e.g. frequencies and averages. Both use many of the same techniques and share many of the same goals.

My expertise is in user testing. I hope that you will have an opportunity to participate in user tests and see the results. The participation can be your introduction to user testing. In this course you will perform usability testing.

Evaluation Paradigms and Techniques

A paradigm is a typical example or pattern. Preece, Rogers and Sharp propose the following evaluation paradigms:

Quick and dirty – informal discussions with users at any time perhaps using prototypes
Usability Testing – observing the user and recording, for example video taping the session
Field studies – going to the users’ sites and using surveys or observing users using the UI.
Predictive – experts using heuristic evaluations or formal models to evaluate the UI, generally at the developer’s site

Summarizing the differences between the evaluation paradigms:

Evaluation paradigm:	Quick and dirty	Usability testing	Field studies	Predictive
Role of the user in the evaluation	Natural behavior	To perform tasks	Natural behavior	None
Who controls the evaluation	Evaluators have minimal control	Evaluator has strong control	Relationship between evaluators and customers	Expert evaluators
Location of the evaluation	Natural environment or lab	Lab	Natural environment	Lab or on the premises
When the evaluation is used	Any time	With prototype or product	Early	With prototype
Type of data collected from the evaluation	Qualitative; informal discussions	Quantitative; statistical	Qualitative; sketches	List of problems
How the data is fed back into the design	Sketches and quotes	Report on performances	Descriptions at workshop, reports and sketches	Report
Philosophy or theory of the evaluation	User-centered	Scientific/experimental	Ethnographic	Theory

They categorize usability testing as controlled testing of users performing tasks on a prototype in the laboratory. They also categorize usability testing as quantitative and based on scientific research. This is true compared to the other paradigms in the table above, but not true compared to user testing. Usability test results are reported in the academic literature. The usability test report gives the results of user performance on the UI, but also reports user answers to questionnaires and interviews.

A technique is a specific way of performing a task. For evaluation, Preece, Rogers and Sharp propose these evaluation techniques:

Observing users – using notes, audio, video and logging to record the use of a system
Asking users – using interviews and surveys to get users’ opinions about the system
Asking experts – for Heuristic Evaluation or Cognitive Walkthrough
Testing user performance – in the lab or the field
Using models – to describe behavior, for example GOMS or HTA and Fitts’ Law

Summarizing the relationships between evaluation paradigms and techniques:

Technique\Paradigm	Quick and dirty	Usability testing	Field studies	Predictive
Observing Users	Seeing how users behave in their environment	Video and interaction logs. Analyzed for errors, performance, route in UI, etc.	Ethnography is central to Field Studies	NA
Asking Users	Discuss w/ potential users, individually or in focus groups	Pre and Post testing surveys. Structured Interviews	Interviews or discussions	NA
Asking Experts	Provide usability critiques on prototypes	NA	NA	Heuristic Evaluation
User testing	NA	Testing typical users on typical tasks. Central to Usability Testing	NA	NA
Modeling	NA	In academia compare w/ theory.	In academia compare w/ theory	GOMS etc.

NA = not applicable, not used

Usability testing uses all the techniques except ‘asking experts’.

The basic structure of a usability test session is:

Pretest: make introductions and explain the UI and the tasks.
Conduct the test and observe the participant performing the tasks.
Post-test questionnaires and/or interviews and/or structured discussions to get user feelings and opinions of the UI.

Planning a Usability Test

Preece, Rogers and Sharp use the acronym DECIDE to explain the steps for planning an evaluation:

Determine the general goals of the test
Explore the specific questions of the test
Choose the evaluation paradigms and techniques
Identify the specific practical issues, such as selecting participants and tasks
Decide how to deal with ethical issues
Evaluate, interpret, and present the data.

The acronym, ‘DECIDE’, is good because it stresses that the preparation is about making decisions. You must decide on the goals and generate the specific usability concerns or questions. This is critical for usability testing. In user testing, scientists have a hypothesis in mind to test, but in usability testing, usability experts have specific usability concerns that they are investigating.

The ‘I’ in DECIDE is for identifying the practical issues.

Users: which users do you want to evaluate? What participants can you evaluate? How can you solicit the participants? How will they participate, e.g. what tasks?
Facilities and equipment in the laboratory. What is the prototype? How to gather the data?
Schedule and budget constraints influence the number of participants and the procurement of new equipment, and also what and how much analysis the usability expert can perform.
Expertise: What expertise does the evaluation team have? Can they perform the test and evaluation?

Ethical Issues:

Preece, Rogers and Sharp offer these procedures to ensure proper ethics.

Tell the participant the goals of the test in language that they will understand, but not in so much detail that it will bias the results. Tell them, “We want to learn how well this UI works for you,” and not, “We want to know if you will miss seeing a button.”
Explain the tasks clearly and without biasing the results.
Be sure to tell them that all data will be confidential and promise anonymity.
Tell the users that they can stop at any time during the test.
Pay users when you can; this makes the relationship between you and the participant professional. In this course, you will not be able to pay participants.
In the report avoid quotes etc. that identify the user.
Ask users if in the future, you can quote them.

The above are good procedures, but they do not replace the most important aspect: treat the participants with respect. (This is why we call them participants and not subjects.) Treat them with respect even while you are designing the test. I have learned that the better you treat the participants (before and during the test) the better the results will be.

Graduate students need to go through the CITI training.

http://www.mtu.edu/research/administration/integrity-compliance/review-boards/human-subjects/citi-training.html

In the Required Modules section, there are two links:

CITI Login and Registration
CITI Step-by-Step Instructions

First download the “CITI Step-by-Step Instructions” then use it to register on CITI and select your training. You will choose the “Basic Human Subjects – Social & Behavioral Focus” course and then choose to take “Students conducting no more than Minimal Risk Research.”

Evaluate and interpret results:

Before the actual testing, evaluate potential results. Consider:

Reliability – are the results repeatable
Validity – does it measure correctly the usability aspects you want to investigate
Biases – there can be some bias, but you should be aware of the biases and of whether they can make the conclusions invalid.
Scope of test – this is important; you cannot test everything.
Ecological validity – environmental factors that might bias the results.

Usability Test Development

You should now have a good idea of what a usability test is, but only a general idea of how to plan one. I’ll try to address some specific issues of developing a usability test, but I cannot go over everything. Preece, Rogers and Sharp in Interaction Design, chapters 12-15, discuss some of the specifics and give some examples. Barnum in Usability Testing and Research, chapters 5 through 7, gives a very detailed example of usability tests for the Medline and Hotmail websites.

Goals and Usability Concerns

Testing goals and usability concerns of the UI are the most important aspect of designing a good test. Preece, Rogers and Sharp do not give much insight into how to generate goals and concerns. In industry, the client (a representative from a software company or design team) frequently gives very vague goals for the usability testing. It is your job to determine the goals for the test.

Barnum suggests answering these two questions:

What do you want to learn from the test?
How will you measure what you want to learn?

The list above begs the question, but it does point out that you can learn from a test only what can be measured or observed. Although you should consider what you can measure, I think it is better to first generate a list of questions about the use of the UI and then determine how you can make the measurements.

Rojek and Kanerva (“A data-collection strategy for usability tests” in IEEE Transactions on Professional Communication, 1994, from Barnum’s Usability Testing) give a list of questions:

What do you want to learn from the test?
What are your greatest fears for the product?
Are there different opinions in the design team about an issue in the design?
What can be changed about the design as a consequence of the test results?
Has the design team made assumptions about the users?
Are particular features of the design targeted for a specific issue?
How will you know if the product is good?

This list of questions is good for industry, but we can add to the list when testing more experimental UIs. I suggest this list of questions to ask yourselves:

Is there something unique about the UI that can be tested?
Is there a concern about an interaction aspect of the UI that you can test?
Is there a concern about the graphics or information displayed in the UI?
Is there an ergonomics aspect of the device that is a concern?
Is there an environmental aspect that is a concern?
What will users think of the device or UI?
What are the vertical and horizontal extents of the prototype?

Answer these questions and make a written list of goals for the test. If a heuristic evaluation was performed on the UI, you can look at the heuristic evaluation to generate some concerns, or you can first conduct a heuristic evaluation on the design.

What usability concerns do you have for your app designs?

Observations and Tasks

Your goal is to write a test plan. Part of generating a test plan is to design a task for the participant to perform, so consider what can be measured by observing the task:

Time to perform a task
Percentage of task completed
Number of errors
Time to recover from an error
Number of repeated or failed commands
Number of features or commands not used
Time spent navigating and/or searching for information
Number of clicks or taps to perform a task
Quality/Quantity of information found
and more?
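
As a minimal sketch, a few of these measures could be computed from a logged session as follows. The `Event` record, the event names (`task_start`, `error`, `recover`, `task_end`) and the function name are assumptions made for illustration; they are not part of any course tool, and the times are made up.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: float   # seconds since the start of the session
    name: str     # hypothetical event name, e.g. "task_start" or "error"

def task_measures(events):
    """Return time on task, number of errors and total error-recovery time."""
    start = next(e.time for e in events if e.name == "task_start")
    end = next(e.time for e in events if e.name == "task_end")
    errors = [e.time for e in events if e.name == "error"]
    recoveries = [e.time for e in events if e.name == "recover"]
    # Assumes each error is followed by exactly one recovery, in order.
    recovery_time = sum(r - e for e, r in zip(errors, recoveries))
    return {
        "time_on_task": end - start,
        "errors": len(errors),
        "recovery_time": recovery_time,
    }

# Made-up session for one participant and one task.
session = [
    Event(0.0, "task_start"),
    Event(12.5, "error"),
    Event(20.0, "recover"),
    Event(41.0, "task_end"),
]
measures = task_measures(session)
print(measures)
```

Other measures in the list, such as the number of clicks, could be counted from the same kind of event list.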

The usability test can also include post-task questionnaires or interviews. What can you learn and measure from questionnaires?

How does the participant feel about the product?
Was the participant frustrated?
Was the participant satisfied using the product?
Was the participant amused using the product?
What was the participant thinking while using the product?

Questionnaires can generate quantitative measures especially if they use a Likert scale (See below.), but they are also used for qualitative measures.
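
As a rough illustration of how Likert answers become quantitative measures, the sketch below summarizes one question with a frequency count, a median and an average. The 5-point coding and the answers are made up for the example.

```python
from collections import Counter
from statistics import mean, median

# Assumed coding: 1 = strongly disagree ... 5 = strongly agree.
responses = [4, 5, 3, 4, 2, 4]   # made-up answers from six participants

frequency = Counter(responses)   # how often each rating was chosen
summary = {
    "n": len(responses),
    "median": median(responses),
    "mean": mean(responses),
    "frequency": dict(sorted(frequency.items())),
}
print(summary)
```

Because Likert responses are ordinal, the frequency table and the median are usually the safer summaries; the average is shown only because it is a common simple statistic.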

Another technique is observing the user while performing the task:

Facial expression
Vocalization
Hand motion
Body language

Current HCI research is trying to develop quantitative measures from these observations, but you can use them as qualitative measures.

Finally, usability testing can use the think aloud protocol. Psychologists have formal techniques for analyzing think aloud, but these are involved and tedious. You could use ‘think aloud’ as an informal qualitative measure.

Triangulation

Frequently usability testing cannot use a single task or measurement to answer questions about a design concern. Consider the design concern or question: “Is the product easy to use?” The question is a legitimate usability concern, but how do you measure it? The time to perform a task is a good quantitative measure, but how long is too long? Observation of the user may show facial expressions suggesting that the user is perplexed or frustrated. Participants may give answers in a survey or interview that indicate the user was frustrated or thought that the task was hard. If you use all these measures, you have confirming evidence. Using multiple techniques to probe a usability concern is called triangulation.

You can compare your list of potential concerns with what can be measured and throw out concerns that cannot be measured. With the remaining list of concerns, you can generate test goals.

Test Plan and Scenarios

Your short term goal is to generate a test plan composed of tasks. Each task or set of tasks has at least one test goal and frequently several measurements. Usability test plans generally are composed of several test scenarios. Test scenarios are short stories that you tell the participants before they perform the tasks. Test scenarios set the scene for the participant and suggest what the participants should do. In usability testing, in contrast to user testing, test scenarios are essential. The usability test administrator should not explicitly tell participants what to do by saying, “Move the cursor to a button and click.” (A user test administrator could tell the participants exactly what to do.) So how does the usability test administrator explain to the participant what to do? The test administrator tells a story like,

“You are a customer who would like to purchase a new broom to sweep the floor. Please find the brooms in this website, choose a broom, and make the purchase.”

The scenario avoids explicitly telling the user what to do. Also note that this scenario contains several tasks. Using scenarios, usability testing can measure more than how long it takes to press a button. For example, what design concerns or test goals could the above scenario address? The scenarios can describe the environment and give a backstory to the participant. For example, you may want the participant to imagine that they are in a car using the device. Write these descriptions down so that you can repeat them exactly to each participant. Also, the test should impose appropriate environmental constraints, for example having the participant sitting or standing.

Now you are ready to write a test plan. A typical test plan outline:

Test Plan Name

Scenario Name

Goals – what you want to learn from the test scenario

Scenario – the actual story (By itself, on a separate sheet of paper)

Task list – short description of the actual tasks the user should perform if done correctly and efficiently. You will not give the task list to the participant.

Quantitative measurement list – what measurements the loggers will record

Qualitative measurement list

Potential observations of users

Post Scenario interview or questionnaire questions (By itself, on a separate sheet of paper)

Test set up details
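
One way to keep the parts of this outline together is a small data structure, sketched below. The class and field names are assumptions that simply mirror the outline; the plan name and goal are made up, and the story is shortened from the broom scenario above.

```python
from dataclasses import dataclass, field

@dataclass
class Scenario:
    name: str
    goals: list        # what you want to learn from the scenario
    story: str         # the scenario text read to the participant
    task_list: list    # for the team only, never given to the participant
    quantitative_measures: list = field(default_factory=list)
    qualitative_measures: list = field(default_factory=list)
    potential_observations: list = field(default_factory=list)
    post_scenario_questions: list = field(default_factory=list)
    setup_details: str = ""

@dataclass
class UsabilityTestPlan:
    name: str
    scenarios: list = field(default_factory=list)

    def participant_sheet(self, index):
        """Return only the text the participant may see: the story."""
        return self.scenarios[index].story

plan = UsabilityTestPlan(name="Store website test")   # hypothetical plan name
plan.scenarios.append(Scenario(
    name="Buy a broom",
    goals=["Can participants find and purchase a product?"],
    story="You are a customer who would like to purchase a new broom ...",
    task_list=["Find the brooms", "Choose a broom", "Make the purchase"],
    quantitative_measures=["Time to perform the task", "Number of errors"],
))
print(plan.participant_sheet(0))
```

Keeping the story separate from the task list in the structure reflects the rule that the task list is not shown to the participant.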

For this course you should have at least two scenarios. In industry, the usability experts design enough scenarios to cover all the test goals, which hopefully address most of the design concerns.

Questionnaires and Interviews

Both questionnaires and interviews are lists of questions to ask the participants. It is possible to write bad questions. Bad questions have one of these aspects:

long questions
confusing questions
elaborate sentence construction
using jargon, especially technical terms
leading questions
“Double-barreled questions” – asking two separate (possibly related) questions. These questions should be separated.
Using negative words in a question – e.g. “Do you agree that using the app is hard?” These can be confusing.
Biased wording in questions – e.g. “Don’t you agree …”
“Hot-button” words, such as “liberal”, “conservative”, etc.

The difference between questionnaires and interviews is how the data is recorded. Participants write the answers to questionnaires, so recording is easy. But information can be missed by questionnaires, for example because a question was not anticipated. Interviews can seek answers that do not tend to be short answers and probe for more information.

For both questionnaires and interviews write the questions out and review them with the team, looking for bad aspects of the questions. Then consider the order of the questions. Answering one question can, for good or bad, lead to an answer to the next question. Consider what the participants will be thinking when they answer a question, and the implications for their answer to the next question.

The types of interviews:

Open-ended interview
Unstructured interview
Structured interview
Semi-structured interview
Group interview

Open-ended and group interviews take a lot of skill to conduct and are hard to analyze. You should perform structured interviews.

During an interview:

Introduce yourself and explain what the interview is about
Ask a few easy warm up questions
Ask your main questions
Ask a few easy cool down questions
Thank the participant

Be professional and dress similarly to the interviewee. If you are using a recorder, make sure it works before the interview. If you will be taking notes, try to write the answers down exactly; at the least, do not change the meaning of an answer, and be consistent about how you abbreviate the answers.

Standard types of survey questions:

Yes/No Maybe? questions
Multiple choices (nominal or ordinal)
Likert scales – Likert scales use agree and disagree as the anchor of the scale.
Semantic differential scales – this is a scale where polar adjectives are the anchor.
Check box options, questions that ask to choose all that apply.
Ranking
Comparison questions
Frequency of use type questions
Short answers
Phrase completion
more?

In general, questions can be open-ended or closed-ended. Open-ended questions are good for eliciting new information, while closed-ended questions are good for quantitative analysis. Closed-ended questions can have ordered responses, such as Likert scales, or unordered responses, such as selecting from a list. Ordered responses are values of ordinal variables, while unordered responses are values of categorical variables. Closed-ended questions with unordered responses can restrict responses to a single selection or allow multiple selections.

Keep the questionnaire short, 20 questions. The questionnaire should not take more than 10 minutes to complete. Be sure you really need the answers to the questions that you ask and that you know how you will use the data. I have stopped answering many questionnaires because they went on and on. Another point, especially for online questionnaires or forms, is to be sure that there is an honest indicator of progress through the questionnaire.

Consider using ‘short answer’ questions. Although they are hard to quantify, they can give a lot of information that you would not predict. Equally important, they can help you identify bias or bad design in the questionnaire itself. Do not forget to ask, “Do you have suggestions on how to improve the …?” This shows respect for their input, and you will be surprised by their answers.

Existing Usability Questionnaires

There are several existing usability questionnaires. Gary Perlman’s website lists many of the surveys.

http://garyperlman.com/quest/

The page is really about Perlman’s script for running a survey, but you can view many existing surveys in the table at the top of the page. In particular, look at:

Computer System Usability Questionnaire (CSUQ) by Lewis (1995) has 19 questions.
Interface Consistency Testing Questionnaire (ICTQ) by Ozok and Salvendy (2001)
Purdue Usability Testing Questionnaire (PUTQ)
Software Usability Measurement Inventory (SUMI)
Website Analysis and MeasureMent Inventory (WAMMI)

Observing and Recording Tests

The basic techniques for observing participants during usability testing:

Taking notes
Audio recording
Still photographs
Video recordings
Event logging software

The advantages and disadvantages of each technique might not be completely clear. Video and audio recordings can capture a lot of data, but they are hard to analyze and take a lot of expertise. Also, care must be taken to be sure that the participant appears in the field of view of the camera (frequently several cameras have to be used). During audio recording, care must be taken to assure that the desired recording is not obscured by noise. Event logging software can efficiently capture a lot of data and make analysis easier, but it is either expensive or takes time to program. My user tests use logging software that I have written in Java; it has a time resolution of 16 msec. Note taking is a cheap and effective means of recording observations. Its time resolution is lower than that of event logging because the notes must be handwritten or typed. The time resolution can be improved by creating a shorthand and programming macros for a word processor. Still, you will want to measure tasks that take a longer time to perform, at least 5 seconds.

Logging by Notes

If anyone develops a set of macros for Word or a program for writing notes, please share them with me and the class. (See page 245 in Barnum’s Usability Testing and Research.) The document should help you generate a csv (comma-separated values) file. Actually, some other delimiter might be more useful. The only columns in the log file should be line number, time stamp, event code and event description. The macro could automatically generate the line number and time stamp after each line return, so that all the logger needs to do is type the event code and description.
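
A program for writing notes along these lines could look like the sketch below. It is only one possible design: the class name `NoteLogger`, the injectable clock and the choice of delimiter are assumptions made for illustration.

```python
import csv
import io
import time

class NoteLogger:
    """Writes one row per note: line number, time stamp, event code, description."""

    def __init__(self, out, clock=time.time, delimiter=","):
        self.writer = csv.writer(out, delimiter=delimiter)
        self.clock = clock
        self.line_number = 0

    def log(self, entry):
        # The logger types only "<code> <description>", for example "c help".
        code, _, description = entry.strip().partition(" ")
        self.line_number += 1
        self.writer.writerow(
            [self.line_number, f"{self.clock():.3f}", code, description])

# Demonstration with an in-memory file and a fake clock so the rows are repeatable.
ticks = iter([10.0, 12.5])
buffer = io.StringIO()
logger = NoteLogger(buffer, clock=lambda: next(ticks))
logger.log("s start")
logger.log("c help")
print(buffer.getvalue())
```

In a real session the logger would be given an open file and the default clock, and each typed line would be passed to `log` as soon as the return key is pressed.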

You will want a list of event codes. How many and what events? This depends on the test and the analysis you expect to perform. The event code serves two purposes: a shorthand for logging and assistance in the analysis. So the event codes should be unique single characters that are easy to remember. Without too much practice a logger can remember about 5 event codes. That is not very many, so they cannot be too specific; for example ‘h’ = ‘selected the help menu’ or ‘e’ = ‘hit enter key’ might be too specific. But if they are too general, like ‘c’ = ‘made a command’ or ‘f’ = ‘user made a face’, they might not help much. The description can make the event code more specific. For example

c help

could mean ‘clicked help menu.’ As discussed below, you will probably have more than one logger. Each logger can have their own set of event codes, so 3 loggers would have about 15 event codes in total. Using multiple log files requires a synchronization event, such as a start event voiced by the test administrator, so that the time stamps can be synchronized across the log files.

Software Event Logging

There are several options for recording UI events for web apps:

Writing to the Grails log file
Writing into an SQLite database
Writing to a file

Writing to Grails’s log file is easy to implement, but the log file is verbose and you will have to parse it. Also, the entries will be written into the Tomcat log files. I do not recommend it because it is not much easier than writing to a file.

Writing to a database would require an additional model or domain class in the web app. Logging to a database has the advantage that the writes would be fast and memory usage minimal.

Writing to a file is simple. The file should be in comma-separated values (csv) format. The app will have to make a File:

File file = new File("path/log-<time>.csv");

A relative path will be in the web-app/ directory, which you can access. You will lose the file if you redeploy.

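In the controller, writing to the file is easy. Note that write replaces the contents of the file, while << appends to it, so use write once (for example for a header line) and << for each event:

file.write "time, event\n"
file << "$time, $event\n"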

Design the format of the file. I think it is best to consider all actions on the UI as events that the evaluator could use, with one line per event:

"<event number>, <view>, <event name>, <event target>, <event time>, \"<details>\" \n"

where

<event number> = sequential number of events

<view> = activity or view name

<event name> = name of the event for example onClick or onCreate etc.

<event target> = the name of the button or item

<event time> = time

For example

1, HomeController, index, , 123456789, "home page accessed"

2, SitesController, listAll, , 123457123, "list all sites"

…

40, Observation, submit, , 123459123, "new observation submitted"

This file format can easily be interpreted by Excel or statistics packages.
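
As an illustration of how such a log could be read for analysis outside a spreadsheet, here is a minimal Python sketch. The column names in `FIELDS` and the function name are assumptions, and the sample rows repeat the example above with plain double quotes around the details.

```python
import csv
import io

# Assumed column names, in the order of the format described above.
FIELDS = ["number", "view", "name", "target", "time", "details"]

sample = '''1, HomeController, index, , 123456789, "home page accessed"
2, SitesController, listAll, , 123457123, "list all sites"
40, Observation, submit, , 123459123, "new observation submitted"
'''

def read_events(text):
    """Parse log text into a list of dicts with integer number and time."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS,
                            skipinitialspace=True)
    events = []
    for row in reader:
        row["number"] = int(row["number"])
        row["time"] = int(row["time"])
        events.append(row)
    return events

events = read_events(sample)
# Elapsed time between entering the home page and submitting an observation,
# in whatever unit the time stamps use.
elapsed = events[-1]["time"] - events[0]["time"]
print(elapsed)
```

The `skipinitialspace` option is needed because the format puts a space after each comma.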

Writing events that occur in the controller is easy. Recording only controller events will allow you to record when participants enter a webpage and when they have submitted an observation. This could be a useful performance measure. Controller events will not record the time that participants enter text into a textbox. Writing these interaction events is harder: it requires using JavaScript to save the events as JSON, passing the JSON in a request to a controller action, and then writing the JSON to the log file.

Audio Logging

Why use an audio recording unless you are interviewing? You could use participants’ groans for determining their mood, or you might be using the ‘think aloud’ protocol. You would want to audio record the think aloud vocalizations of the participants while they are performing the tasks. This is an effective technique and only slows down the time to perform tasks a little. It is hard to formally analyze, but in the context of usability testing you can get some idea of what the participant was thinking with an informal analysis. You may want to audio record the post-test interview to get an idea about what the user was thinking. Post-test interviews are not as reliable as think aloud because the participants may not remember and may answer what they think they should. So only ask about memorable events.

Usability Test Team

When you conduct the test, your team should have assigned roles. For example:

Administrator/facilitator/briefer
Recorder/logger
Observer

Consider having several loggers splitting up the responsibilities of recording events. For example a logger could track what the user does on the GUI and another logger could track facial expressions or body language. You can have more than one logger recording a single event to assure reliability or refine what they are observing.

Loggers do not have time to make general observations, and the facilitator/briefer is too busy being attentive to the user to take notes or make observations. You may want a general purpose observer, who makes notes on the general progress of the test and on anything unusual that occurred.

Conducting Usability Tests

The general procedure for a usability test session:

1. Prepare the test room: make sure the programs and equipment work, and that you have the forms and questionnaires.
2. Greet the guest: introduce yourself and the other members of the team. Briefly describe what will happen and give the consent form. Describe what is on the consent form so the participant does not have to read the form if they do not want to.
3. Screening questions, or questions to select or prepare scenarios: you probably will not use these.
4. Explain the interface: and any other equipment.
5. Tell the scenario: and any other specific instructions, such as the environment. This should be on a separate piece of paper for you to read.
6. Post-scenario questionnaire and interview: use both.
7. Repeat: steps 4-6 for each scenario.
8. Post-test questionnaire and interview: general feeling about the app and demographics. This should be on a separate piece of paper.
9. Thank the participant.
10. Organize the files.

The test should not be too short, but not longer than an hour; probably a half hour. Participants cannot concentrate for more than an hour.

Immediately after the test organize your notes. Enter your notes in the computer as soon as possible; if you can, immediately after the test session.

Practice and Pilot Studies

Write a script and practice; practice your test among yourselves. I always practice the complete user test before administering the test. This gives me an idea of how long the test will take and generates preliminary results. You get only one shot with a participant, so do not lose that opportunity. I have felt sad whenever I have not been able to use data from a test because something trivial was wrong. It was a waste of time for the participant and me. You probably cannot run pilot studies, but you can perform the test on yourself. You can determine how long the test will take and sometimes verify the correct parameters.

In industry, the first couple of participants can be used as the pilot study. But this requires that there is time to change the test before the next scheduled participant. If the test has to be redesigned then the results from the pilot study are not used in the final analysis. If the test is not changed then the results from the pilot study can be used in the final analysis.

We will have a mock test day. Graduate students will work with the whole team going through the test. The application should be on the phone. This is a dress rehearsal.

Analyzing Test Results

General Procedure

Analyzing the test results consists of:

Collocate log files for each participant/scenario
Summarize each scenario across all participants, including plots and trend analysis
1. Quantitative measures
2. Qualitative observations
3. Questionnaire answers
Make conclusions for each test goal

If there are many log files for each user/scenario then you will want to collocate the files, meaning you correlate the events in the different files into a single file. If the log files are in csv format with time stamps then this is easy and not too tedious:

Load each file into a spreadsheet.
Convert each time stamp to a relative time stamp using the synchronization event.
Merge the files into a single spreadsheet.
Sort by relative time stamp and add global line numbers.
Add additional columns and/or rows for summarizing results and make calculations.
Add questionnaire results; this is easy if using a Likert scale.

Do this for each participant. You will have all the data from one participant and scenario together in one file; later when you do trend analysis or look for corroborating results you will be able to find them easily.
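
The same steps can also be scripted instead of done in a spreadsheet. The sketch below is one possible version: the row layout, the function names and the use of ‘s’ as the code of the synchronization event are assumptions, and the two logs are made up.

```python
def to_relative(rows, sync_code="s"):
    """Shift time stamps so that the synchronization event is at time 0."""
    sync_time = next(r["time"] for r in rows if r["code"] == sync_code)
    return [dict(r, time=round(r["time"] - sync_time, 3)) for r in rows]

def collocate(*logs):
    """Merge several loggers' rows, sort by relative time, renumber globally."""
    merged = []
    for logger_name, rows in logs:
        for row in to_relative(rows):
            merged.append(dict(row, logger=logger_name))
    merged.sort(key=lambda r: r["time"])
    for line, row in enumerate(merged, start=1):
        row["line"] = line
    return merged

# Made-up logs from two loggers whose clocks do not agree.
gui_log = [
    {"time": 100.0, "code": "s", "description": "start"},
    {"time": 112.0, "code": "c", "description": "help"},
]
face_log = [
    {"time": 203.5, "code": "s", "description": "start"},
    {"time": 210.5, "code": "f", "description": "frown"},
]
merged = collocate(("gui", gui_log), ("face", face_log))
for row in merged:
    print(row["line"], row["time"], row["logger"], row["code"], row["description"])
```

After merging, the frown logged by the second logger appears before the help click logged by the first, which is the kind of relationship that is hard to see in separate files.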

Summarizing the test results is now easy. For each scenario, you look in the summary form, log files and questionnaire data for each participant and make a table or plot. You should begin to make conclusions, so you compare answers from the questionnaires to the quantitative results. You can look for other observations in the participant’s collocated file.
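
A summary table for one scenario could be produced with a few lines of code, as in this sketch. The measure names and all values are made up, and the function name is an assumption.

```python
from statistics import mean

# Made-up results: one entry per participant for a single scenario.
results = [
    {"participant": "P1", "time_on_task": 41.0, "errors": 1, "ease_rating": 4},
    {"participant": "P2", "time_on_task": 65.0, "errors": 3, "ease_rating": 2},
    {"participant": "P3", "time_on_task": 38.0, "errors": 0, "ease_rating": 5},
]

def summarize(rows, keys):
    """Return the mean, minimum and maximum of each measure."""
    table = {}
    for key in keys:
        values = [row[key] for row in rows]
        table[key] = {"mean": mean(values), "min": min(values), "max": max(values)}
    return table

summary_table = summarize(results, ["time_on_task", "errors", "ease_rating"])
for key, stats in summary_table.items():
    print(f"{key:>12}: mean={stats['mean']:.1f} min={stats['min']} max={stats['max']}")
```

Placing a questionnaire rating next to the quantitative measures makes it easy to see whether, for example, the slowest participant also gave the lowest rating.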

Using the summary results you should be able to make conclusions and address the test goals.

Outliers

In user test analysis, it is possible to throw out outliers, or if the sample is large enough the outliers will be washed out. In usability tests with a small number of participants, you cannot throw out outliers; they are significant. You should investigate carefully why this participant performed differently. You may be able to uncover a usability concern for a specific user type. At a minimum you should be able to write a story about the outlying participant.
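
To decide which participants deserve this closer look, a simple flagging rule can help. The sketch below uses the median absolute deviation, which is a technique chosen here for illustration and not one named in the course; the cutoff of 3 is an arbitrary assumption and the times are made up.

```python
from statistics import median

def flag_for_review(times, cutoff=3.0):
    """Return participants whose time is far from the median.

    Uses the median absolute deviation (MAD). Flagged participants are
    to be investigated, not discarded.
    """
    center = median(times.values())
    mad = median(abs(t - center) for t in times.values())
    if mad == 0:
        return []
    return [p for p, t in times.items() if abs(t - center) / mad > cutoff]

# Made-up times on task, in seconds, for five participants.
times = {"P1": 41.0, "P2": 44.0, "P3": 38.0, "P4": 120.0, "P5": 43.0}
flagged = flag_for_review(times)
print(flagged)
```

The output of such a rule is only a starting point for going back to the participant's collocated file and notes.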

Positive and Negative Findings

Record positive findings because:

Everyone likes to hear good news.
If you don’t document the positive aspects of the UI, they could be changed in the future.

Typically, the goals of a usability test include uncovering usability problems, so also report the negative findings (called ‘findings’ for short in industry). Barnum suggests analyzing negative findings by scope and severity. Scope is either global to the whole interface or local to a specific task or form. Severity is expressed by levels:

Level 1: prevents task completion
Level 2: creates significant delay and frustration
Level 3: has minor effect on usability
Level 4: subtle problem: points to a future enhancement

There are other scales; see page 270 in Barnum’s Usability Testing.
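If you keep your findings in a script, the scope and severity levels above map naturally onto a small record type. The following is a sketch only: the class name Finding, the function prioritize and the sample findings are hypothetical, and the ordering rule (most severe first, global before local within a level) is an assumption about how you might want to read the list.

```python
from dataclasses import dataclass

# The four severity levels listed above.
SEVERITY_LABELS = {
    1: "prevents task completion",
    2: "creates significant delay and frustration",
    3: "has minor effect on usability",
    4: "subtle problem: points to a future enhancement",
}

@dataclass
class Finding:
    description: str
    scope: str      # "global" or "local"
    severity: int   # 1 (most severe) to 4

def prioritize(findings):
    """Most severe first; within a level, global before local (assumed rule)."""
    return sorted(findings, key=lambda f: (f.severity, f.scope != "global"))

# Made-up findings for illustration.
findings = [
    Finding("Label wording on the settings form is unclear", "local", 3),
    Finding("Save button is missing on the order form", "local", 1),
    Finding("Navigation menu is inconsistent between pages", "global", 2),
    Finding("Search results load slowly on one page", "local", 2),
]

ordered = prioritize(findings)
for f in ordered:
    print(f.severity, f.scope, f.description, "-", SEVERITY_LABELS[f.severity])
```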

Qualitative Analysis

(From Preece, Rogers and Sharp Interaction Design)

Qualitative analysis can be used to tell a story. It can also support quantitative results with examples.

Qualitative analysis involves grouping the data into collections so that they can be compared or reveal trends. Preece, Rogers and Sharp suggest using a team to analyze the qualitative data to provide many perspectives.

Qualitative analysis can determine categorizations, for example the general types of usability problems. It can also reveal patterns of the form "if this happens, then a problem will occur." There are many formal methods for analyzing qualitative data:

Activity Theory
Content Analysis
Discourse Analysis
Conversation Analysis
Think Aloud Protocol Analysis

I do not know all of them, and they would be a course in themselves.

Barnum discusses two approaches to qualitative analysis: top-down categorization and affinity analysis. Top-down categorization means predefining categories (Usability Yardstick, page 169 in Barnum) and then sorting the usability findings into those categories. Typically the categories are defined before the test, and a goal of the test is to measure the interface using these categories, so a count of the frequency in each category can reveal a general usability problem. Affinity analysis is typically done with several analysts (page 250 in Barnum), who iteratively arrange the findings into groups until there is consensus among the analysts. Only after the findings are grouped are the groups labeled.
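The frequency count in top-down categorization is simple enough to sketch in a few lines of Python. The category names below are placeholders that I made up, not the categories from Barnum's Usability Yardstick, and the findings are invented; substitute the categories you defined before your test.

```python
from collections import Counter

# Hypothetical categories, defined before the test.
CATEGORIES = ["navigation", "terminology", "feedback"]

# Made-up findings: (participant, category the finding was sorted into)
findings = [
    ("P1", "navigation"),
    ("P2", "navigation"),
    ("P2", "terminology"),
    ("P3", "navigation"),
    ("P4", "terminology"),
]

def count_by_category(findings, categories):
    """Count findings per predefined category, keeping empty categories."""
    counts = Counter({c: 0 for c in categories})
    for _participant, category in findings:
        if category not in counts:
            raise ValueError(f"finding uses undefined category: {category}")
        counts[category] += 1
    return counts

counts = count_by_category(findings, CATEGORIES)
for category, n in counts.most_common():
    print(category, n)
```

Rejecting a finding whose category was not predefined keeps the analysis top-down; a finding that fits nowhere is a sign that the categories need revisiting or that affinity analysis would suit the data better.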

Triangulation Analysis

Barnum suggests triangulation analysis which is a comparison of:

Performance measurements
Subjective measurements from questionnaires and interviews
Issue lists from the Top-down categorization of usability findings

Use this analysis to justify the usability problems and their severities.
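One way to picture triangulation is to line up the three sources for each task and note which of them point to a problem. The sketch below does that under assumptions of my own: performance is a completion rate between 0 and 1, the subjective measure is a mean Likert rating from 1 to 5, and the thresholds (0.8 and 3.0) are arbitrary. The function name triangulate and the data are hypothetical.

```python
def triangulate(performance, ratings, issue_counts,
                min_completion=0.8, min_rating=3.0):
    """For each task, list which sources of evidence indicate a problem.

    The thresholds are assumed values for illustration, not recommendations.
    """
    report = {}
    for task in performance:
        evidence = []
        if performance[task] < min_completion:
            evidence.append("performance")
        if ratings[task] < min_rating:
            evidence.append("subjective")
        if issue_counts.get(task, 0) > 0:
            evidence.append("issues")
        report[task] = evidence
    return report

# Made-up summary data for two tasks.
performance = {"checkout": 0.6, "search": 1.0}   # completion rate
ratings = {"checkout": 2.2, "search": 4.4}       # mean Likert rating
issue_counts = {"checkout": 3}                   # findings per task

report = triangulate(performance, ratings, issue_counts)
for task, evidence in report.items():
    print(task, evidence)
```

A problem supported by all three sources is easier to justify, and to rate as severe, than one that appears in only one of them.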

Report

Barnum gives detailed descriptions and examples of usability reports. Usability test reports tend to be very long: they have a lot to report, and they present the results in several formats because many different people (executives, designers, and experts) will read the report or some part of it. The outline of the report:

Cover letter
Executive summary
Introduction
Methodology
Results
Recommendations/actions
Appendices

Barnum also suggests previewing the report with a short document, called a roadrunner (Harrison and Melton). The roadrunner should:

Use catchy graphics
Be brief, one page
Include charts
Speak the reader’s language
Include users’ comments
Include positive feedback
Tie the results to the original usability goals
Emphasize the need to read the final report for full results
Include a short summary/implications

For this course

I do not need nor will I read a lengthy report, but I do not want a hyped report like the roadrunner. My outline:

Cover page: name the group and the usability expert (graduate student)

Introduction: Description of UI, test goals and brief description of tests (1 page)

Test Plans: the original you created for testing (approximately 1 page per scenario)

Results: These should be plots and charts with explanations (whatever it takes, approximately 2 pages)

Conclusions: Usability problems and suggestions for improving the UI (1 page)