Usability Testing

Usability testing and user testing are slightly different, both are testing techniques in HCI. User testing is for categorizing users or tasks and improving HCI theory. Usability testing is frequently done by usability experts (UX) to discover usability problems in specific UIs. User testing is a more formal process preformed by Human Factors (HF) scientists. The tests are specific tasks performed by the participants. Usability testing tends to be more informal, and the tasks performed by the participants are more general. But usability test tasks are specific about the usability concerns addressed by the test. The analysis for user testing is very quantitative using statistics, while the analysis for usability testing tends to qualitative using simpler statistics, e.g. frequency and averaging. They both use many of the same techniques and share many of the same goals.

Evaluation Paradigms and Techniques

Paradigm is a typical example or pattern. Preece, Rogers and Sharp propose the following evaluation paradigms:

  • Quick and dirty – informal discussions with users at any time perhaps using prototypes.
  • Usability Testing – observing the user and recording their performance, for example video taping the session.
  • Field studies – going to the users’ sites and using surveys or observing users using the UI.
  • Predictive – experts using heuristic evaluations or formal models to evaluate the UI.

Summarizing the differences between the evaluation paradigms:

Evaluation paradigm:Quick and dirtyUsability testingField studiesPredictive
Role of the user
in the evaluation
Natural behaviorTo perform tasksNatural behaviorNone
Who controls
the evaluation
Evaluators has
minimum control
Evaluator has
strong control
Relationship between
Evaluators and customers
Expert
Evaluators
Location
of the evaluation
Natural environment
or lab
LabNatural environmentLab or on the
premises
When the evaluation
is used
Any timewith prototype
or product
Earlywith prototype
Type of data collected
from the evaluation
qualitative; informal
discussions
Quantitative; statisticalQualitative, sketchesList of problems
How the data is Fed back into the designSketches and
quotes
Report on
performances
Descriptions at workshop,
reports and sketches
Report
Philosophy or theory
of the evaluation
User-centeredScientific/experimentalEthnographicTheory

They categorize usability testing as controlled testing of users performing tasks on a prototype in the laboratory. They also categorize usability testing as quantitative and based on scientific research. This is true compared to the other paradigms in the above table, but not true compare to user testing. Usability test results are reported in the academic literature.  The usability test report gives the results of user performance on UI, but also reports on user answers on questionnaires and interviews.

Technique is a specific way of preforming a task. For evaluation, Preece, Rogers and Sharp propose these evaluation techniques:

  • Observing users – using notes, audio, video and logging to record the use of a system.
  • Asking users – using interviews and surveys to get users opinion about the system.
  • Asking experts – for example Heuristic Evaluation or Cognitive Walkthrough.
  • Testing user performance – typically in the lab or the field.
  • Using models – to describe behavior, for example GOMS or HTA and Fitts’ Law

Summarizing the relationships between evaluation paradigms and techniques:

Technique\ParadigmQuick and dirtyUsability testingField studiesPredictive
Observing UsersSeeing how users behave in their environmentVideo and interaction logs. Analyzed for errors, performance, route in UI, etc.Ethnography is central to Field StudiesNA
Asking UsersDiscuss w/ potential users, individually or in focus groupsPre and Post testing surveys. Structured InterviewsInterviews or discussionsNA
Asking ExpertsProvide usability critiques on prototypesNANAHeuristic Evaluation
User testingNATesting typical users on typical tasks.  Central to Usability TestingNA 
ModelingNAIn academia compare w/ theory.In academia compare w/ theoryGOMS etc.

NA = not applicable, not used

Usability testing uses all the techniques except ‘asking experts’.

The basic structure of a usability test session is:

  1. Pretest introductions and explanation of the UI and the tasks of the test.
  2. Conduct and observe participant performing tasks
  3. Post-test questionnaires and/or interviews and/or structured discussions to get user feelings and opinions of the UI.

Planning an Usability Test

Preece, Rogers and Sharp use the acronym DECIDE to explain the steps for planning an evaluation:

  1. Determine the general goals of the test
  2. Explore the specific questions of the test
  3. Choose the evaluation paradigms and techniques
  4. Identify the specific practical issues, such as selecting participants and tasks
  5. Decide how to deal with ethical issues
  6. Evaluate, interpret, and present the data.

The acronym ‘DECIDE’ is good because it stresses that the preparation is about making decisions. You must decide on the goals and generate the specific usability concerns or questions.  This is critical for usability testing. In user testing the scientist have a hypothesis in mind to test, but in usability testing usability experts have specific usability concerns that they are investigating.

The ‘I’ in DECIDE is for identifying the practical issues. These practical issues are:

  • Users: which users do you want to evaluate? What participants can you evaluate? How can you solicit the participants? How will they participate, e.g. what tasks?
  • Facilities and equipment in the laboratory. What is the prototype? How to gather the data?
  • Schedule and budget constraints influence the number participants and the procurement of new equipment, also what and how much analysis the usability expert can perform.
  • Expertise: What expertise does the evaluation team have? Can they perform the test and evaluation?

Ethical Issues:

Preece, Rogers and Sharp offer these procedures to insure proper ethics during the testing:

  • Tell the participant the goals of the test in language that they will understand, but not in so much detail that will bias their performance. Tell them, “We want to learn how well this UI works for you,” and not, “We want to know if you will miss seeing a button.”
  • Explain the tasks clearly and without biasing participant’s performance.
  • Be sure to tell them that all data will be confidential and promise anonymity.
  • Tell the participants that they can stop at any time during the test.
  • Pay participants when you can; this makes the relationship between you and the participant professional. In this class, you will not be able to pay participants.
  • In the report, avoid quotes that identify the user.
  • Ask users if in the future, you can quote them.

The above is a good procedure, but they do not replace the most important aspect: treat the participants with respect. Treat them with respect even while you are designing the test. I have learned that the better you treat the participants (before and during the tests), the better the results will be.

Graduate students need to go through the CITI training.

The Collaborative Institutional Training Initiative (CITI Program) is the company that MTU uses to certifies personal have gone through ethical training for human and animal testing. To conduct the usability testing you need to get the certificate. To get the training, go to:

http://www.mtu.edu/research/administration/integrity-compliance/review-boards/human-subjects/citi-training.html

In the Required Modules section, there are two links:

  • CITI Login and Registration
  • CITI Step-by-Step Instructions

First download the “CITI Step-by-Step Instructions” then use it to register on CITI and select your training. You will  choose the “Social/Behavioral Research Course (12 modules)” course.

Evaluate and interpret results:

Before the actual testing, evaluate potential results, meaning the results from observations, measurements, and answers to survey and interview questions. Consider:

  • Reliability – will the the results be repeatable?
  • Validity – will the observations and participants’ responses to questions measure correctly the usability aspects you want to investigate.
  • Biases – Measurements can have bias, but you should be aware of the bias, and if they make make the conclusions invalid.
  • Scope of test – this is important, you can not test everything, but you should assure that test covers the aspects of the UI addressing your usability concerns.
  • Ecological validity – environmental factors that might bias the results, for example do the tests need to be conducted in the field to be valid or address your usability concerns.

Usability Test Development

You should have a good idea what is a usability test, but only a general idea how to plan the test. I’ll try to address some specific issues of developing a usability test, but I can not go over everything. Preece, Sharp and Rogers in Interaction Design, chapters 12-15, discuss some of the specifics and give some examples. Barnum in Usability Testing and Research, chapters 5 through 7, gives a very detail example of their usability test for Medline and hotmail website.

Goals and Usability Concerns

The testing goals and your usability concerns of the UI are the most important aspect of designing a good test. Preece, Sharp and Rogers do not give much insight into how to generate goals and concerns. In industry, the client (a representative from a software company or design team) frequently gives vary vague goals for the usability test. It is your job to determine more specific goals.

Barum suggests answering these two questions:

  1. What do you want to learn from the test?
  2. How will you measure what you want to learn?

The list above point out that you can only learn from a test, what you can measure or observe. Although you should consider what you can measure, I think it is better to first generate a list of questions about the use of the UI and then determine how you can make the measurements.

Rojek and Kanerva (“A data-collection strategy for usability tests” in IEEE transaction on Professional Communication, 1994, from Barnum’s Usability Testing) gives a list of questions:

  • What do you want to learn from the test?
  • What are your greatest fears for the product?
  • Are there different opinions in the design team about a design issue?
  • What can be changed about the design as a consequence of the test results?
  • Have the design team made assumptions about the users?
  • Are particular features of the design targeted for a specific issue?
  • How will you know if the product is good?

This list of questions is good for industry, but we can add to list when testing more experimental UIs. I suggest this list of questions to ask your selves:

  • Is there something unique about the UI that can be tested?
  • Is there a concern about an interaction aspect of the UI that you can test?
  • Is there a concern about a graphical or information displayed in the UI?
  • Is there an ergonomics aspect of the device that is a concern?
  • Is there an environmental aspect that is a concern?
  • What will users think of the device or UI?
  • What are the vertical and horizontal extents of the prototype?

Answer these questions then making a list of goals for the test will be easy. If a heuristic evaluation was performed on the UI, you can look at the heuristic evaluation to generate some usability concerns, or you can first conduct an heuristic evaluation on the design.

What usability concerns do you have for your app designs?

Observations and Tasks

Your goal is to write a test plan. Part of generating a test plan is to design a task for the participant to perform, so consider what can be measured by observing the task:

  • Time to perform a task
  • Percentage of task completed
  • Number of errors
  • Time to recover from an error
  • Number of repeated or failed commands
  • Number of features or commands not used
  • Time spent navigating and/or searching for information
  • Number of clicks or taps to perform a task
  • Quality/Quantity of information found
  • and more?

The usability test can also include post task questionnaires or interviews. What can you learn and measure from questionnaires:

  • How does the participant feel about the product?
  • Was the participant frustrated?
  • Was the participant satisfied using the product?
  • Was the participant amused using the product?
  • What was the participant thinking while using the product?

Questionnaires can generate quantitative measures especially if they use a Likert scale (see below), but questionnaires are also used for qualitative measures.

Another technique is observing the user while performing the task:

  • Facial expression
  • Vocalization
  • Hand motion
  • body language

Current HCI research is trying to develop quantitative measures from these observations, but you can use them as qualitative measures.

Finally usability testing can use think aloud protocol. “Think aloud” is the technique of asking the participant vocalize their thoughts while preforming tasks. Psychologists have formal techniques for analyzing think aloud, but these are involved and tedious. You could use ‘think aloud’ and use naive interruption of the vocalization as an informal qualitative measure.

Triangulation

Frequently usability testing cannot use a single task or measurement to answer questions about a design concern. Consider the design concern or question: “Is the product easy to use?” The question is a legitimate usability concern, but how do you measure it? The time to perform a task is a good and quantitative measure, but how long is too long? Observation of the user may show facial expressions suggesting that the user is perplexed or frustrated. Participants may answer questions in a survey or interviews that indicate the user was frustrated or thought that the task was hard. If you use all these measures, you have confirming evidence.  Using multiple techniques to probe a usability concern is called triangulation .

You can compare your list of potential concerns with what can be measured and throw out concerns that cannot be measured. With the remaining list of concerns, you can generate test goals.

Test Plan and Scenarios

Your short term goal is to generate a test plan composed of tasks. Each task or set of tasks has at least one test goal and frequently several measurements.  Usability test plans generally are composed of several test scenarios. Test scenarios are short stories that you tell the participants before they perform the tasks. Test scenarios set the scene for the participant and suggest what the participants should do. Test scenarios are essential for usability testing. The usability test administrator should not explicitly tell participants what to do by saying, “Move the cursor to a button and click.” So how does the usability test administrator explain to the participant what to do? The test administrator tells a story like,

“You are a costumer that would like to purchase a new broom to sweep the floor. Please find the brooms in this website, choose a broom, and make the purchase.”

The scenario avoids explicitly telling the user what to do. Also note that this scenario contains several tasks. Using scenarios, usability testing can measure more than how long it takes to press a button. For example, what design concerns or test goals could the above scenario address?  The scenarios can describe the environment and give a fore story to the participant. For example you may want the participant to imagine that they are in a car using the device. Write these descriptions down so that you can repeat it exactly to each participant. Also the test should impose appropriate environmental constraint, for example having the participant sitting or standing.

Now you are ready to write a test plan. A typical test plan outline:

Test Plan Name

Scenario Name

Goals – what you want to learn from the test scenario

Scenario – the actual story (By itself, on a separate sheet of paper)

Task list – short description of the actual tasks the user should perform if done correctly and efficiently. You will not give the task list to the participant.

Quantitative measurement list – what measurements the loggers will record

Qualitative measurement list 

Potential observations of users

Post Scenario interview or questionnaire questions (By itself, on a separate sheet of paper)

Test set up details

For this course you should have at least two scenarios. In industry, the usability experts design enough scenarios to cover all the test goals, which hopefully address most of the design concerns.

Questionnaires and Interviews

Both questionnaires and interviews are lists of questions to ask the participants. It is possible to write bad questions. Bad questions have one of these aspects:

  • long questions
  • confusing questions
  • elaborate sentence construction
  • using jargon, especially technical terms
  • leading questions
  • “Double-barreled questions” – asking two separate (possibly related) questions. These questions should be separated.
  • Using negative words in a questions – e.g. “Do you agree that using the app is hard?” These can be confusing.
  • Biased wording in questions – e.g. “Don’t you agree …”
  • “Hot-button words, such as “liberal”, “conservative”, etc.

The difference between questionnaires and interviews is how the data is recorded. Participants write the answers to questionnaires, so recording is easy. But information can be missed by questionnaires, for example when a follow up question was not anticipated. Interviews can seek answers that do not tend to be short answers and probe for more information.

For both questionnaires and interviews write the questions out and review them with the team, looking for bad aspects of the questions. Then consider the order of the questions. Answering one question can, for good or bad, lead to an answer to the next question. Consider when participants answer a question what will they be thinking. Consider the implications for the answers to the next question.

The types of interviews:

  • Open-ended interview
  • Unstructured interview
  • Structured interview
  • Semi-structured interviews
  • Group interview

Open-end and group interviews take a lot of skill to conduct and are hard to analyze. You should perform structured interviews.

During an interview:

  1. Introduce your self and what the interview is about
  2. Ask a few easy warm up questions
  3. Ask your main questions
  4. Ask a few easy cool down questions
  5. Thank the participant

Be professional, dress similar to the interviewee. If you are using a recorder make sure it works before the interview. If you will be taking notes try to write them down exactly, at least do not change the meaning of the answer, and be consistent about how you abbreviate the answers.

Standard types of survey questions:

  • Yes/No Maybe? questions
  • Multiple choices (nominal or ordinal)
  • Likert scales – Likert scales use agree and disagree as the anchor of the scale.
  • Semantic differential scales – this is a  scale where polar adjectives are the anchor.
  • Check box options, questions that ask to choose all that apply.
  • Ranking
  • Comparison questions
  • Frequency of use type questions
  • Short answers
  • Phase completion
  • more?

In general, questions can be open-ended questions or closed-ended questions. Open-ended questions are good for eliciting new information while closed-ended questions are good for quantitative analysis. Closed-ended questions can have ordered responses such as Likert Scales or unordered responses such selecting from a list. Order responses are values of ordinal variables such as integers (1, 2, 3, …) or alphabet characters (A, B, C, …). The ordinal variables need anchoring to associate meaning to the ordinal variables. For example a Likert Scale question would ask “To what degree do you agree or disagree with …. where 1 means strongly disagree and 5 means strongly agree.” The anchors can be on the extreme values as in the example above, for each value, or my favorite for the extreme and mid values, such as “1 means strongly disagree, 3 means undecided and 5 means strongly agree.”

Unordered responses are values for categorical variables. Example of questions for eliciting unordered responses are multiple choice question. The responses can be restricted to a single response or multiple selections.

Keep the questionnaire short, 20 questions. The questionnaire should not take more than 10 minutes to complete. Be sure you really need the answers to the questions that you ask and that you know how you will use the data. I have stopped answering many questionnaires because they went on and on. Another point, especially for online questionnaires or forms, is to be sure that there is an honest indicator of progress through the questionnaire.

Consider using ‘short answer’ questions. Although they are hard to quantify, they can give a lot of information that you could not predict while planning the test. Equally important, they can help you identify bias or bad design in the questionnaire itself. Do not forget to ask, “Do you have suggestions on how to improve the …?” This shows respect for the participants’ input, and you may will be surprised by their answers.

Existing Usability Questionnaires

There are several existing usability questionnaires. Gary Perlman’s website list many of the surveys.

http://garyperlman.com/quest/

The page is really about Perlman’s script for running a survey but you can view many existing survey in the table at the top of the page. In particular look at

  • Computer System Usability Questionnaire (CSUQ) by Lewis (1995) has 19 questions.
  • Interface Consistency Testing Questionnaire (ICTQ) by Ozok and Salvendy (2001)
  • Purdue Usability Testing Questionnaire (PUTQ)
  • Software Usability Measurement Inventory (SUMI)
  • Website Analysis and MeasureMent Inventory (WAMMI)

Observing and Recording Tests

The basic techniques for observing participants during usability testing:

  • Taking notes
  • Audio recording
  • Still photographs
  • Video recordings
  • Event logging software

Advantages and disadvantages of each technique might not be completely clear. Video and audio data can capture a lot of data but they are hard to analyze and take a lot of expertise. Also care must be taken to be sure that the subject appears in the field of view of the camera (frequently several cameras have to be used). During audio recording, care must be taken to assure that the desired recording is not obscured by noise. Event logging software can efficiently capture a lot of data and make analysis easier, but they are either expensive or take time to program. ‘Note taking’ is cheap and effective means of recording observations. Time resolution is lower then event logging because the notes must be handwritten or typed. The time resolution can be improved by creating a short hand and programming macros used in a word processor. Still you will want to measure tasks that take a longer time to perform, at least 5 seconds.

Logging by Notes

If anyone develops a set of macros for Word or a program for writing notes, please share them with me and the class. (See page 245 in Barum’s Usability Testing and Research.) The document should help you generate a csv (coma separated variable) file. The columns in the log file can be line number, time stamp, event code and event description. So the macro could automatically generate the line number and time stamp after each line return, then all the logger needs to do is type the event code and description.

You will want a list of event codes. How many and what events? This depends on the test and the analysis you expect to perform. The event code serves two purposes; a short hand for logging and assistance in the analysis. So the event codes should be unique single characters that are easy to remember. Without too much practice a logger can remember about 5 event codes. Not very many, so they can not be too specific, for example ‘h’ = ‘selected the help menu’ or ‘e’ = ‘hit enter key’ might be too specific. But if they are too general like ‘c’ = ‘made a command’ or ‘f’ = ‘user made a face’ they might not help too much. The description can make the event code more specific. For example

c help

could mean ‘clicked help menu.’  As discussed below you will probably have more than one logger. So each logger can have their own set of event codes, 3 loggers would approximately total 15 event codes. Using multiple log files requires that there is a synchronization event, such as a start event voiced by the test administrator, and then the time stamps can be synchronized across the log files.

Software Event Logging

There are several options for recording UI events for web apps:

  • Writing to the log file
  • Writing into SQLite database
  • Writing to a file

Writing to Grails’s log file is easy to implement. But the log file is verbose and you will have to parse it. Also the file will be written in the tomcat log files. I do not recommend it because it is not easier than writing to a file.

Writing to a database would require additional Model or Domains in web app. Logging to a database has the advantage that it is fast and memory usage minimal.

Writing to a file is simple. The file should be in comma separated variable (csv) format. The app will have to make a File,

File file = new File("path/log-<time>.csv");

A relative path will be in web-app/ directory which you can access. You’ll lose the file if you redeploy.

In the controller, writing to the file is easy, use either:

file.write “$time, $event”
file << “$time, $event”

Design the format of the file. I think it is best to consider all actions on the UI as events that the evaluator could use

“<event number>, <view>, <event name>, <event target>, <event time>, \”<detials>\” \n”

where

  • <event number> = sequential number of events
  • <view> = activity or view name
  • <event name> = name of the event for example onClick or onCreate etc.
  • <event target> = the name of the button or item
  • <event time> = time

For example:

1, HomeController, index, ,123456789, “home page accessed”

2, SitesController, listAll, , 123457123, “list all sites”

40, Observation, submit, ,123459123, “new observation submitted”

This file format can easily be interpreted by Excel or statistics packages.

Writing events that occur in the controller is easy. Recording only controller events will allow you to record when participants enter a webpage and when they have submitted an observation. This could be a useful performance measure. Controller events will not record the time that participants enter text into a textbox. Writing these interaction events is harder and requires using JavaScript, saving a JSON and passing the JSON in a request to the Controller and then writing the JSON to the log file.

Audio Logging

Why use an audio recording unless you are interviewing? You could use participants’ groans for determining their mood, or you might be using ‘think aloud’ protocol. You would want to audio record the think aloud vocalization of the participants when they are performing the tasks. This is an effective technique and only slows down the time to perform tasks a little.  It is hard to formally analyze, but in the context of usability testing you can get some idea of what the participant was thinking with an informal analysis. You may want to audio record the post test interview to get an idea about what the user was thinking. Post test interviews are not as reliable as think aloud because the participants may not remember and may answer what they think you want to hear. So only ask about memorable events.

Usability Test Team

When you conduct the test, your team should have assigned roles. For example:

  • Administrator/facilitator/briefer
  • Recorder/logger
  • Observer

Consider having several loggers splitting up the responsibilities of recording events. For example a logger could track what the user does on the GUI and another logger could track facial expressions or body language. You can have more than one logger recording a single event to assure reliability or refine what they are observing.

Loggers do not have enough time to make general observation while recording specific gestures or articulations, and the facilitator/briefer is too busy being attentive to the user to take notes or make observations. You may want a general purpose observer, who makes notes on the general progress of the test and if anything unusual occurred.

Conducting Usability tests

The general procedure for the usability tests session:

  1. Prepare test room: make sure the programs and equipment work, and you have the forms and questionnaires.
  2. Greet the Guest: introduce your self and the other members of the team. Briefly describe what will happen and give the consent form. Describe what is on the consent form so the participant does not have to read the form if they do not want to.
  3. Screening questions /or question to select/prepare scenarios – you probably will not use
  4. Explain interface: or any other equipment.
  5. Tell the scenario: And any other specific instructions, such as the environment. This should be on a separate piece of paper for you to read.
  6. Observe the participant performing the task
  7. Post scenario questionnaire and interview: use both
  8. Repeat: steps 4-7 for each scenario
  9. Post test questionnaire and interview: General feeling about the app and demographics. This should be on a separate piece of paper.
  10. Thank the participant
  11. Organize the files

The test should not be too short, but should not be longer than an hour. Participants can not concentrate for more than an hour. For your usability tests a half hour test would be good.

Immediately after the test organize your notes. Transcript your notes and organize them in the computer as soon as possible; if you can, immediately after the test session.

Practice and Pilot Studies

Write a script and practice; practice your test with your development team. I always practice the complete usability test before administrating the test. This gives me an idea of how long the test will take and generates preliminary results. You get only one shot with a participant, so do not lose that opportunity. I have felt sad whenever I have not been able to use data from a test because something trivial was wrong. It was a waste of time for the participant and me. You probably cannot make pilot studies but you can perform the test on your self. You can determine how long the test will take and sometimes verify the correct parameters.

In industry, the first couple of participants can be used as the pilot study. But this requires that there is time to change the test before the next scheduled participant. If the test has to be redesigned then the results from the pilot study are not used in the final analysis. If the test is not changed then the results from the pilot study can be used in the final analysis.

We will have usability testing practice sessions. Graduates students will work with the whole team going through the test. The application should be on the device expected to be used for the usability test. This is a dress rehearsal.

Analyzing Test Results

General Procedure

Analyzing the test results can consists of:

  1. Collocate log files for each participant/scenario
  2. Summarize each scenario across all participants, includes plots and trend analysis
    1. Quantitative measures
    2. Qualitative observations
    3. Questionnaire answers
  3. Make conclusions for each test goal

If there are many log files for each user/scenario then you will want to collocate the files, meaning you correlate the events in the different files into a single file. If the log files are in csv format with time stamps then this is easy and not too tedious:

  1. load each file into a spread sheet
  2. Convert time stamp to relative time stamp using the synchronization event.
  3. Merge the files into a single spread sheet
  4. Sort by relative time stamp and add global line numbers
  5. Add additional columns and/or rows for summarizing results and make calculations
  6. Add questionnaire results, this is easy if using Likert scale.

Do this for each participant. You will have all the data from one participant and scenario together in one file; later when you do trend analysis or look for collaborating results you will be able to find them easily.

Summarizing the test results is now easy. For each scenario, you look in the summary form, log files and questionnaire data for each participant and make a table or plot. You should begin to make conclusions, so you compare answers from the questionnaires to the quantitative results. You can look for other observations in the participant’s collocated file.

Using the summary results you should be able to make conclusions and address the test goals.

Outliers

In statistical test analysts, it is possible to throw out outliers or if the sample is large enough the outliers will be washed out. In usability tests with small number of participants, you cannot throw out outliers, they are significant. You should investigate careful why this participant performed differently. You may be able to uncover a usability concerns for a specific user type. At a minimum, you should be able to write a story about the outlying participant.

Positive and Negative Findings

Record positive findings because:

  1. Everyone likes to hear good news.
  2. If you don’t document the positive aspects of the UI, they could be changed in the future.

Typically, the goals of a usability test is uncovering usability problems, so also report negative findings (called ‘findings’ for short in industry). Barnum suggests analyzing negative finding by scope and severity. Scope is either global to the whole interface or local to a specific task or form. Severity is expressed by levels:

  • Level 1: prevents task completion
  • Level 2: creates significant delay and frustration
  • Level 3: has minor effect on usability
  • Level 4: subtle problem: points to a future enhancement

There are other scales, see page 270 in Barnum’s Usability Testing

Qualitative Analysis

(From Preece, Rogers and Sharp Interaction Design)

Qualitative analysis can be used to tell a story. Also they can support quantitative results with examples.

Qualitative analysis involves making collections out of the data so that they can be compared or reveal trends. Preece, Rogers and Sharp suggest using a team to analyze the qualitative analysis to provide many perspectives.

Qualitative analysis can determine categorizations, for example the general types of usability problems. Also it can reveal patterns, such as if this happens then a problem will occur. There are many formal methods for analyzing qualitative data:

  • Activity Theory
  • Content Analysis
  • Discourse Analysis
  • Conversation Analysis
  • Think Aloud Protocol Analysis

I do not know all them, and they would be a course to themselves.

Barnum does discuss two approaches to qualitative analysis; Top-down categorization and Affinity Analysis. Top-down categorization is predefining categories (Usability Yardstick, page 169, Barnum) and then sorting the usability findings into the categories. Typically the categories are defined before the test, and a goal of the test is to measure the interface using these categories. So a count of the frequency can reveal a general usability problem.  Affinity analysis is typically done with several analysts (page 250, Barnum), iterative arranging the findings into groups until there is consensuses among the analysts. Only after the findings are grouped are the groups labeled.

Triangulation Analysis

Barnum suggests triangulation analysis which is a comparison of:

  • Performance measurements
  • Subjective measurements form questionnaires and interviews
  • Issue lists from the Top-down categorization of usability findings

And use this analysis to justify the usability problems and severities.

Report

Barnum gives detail descriptions and examples of Usability Reports. The usability test reports tend to be very long. They have a lot to report, and several formats to give the results because many different people (executive, designers, and experts) will read the report. The outline of the report:

  1. Cover letter
  2. Executive summary
  3. Introduction
  4. Methodology
  5. Results
  6. Recommendations/actions
  7. Appendices

Barnum also suggests previewing the report with a short document, called roadrunner (Harrison and Melton). The report should:

  • Catchy graphics
  • Brief, one page
  • Includes charts
  • speak the reader’s language
  • Include users comments
  • Positive feedback
  • Tie the results to the original usability goals
  • Emphasize the need to read the final report for full results
  • Include a short summary/implications

For this course

I do not need nor will I read a lengthy report, but I do not want a hyped report like the roadrunner. My outline:

Cover page: name the group and the usability expert (graduate student)

Introduction: Description of UI, test goals and brief description of tests (1 page)

Test Plans: the original you created for testing (approximately 1 page per scenario)

Results: These should be plots and charts with explanations (what it takes, ~ 2 pages)

Conclusions: Usability problems and suggestions for improving the UI (1 page)