Expert Evaluation

Three goals for UI evaluations are:

  • Assess the extent of the system’s functionality: Does the UI provide sufficient functionality and appropriate functions? Is there too much functionality?
  • Assess the effects on the user: How long does it take to use the UI?  Is the user satisfied?  Does the user respond correctly? Can the user access the information presented by the UI?
  • Identify specific usability problems: Does the UI lead the user to make mistakes? Is the user confused?

Evaluations can be divided into two categories, formative and summative (Scriven, 1967), depending on when the evaluations occur. I like to call the two categories “after the fact” for summative evaluations and “before implementation” for formative evaluations. In this class we are primarily concerned with evaluations before implementation because these are the techniques you can use for your projects.

After the Fact: Summative Evaluations

Evaluations after implementation are like “learning by taking a test.” The results are often disappointing, and programmers are reluctant to change or to learn from them. Nevertheless, evaluating a UI after implementation can offer a more accurate assessment and uncover weaknesses in the design that were not apparent to the designers before implementation. If designers and programmers approach the evaluation as a learning process, they can use the results to improve the next product.

Designers can observe users in the field, meaning at the actual work site, using the UI. Field studies have the advantage that they depict the actual environment and circumstances of the application’s use. Designers can learn whether critical tasks are performed during interruptions such as phone calls, and can observe what users do to solve problems. Observing users in the field has the difficulty that users are aware they are being observed, so they may not perform naturally.

Some aspects of the system can only be evaluated in the field, such as effects due to the environment and the users. For example, a UI used in a noisy environment may lead to errors, or the user may be distracted while using the UI. The UI may also suffer interference from other tasks the user is performing. Even within the same UI, a complex task may consist of subtasks that conflict with each other. Certainly, life-critical systems must include field studies.

An example of a field study is beta testing. Generally beta testing will result only in patches, because at this stage of development any significant change would be too costly and would push back the scheduled release date.

Methods for field evaluations:

  • Observations
  • Questionnaires

Another after-the-fact evaluation is the laboratory study. Laboratory studies can evaluate specific tasks in controlled environments, or observe users through one-way mirrors while they are given high-level tasks to perform on the UI. Laboratory studies can collect quantitative results useful for scientific analysis and for predictive models of UI systems.

Before Implementation: Formative Evaluations

Evaluations should occur throughout the development of the system, but evaluations before implementation have the most potential to change the system. Formative evaluations can be categorized as observing users or expert evaluations.

Observing Users

Participatory design processes using prototypes ask the users how they would use the app, so the process is a form of observing users. Prototypes such as UI drawings and mock-ups are presented to the customers/clients for evaluation. The evaluations can change the design before the changes become too expensive to implement. But the technique can be abused by both the customers and the designers. Customers can continuously request changes that do not progress the project, a form of feature creep. More insidious are the designers’ abuses of dominating the process. Designers can tell the customers that their specific concerns are premature and that the customers should think more abstractly. Then the designers return in the following workshops with designs that are set in concrete and say that changes are not possible. For an iterative participatory process to work effectively, it must truly be participatory and consist of multiple rounds of feedback.

Another form of asking the users is to test prototypes on potential users. This is a more formal process than participatory design. In front of a prototype, potential users are guided through a task by designers. Typically, two designers are present. One designer explains the task and takes notes. The other designer plays the role of the machine, producing the different components of the prototype. This technique requires that the designers do not prompt the user into the correct actions; otherwise the designers will not discover usability problems.

Expert Evaluations

Expert evaluations are formative evaluations performed by designers and usability experts. Expert evaluations are effective because they take less time to prepare and do not require gathering users. Consequently, expert evaluations can occur at any time during the design process. If the evaluations are made by the designers, they need the unusual skill of evaluating their own work. Self-evaluation requires separating ego from one’s work and the ability to view the design from multiple perspectives. Consequently, designers often hire consultants to make the evaluations; the consultants are experts evaluating the design. There are two popular types of expert reviews: cognitive walkthroughs and heuristic evaluations.

Cognitive Walkthrough

Because evaluating one’s own work is hard and lacks the perspectives of others, designers and even programmers have learned to formalize the evaluation process. Programmers use code walkthroughs to find errors in their code. The programmer goes through the code line by line in front of other programmers. The other programmers try to find errors and point out bad coding style.

Wharton [The cognitive walkthrough method: A practitioner’s guide. In Usability Inspection Methods. Wiley, 1994] and Polson [Cognitive walkthroughs: a method for theory-based evaluation of user interfaces, International Journal of Man-Machine Studies, 36:741-773, 1992] developed cognitive walkthroughs to evaluate the “ease of learning, particularly by exploration.” During the walkthrough, the designer steps through the actions that a user performs to complete a task, and the experts evaluate each step by answering four questions:

  • Do users want the effect of the action?
  • Will users notice the availability of the correct action?
  • Will users associate the correct action with the effect they want?
  • After using the correct action, will the user understand the feedback?

Although the cognitive walkthrough was developed for evaluating the ease of learning, the walkthrough can also delineate other usability problems.
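The four questions lend themselves to a structured record for each action step. The sketch below is only an illustration of such a record; the class and field names are hypothetical and not part of the published method.

    from dataclasses import dataclass, field

    # One action step in the task sequence, with a plausible story (or a noted
    # problem) for each of the four walkthrough questions.
    @dataclass
    class WalkthroughStep:
        action: str                # e.g. "User presses the disk icon to save"
        wants_effect: str          # Do users want the effect of the action?
        notices_action: str        # Will users notice the correct action is available?
        associates_action: str     # Will users connect the action with the effect they want?
        understands_feedback: str  # Will users understand the feedback afterwards?
        problems: list = field(default_factory=list)  # issues noted by the experts

    # Hypothetical record for one step of a "save a note" task.
    step = WalkthroughStep(
        action="User presses the disk icon to save the note",
        wants_effect="Yes: the scenario says the user wants to keep the note",
        notices_action="Maybe not: the icon is small and unlabeled",
        associates_action="Yes, if the user recognizes the disk metaphor",
        understands_feedback="Unclear: no confirmation message is shown",
    )
    step.problems.append("No visible feedback after saving")
    print(step)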

The standard procedure for the walkthrough:

  1. The characteristics of the typical users and sample tasks are documented. A picture or graphical description of the prototype is produced, along with the sequence of actions needed to perform each task.
  2. The designers and expert reviewers gather.
  3. A designer gives a short explanation of the system, describing users and sample tasks.
  4. A designer then steps through the actions, giving a plausible story or explanation for the four questions above.
  5. Experts speak up during the walkthrough, noting design flaws or implausible stories or explanations.
  6. A member of the design team records the design flaws and design ideas.

The designers use the results of the walkthrough to improve their design.

Spencer [The streamlined cognitive walkthrough method, working around constraints encountered in a software development company, in SIGCHI ’00], working at Microsoft, found that there were problems with the standard cognitive walkthrough. The benefit in terms of the number of discovered usability problems was not worth the time spent in the walkthrough, typically many hours. Also, analyzing all the paperwork generated by answering the four questions was time-consuming. Lengthy design discussions would occur during the walkthrough, and the designers would become defensive. (I have also observed these problems during our walkthroughs.)

Spencer streamlined the walkthrough by reducing the four questions to two and by not allowing extended design discussions during the walkthrough. Before the walkthrough, the mediator reminds everyone that the evaluation is not a design session. If the experts engage in lengthy design suggestions, the mediator halts the discussion.

The two questions:

  1. Will the user know what to do at this step?
  2. If the user does the right thing, will they know that they have, and that they are making progress toward their goal?

If the team finds a step with an implausible story or explanation, they note the usability issue and move on to the next step. They do not redesign the UI. Also, if there is a gap in the design (for example, when it is not clear from the specification what action the user is to perform), the team notes the gap and moves on.

The rules of behavior during the walkthrough are:

  1. No designing, but you can give design ideas.
  2. No defending a design.
  3. No debating cognitive theory.
  4. The mediator is the leader of the session.

Spencer also streamlined recording the information from a walkthrough. The rules for recording the information are:

  1. If a particular action sequence has a plausible story or explanation then nothing is recorded.
  2. Sometimes a plausible story cannot be told because the UI assumes knowledge that the user might not have. Then evaluators record the user’s failure and lack of knowledge. For example: “Users might not click the ellipsis button to modify the list, because they might not know that they can modify the list.”
  3. Sometimes the evaluators will not have a plausible story, but have found a design flaw. The expert can just note the design flaw. For example: “Users might not know that they can modify auto-generated code because it looks the same as read-only code.”
  4. The evaluators may have design ideas. These design ideas should be noted and summarized. For example: “DI: Automatically generate the code for the user, instead of having the user issue the command.”
  5. Sometimes the evaluators will discover that important functions do not appear in the design: “How does the user do such and such a setting?” The design gap should be recorded. If the gap is encountered again, the team agrees to hand-wave and move on.

The mediator is responsible for stopping a design discussion and deciding when enough description of the design flaw has been given. Typically, if the design flaw or idea can be described as a bullet of one or two sentences, then the discussion should stop. The mediator should rephrase the design flaw or idea into a bullet. This helps reach consensus and gives the design team time to record the bullet.
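To make these recording rules concrete, here is a minimal sketch of a note log that tags each bullet with one of the categories above. The class names are hypothetical and not part of Spencer’s method; the two example bullets are the ones quoted in the rules.

    from dataclasses import dataclass
    from enum import Enum

    # Categories from the recording rules above; plausible steps record nothing.
    class NoteKind(Enum):
        KNOWLEDGE_FAILURE = "user may lack required knowledge"
        DESIGN_FLAW = "design flaw"
        DESIGN_IDEA = "design idea (DI)"
        DESIGN_GAP = "gap in the design"

    @dataclass
    class WalkthroughNote:
        step: str       # the action step under discussion
        kind: NoteKind
        bullet: str     # one- or two-sentence summary phrased by the mediator

    notes = [
        WalkthroughNote(
            step="Modify the generated list",
            kind=NoteKind.KNOWLEDGE_FAILURE,
            bullet="Users might not click the ellipsis button to modify the list, "
                   "because they might not know that they can modify the list.",
        ),
        WalkthroughNote(
            step="Generate the code",
            kind=NoteKind.DESIGN_IDEA,
            bullet="DI: Automatically generate the code for the user, instead of "
                   "having the user issue the command.",
        ),
    ]

    for note in notes:
        print(f"[{note.kind.name}] {note.step}: {note.bullet}")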

Heuristic Evaluation

Heuristic evaluation is the most popular form of expert evaluation. Nielsen has demonstrated that a relatively small number of evaluators can find many of the usability problems.

A heuristic is a guideline or “rule of thumb” that can be used to critique a design. The general idea of heuristic evaluation is that several experts independently delineate usability problems using a set of usability heuristics. The process of heuristic evaluation is:

  1. A set of heuristics is developed for the UI’s domain. I use the word domain to emphasize that the heuristics should be general usability guidelines, not specific to the UI. The heuristics are developed by the usability expert.
  2. The usability expert evaluates the UI. The expert will require a prototype, a list of goals, and task sequences so that the evaluator can understand the UI. The evaluator makes two passes through the UI.
  3. The first pass familiarizes the expert with the UI.
  4. During the second pass, the expert concentrates on the heuristics and makes notes of any design flaw or failure to satisfy a heuristic.
  5. The evaluator writes a summary of the evaluation from the notes.

The goal of the heuristic evaluation is to generate a list of usability problems that violate the usability principles in the heuristic list. The items in the list of usability problems should be specific examples in the evaluated UI, for example:

  1. On the “open form”, the function of the right arrow in the upper left is not clear. This violates the visibility principle.
  2. On the “confirmation screen”, the save and delete buttons are adjacent to each other. What if the user hits the delete button by mistake? This violates the error prevention principle.
  3. There is no cancel button on the form for saving. This violates the user control principle.
  4. etc.

The results of the heuristic evaluation are similar to those of a cognitive walkthrough: specific usability problems. General concerns about the user interface are not the immediate goal of heuristic evaluation. However, a review of the usability problems may reveal that many of them violate the same usability principle; the summary can then state this and illuminate the causes.
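As an illustration only (the structure and names below are hypothetical, not prescribed by the method), the evaluator’s notes can be kept in a simple structured form and then grouped by heuristic when writing the summary:

    from collections import defaultdict
    from dataclasses import dataclass

    # One usability problem found during the second pass of a heuristic evaluation.
    @dataclass
    class HeuristicIssue:
        screen: str       # where in the UI the problem appears
        description: str  # the specific problem
        heuristic: str    # which heuristic in the list it violates

    issues = [
        HeuristicIssue("open form",
                       "Function of the right arrow in the upper left is not clear",
                       "Visibility of system status"),
        HeuristicIssue("confirmation screen",
                       "Save and delete buttons are adjacent; delete could be hit by mistake",
                       "Error prevention"),
    ]

    # Group the issues by heuristic so the summary can point out recurring causes.
    by_heuristic = defaultdict(list)
    for issue in issues:
        by_heuristic[issue.heuristic].append(issue)
    for heuristic, group in by_heuristic.items():
        print(f"{heuristic}: {len(group)} issue(s)")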

Nielsen and Molich have proposed a number of usability heuristic sets. A general-purpose heuristic list that they have proposed:

  • Visibility of system status: Is appropriate feedback about the user’s actions provided within a reasonable time?
  • Match between system and real world: Is the language used by the GUI simple and familiar to the user?
  • User control and freedom: Are there ways for users to escape from a mistake?
  • Consistency and standards: Do similar actions behave the same way?
  • Error prevention: Is it easy to make mistakes?
  • Recognition rather than recall: Are objects, actions and options always visible?
  • Flexibility and efficiency of use: Are there shortcuts?
  • Aesthetic and minimalist design: Is any irrelevant information provided?
  • Help users recognize, diagnose and recover from errors: Are error messages useful?
  • Help and documentation: Is information provided that is easily searched?

Nielsen has also developed a heuristic list for web sites. He suggests remembering the acronym HOME RUN:

  • High quality content
  • Often updated
  • Minimal download time
  • Ease of use
  • Relevant to user’s need
  • Unique to the online medium
  • Net centric corporate culture

Barnum in Usability Testing and Research has several more heuristic usability lists.

The number of usability problems found grows with the number of evaluators. Fortunately, Nielsen demonstrated that as few as 5 evaluators found about 75% of the usability problems, while a single expert found only about 25%.
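These figures are consistent with the standard model of this effect (Nielsen and Landauer), in which each evaluator independently finds a fixed fraction of the problems. The short sketch below is only an illustration; it assumes the 25% single-expert rate quoted above as that fraction.

    # Expected proportion of usability problems found by n independent evaluators,
    # assuming each evaluator finds a fraction lam of the problems
    # (lam = 0.25 is the single-expert rate quoted above).
    def proportion_found(n: int, lam: float = 0.25) -> float:
        return 1 - (1 - lam) ** n

    for n in (1, 3, 5, 10):
        print(f"{n:2d} evaluators: {proportion_found(n):.0%}")
    # 1 evaluator: 25%; 5 evaluators: about 76%, close to the 75% figure above.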

Resources for heuristic evaluations:

This Course

We will perform both the cognitive walkthrough and the heuristic evaluation on your initial designs.

The graduate students will perform the heuristic evaluation. The development teams will provide the graduate students with all the design documents the team has, including:

  • Brief description of the app
  • Description of users
  • Usability goals for the app
  • Use scenario
  • Paper prototypes of the app, drawings
  • Usability concerns

The graduate students/UX consultants will provide development teams with:

  1. Heuristic list and usability issue list
  2. Heuristic evaluation summary

During the initial design presentation, your design team will conduct a cognitive walkthrough. We will use the procedure proposed by Spencer and generalize the use of the cognitive walkthrough to any usability issue. We will analyze only the actions and tasks needed to achieve the use scenario, which will be of your team’s choosing. One of your team members will conduct the walkthrough by giving a plausible story or explanation for each task needed to achieve the user’s goal, while displaying the paper prototype. After the presentation of the scenario, the rest of the class will evaluate your design, giving potential usability problems or short design ideas. Several members of your team will record the usability problems and design ideas. I will act as mediator for the walkthrough, limiting discussion and at times summarizing.