Models for user-interface interactions must simulate both the user and the interface. Broadly, there are two types of models: generative models and GOMS-like models. Generative models, like EPIC or ACT-R, do not need a precise description of how the user will behave with the interface; instead they determine from a set of rules (called production rules) how the user will behave in general. GOMS-like models need a precise description of how the user will behave, typically derived from a Hierarchical Task Analysis. GOMS models assume that the user is an expert in the interface, meaning that the user does not make mistakes and does not have to search for the proper action to perform.
Models are useful for evaluating alternative designs without usability testing. They are cheaper and quicker to perform than usability tests, but an expert is required to construct the model.
GOMS (Goals, Operators, Methods, Selection)
Card, Moran, and Newell developed GOMS in the 1980s. They developed two versions:
CMN-GOMS: the original GOMS, described in “The Psychology of Human-Computer Interaction,” Lawrence Erlbaum Associates, 1983. It is a top-level task analysis. You can find a lecture on CMN-GOMS in the old lecture notes.
KLM-GOMS: the keystroke-level model, the subject of this lecture. This is the simplest GOMS.
Other GOMS models are:
Critical-Path Method GOMS (CPM-GOMS), also read as Cognitive Perceptual Motor GOMS (yes, the acronym means either name). Developed by Bonnie John, it eliminates the restriction that actions are performed sequentially.
Natural GOMS Language (NGOMSL), developed by David Kieras, formalizes the description of tasks and enables automated calculation of task execution time and learning time.
Keystroke-level Model GOMS (KLM-GOMS)
Card, Moran, and Newell (The Keystroke-level Model for User Performance with Interactive Systems, Communications of the ACM, 23:396-410, 1980) measured the time for users to perform a series of gestures on the computer. They discovered a fundamental principle:
The total time to perform a sequence of gestures is the sum of the times of the individual gestures.
A lot is implied in this statement. The most important implication is that there are fundamental gestures. Individual users perform the fundamental gestures in different times; the researchers attempted to determine typical values:
K = 0.2 sec Keying: The time to perform a keystroke, or mouse click
P = 1.1 sec Pointing: The time to position the mouse pointer
H = 0.4 sec Homing: The time for user to move hands from keyboard to mouse
M = 1.35 sec Mental: The time for the user to prepare for the next step
R = ? Responding: The time for the computer to respond to the user's input.
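The summation principle makes the model easy to mechanize. A minimal sketch, using the typical operator values from the table above (R is omitted because it is system-dependent):

```python
# Typical KLM-GOMS operator times in seconds (R, the system response
# time, is omitted because it varies with the system).
OPERATOR_TIMES = {"K": 0.2, "P": 1.1, "H": 0.4, "M": 1.35}

def klm_time(sequence: str) -> float:
    """Sum the operator times of a gesture sequence such as 'MKKKK'."""
    return sum(OPERATOR_TIMES[g] for g in sequence)

print(round(klm_time("MKKKK"), 2))  # mental preparation + four keystrokes, 2.15 s
```

The function simply adds one time per letter of the sequence, which is exactly the fundamental principle above.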
The variation of the timings across users can be as much as 100%, for example an expert typist can type 200 words per minute = 0.06 sec (Note that the measurement assumes 5 characters/words). So the model cannot accurately predicate the response time of an individual user. Chris Blazek and I have measured these variables for a web user and they are surprisingly accurate. Even without precise gesture times for a specific user, the model can be used to determine times for expert users and compare across interfaces.
We calculate the total response time by listing the individual gestures and summing their individual execution times. The difficult part is determining where a mental preparation, M, occurs. The researchers determined heuristic rules for placing mental operations:
Rule 0: Initial insertion of candidate Ms: Insert M before all Ks and Ps
Rule 1: Deletion of anticipated Ms: If a P or K is fully anticipated by a preceding P or K, then delete the M between them. For example, moving the mouse in order to click a button: PMK => PK
Rule 2: Deletion of Ms in cognitive units: If a series of Ks represents a single cognitive unit, then delete the Ms between the Ks; for example, typing ‘1.2’ is one cognitive unit: MKMKMK => MKKK
Rule 3: Deletion of Ms before consecutive terminators: If several delimiters are typed in a row, keep only the first M. For example, if ‘))’ is the terminator, use only one M.
Rule 4: Deletion of Ms that are terminators of commands: If the terminator is frequently used, delete the M before it; for example, for a command followed by “return,” the M before the K representing the “return” is deleted. But if the terminator delimits arguments of a command string that vary, then keep the M; it represents checking that the arguments are correct.
Rule 5: Deletion of overlapped Ms: Do not count any portion of an M that overlaps with a command response. (This is the reason that a responsive interface only needs to respond in a second.)
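Rules 0 and 2 are mechanical enough to automate. The sketch below is a simplified, hypothetical implementation: `rule0` inserts a candidate M before every K and P, and `rule2` collapses a span that the analyst has already identified as one cognitive unit (the remaining rules need knowledge of the interface, so they are left to the analyst):

```python
import re

def rule0(seq: str) -> str:
    """Rule 0: insert a candidate M before every K and P."""
    return re.sub(r"(?=[KP])", "M", seq)

def rule2(seq: str, unit: str) -> str:
    """Rule 2 (simplified): in a span the analyst has marked as one
    cognitive unit, keep only the first M, e.g. 'MKMKMK' -> 'MKKK'."""
    return seq.replace(unit, "M" + unit.replace("M", ""), 1)

print(rule0("HPKHKKKKK"))  # HMPMKHMKMKMKMKMK
```

Applying `rule2` to the four digit keystrokes of the dialog-box example (`rule2("HMPKHMKMKMKMKMK", "MKMKMKMK")`) yields `HMPKHMKKKKMK`, matching the hand analysis below.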
Example: Unit conversions
We want to design an interface to convert centimeters to inches and vice versa.
This is a simple problem, but Raskin develops 4 designs.
Design 1: Dialog box
The interface consists of two radio buttons to choose between centimeters and inches and two text fields: one text field to type the 4 characters of the distance and the other to display the result.
Sequence of Tasks:
- Move hand to mouse
- Move mouse to units radio button
- Click on units
- Move hand to keyboard
- Type 4 characters
- Type enter
The sequence without Ms: HPKHKKKKK
Using Rule 0: HMPMKHMKMKMKMKMK
Using Rule 1, we delete the M between P and K because the user moves the mouse in order to click; the gesture is anticipated: HMPKHMKMKMKMKMK
Using Rule 2, we delete the Ms between the Ks of the characters of the distance; they form a single cognitive unit: HMPKHMKKKKMK
Using Rule 4, we keep the last M before typing enter, because enter delimits an argument (the distance) that the user should check.
Rules 3 and 5 do not apply.
Now we calculate: H+M+P+K+H+M+K+K+K+K+M+K = 0.4+1.35+1.1+0.2+0.4+1.35+0.2+0.2+0.2+0.2+1.35+0.2 = 7.15 seconds.
Is the time approximately correct? If the units are already selected and the window has focus, then the gesture sequence is MKKKKMK = 3.7 seconds. We propose this happens half the time, so the average time is about 5.4 seconds.
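These arithmetic checks are easy to mechanize. A quick sketch using the operator values from the table above:

```python
# Typical KLM-GOMS operator times in seconds.
T = {"K": 0.2, "P": 1.1, "H": 0.4, "M": 1.35}

def klm_time(seq: str) -> float:
    return sum(T[g] for g in seq)

full  = klm_time("HMPKHMKKKKMK")  # units not yet selected: ~7.15 s
short = klm_time("MKKKKMK")       # units already selected: ~3.7 s
print(round((full + short) / 2, 1))  # average over both cases, 5.4
```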
How long would a fancier GUI take to use?
Design 2: A GUI
The GUI has two measuring sticks with pointers, one in inches and the other in centimeters. When the user moves the pointer on one stick to the correct distance, the other pointer points to the corresponding number in the converted unit. There is only room on the sticks to display one order of magnitude; in other words, the user must use buttons to expand or compress the scale. So if the distance that the user wants to convert is not on the screen, the user must first expand the scale, move the scale, and then compress the scale to refine the distance.
Our first analysis assumes that the distance that the user wishes to convert is displayed on the yard stick so that the user does not have to expand and compress the scales.
Sequence of gestures:
- Move hand to mouse
- Move mouse to pointer
- Click mouse
- Move pointer, dragging the pointer to a new distance
- Release pointer
So the sequence is: HPKPK
Add the proper Ms, using Rule 1: HMPKPK
Note that the user anticipated dragging the pointer after clicking on it.
Calculate: 0.4+1.35+1.1+0.2+1.1+0.2 = 4.35 seconds.
The calculation is only valid if the measuring stick is at the correct order of magnitude. In the worst case the scale could be two orders of magnitude away, requiring the user to move the mouse and click the compress button three times, locate the pointer on the distance, and then expand the scale three times. The sequence is:
[Table omitted: the worst-case gesture sequence and its total execution time.]
A lot longer; no wonder GUIs take so long to use.
Two goals of a UI are to receive information from the user and to display information to the user. This implies an exchange of information between the user and the machine. The study of communication includes information theory, which defines the quantity of information that is transmitted in a communication channel. Shannon introduced formal information theory in 1948 (“A Mathematical Theory of Communication,” Bell System Technical Journal) during the development of radio communication and computers. His goal was to determine the best form for information transmission in a noisy environment. Shannon’s initial insight was that the bit is the fundamental measure of information. For example, if the user can choose among 4 radio buttons, A, B, C, and D, then the user’s choice can be represented with two bits. We can encode the user response:
- A = 00
- B = 01
- C = 10
- D = 11
The total information in this dialog is 2 bits. This naturally extends to any number of choices elicited from the user or presented to the user. If the user selects with equal probability among n buttons, then the number of bits representing the choice, and hence the total information of the dialog, is:
lg(n)
Similarly, if the computer can respond equally with n different responses, then the total information available to display is lg(n). The typical problem of information theory is sending information from a transmitter to a receiver; the analogy to a human and a computer communicating through a GUI is very close.
When the user clicks on one of the buttons, the information of that single choice is:
lg(n)
assuming it is equally probable that the user chooses any one of the buttons.
Choices are not always equally likely; for example, when typing text, the letters of words occur with different frequencies. Then the information in a single message is:
lg(1/pi)
where pi is the probability of sending message i or choosing button i. The sum of all the probabilities is of course equal to 1, meaning
Σall i [pi]= 1
The total information available in the communication channel or UI is:
Σall i [pi lg(1/pi)]
If the probabilities of the choices are equal, then pi = 1/n, so for equal choices the above sum reduces to lg(n). Our GUI examples will assume equal choice of buttons, but you should know that if the choices are not equal, then:
Σall i [pi lg(1/pi)] ≤ lg(n)
In other words, the total available information of a dialog with unequal response probabilities is less than that of a dialog with equal probabilities.
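The inequality is easy to check numerically. A short sketch, using `math.log2` for lg:

```python
from math import log2

def dialog_information(probs) -> float:
    """Total information of a set of choices, in bits: sum of p*lg(1/p)."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

equal  = dialog_information([0.25, 0.25, 0.25, 0.25])  # = lg(4) = 2 bits
skewed = dialog_information([0.7, 0.1, 0.1, 0.1])      # < 2 bits
print(equal, skewed)
```

The `if p > 0` guard also handles the single-button case below: a choice with probability 1 contributes 1·lg(1) = 0 bits.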
We can already learn something practical for GUI design: if the user can only respond by pressing a single button (for example, an error dialog box with an OK button), then the amount of information transmitted by pressing the button is nil. This is because the probability of the user selecting the button is 1, so the information communicated is:
1*lg(1) = 0
Use of single-button dialog boxes (common in MS designs) generally represents bad design: no information is communicated, so the user might as well not respond. Is this exactly true? The information is different if, after a lack of response, the computer acts differently. GUI designers should find a better solution for error handling. Note that most Linux applications and compilers respond to an error by only printing the message, not by eliciting a response from the user.
We can define the information efficiency of a dialog as the minimum information necessary to communicate the message divided by the amount of information actually communicated. Information efficiency is like any other efficiency measure and can vary from 0 to 1. Naturally, GUI designers aspire to an information efficiency of 1. Information efficiency less than 1 implies that there is redundant information in the message. Redundant information does serve a purpose: if you are communicating over a noisy channel, the redundancy can be used to reconstruct the message or ensure proper transmission, as with checksums. In GUI design, redundant information can be used to make certain of the user’s intent, but more often the redundancy occurs because of syntax.
We can calculate the information content of our distance-conversion example. The user inputs numbers from 0 to 9999, so there are 10,000 possible responses. Assuming that the numbers input by the user are equally probable (i.e., random), the total information in a single message is:
lg(10,000) ≈ 13.3 bits, which rounds up to 14 bits
This represents the minimum number of bits required to express the user’s choice of distance. To determine the information efficiency, we calculate the number of bits the GUI actually requires from the user and divide it into the number above.
Instead of making the calculations in binary, it is more convenient to make them in terms of keystrokes. Raskin calls this character efficiency and says it is an approximation; I believe that, calculated properly, the efficiencies are equivalent. Using characters as our basis, the user must input 4 digits. To claim that each character has equal probability, the interface should have only 10 keys. In other words, if the user is entering from a standard keyboard, then the characters do not have equal probability; many characters (i.e., all the letters) are never used. We assume that the user is using the numeric keypad and ignore the operation keys (/, *, –, +, etc.).
If the user must also input at least an additional keystroke to represent the units, then the efficiency is 4/5 = 80%.
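Both numbers are one-liners to verify:

```python
from math import log2

minimum_bits = log2(10_000)   # information in a 0-9999 distance
print(round(minimum_bits, 1)) # 13.3, rounded up to 14 bits

# Character (Raskin) efficiency: minimum keystrokes / keystrokes required.
digits_needed = 4             # the distance itself
keystrokes    = 5             # 4 digits + 1 keystroke for the units
print(digits_needed / keystrokes)  # 0.8, i.e. 80%
```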
User Time Efficiency
If we want to minimize the user’s time, we should compare the times of the GUI designs with the minimum time. The user must always think before making the input, so the minimum time is:
MKKKK = 1.35 + 0.2 + 0.2 + 0.2 + 0.2= 2.15 sec
Compare this with the time required for the dialog design, about 5.4 sec. The GUI takes much longer than the minimum. Can we do better?
Design 3: Message box request for information
A message box appears asking the user to input the units (cm/inches), the distance (4 digits) followed by return/enter.
Then the keystroke sequence is MK KKKK MK = 3.9 sec.
That is an improvement over the dialog design, but still short of the ideal design.
The studious reader will observe that we could eliminate a keystroke by replacing the enter keystroke with the units; then the keystrokes are MKKKKMK = 3.7 sec. But this is still longer than the minimum time.
Another student will object that maybe we must input all five keystrokes: the units are a necessary part of the message, and the user will have to think before entering them. But if I can give you a design that does not require the units keystroke, then the last design is not ideal.
Design 4: Bifurcated output
A box appears with a text field for entering the 4 digits, and two output text fields automatically appear, one with the conversion in centimeters and the other in inches.
The required keystrokes are MKKKK = 2.15 sec. IDEAL.
With respect to minimum user time the solution is ideal, but it may not be the solution we prefer. There is more to design than minimum user time and information efficiency.
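To summarize the comparison, the sketch below recomputes the predicted times for all four designs and the theoretical minimum from the gesture sequences derived above (the dialog-box entry uses its worst case, and the GUI-sticks entry its best case):

```python
# KLM operator times in seconds.
T = {"K": 0.2, "P": 1.1, "H": 0.4, "M": 1.35}

def klm(seq: str) -> float:
    return sum(T[g] for g in seq)

designs = {
    "1: dialog box (worst case)": klm("HMPKHMKKKKMK"),
    "2: GUI sticks (best case)":  klm("HMPKPK"),
    "3: message box":             klm("MKKKKKMK"),
    "3b: units as terminator":    klm("MKKKKMK"),
    "4: bifurcated output":       klm("MKKKK"),
    "minimum":                    klm("MKKKK"),
}
for name, t in designs.items():
    print(f"{name}: {t:.2f} s")
```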
CogTool is an open source prototyping tool that automates a sophisticated version of KLM-GOMS.
As a prototyping tool, it lets designers create Frames that have functional Widgets. The functions of the widgets are expressed by Transitions, which describe the operations on widgets and the transition to a different frame or back to the same frame. Using CogTool, designers can create Tasks by sequencing transitions as in a storyboard and compute the time for the tasks. The task-time computation uses ACT-R, so it is more sophisticated than the hand calculations we performed in the prior examples. Some tasks can be performed simultaneously, and mental processes are automatically inserted.
Prototyping: Frames and Widgets
CogTool can prototype alternative Designs for multiple Devices.
Devices are defined by input and output modes. CogTool implements keyboard, mouse, touchscreen and microphone inputs and outputs on displays and speakers.
So CogTool can prototype:
- Desktop Applications
- Mobile Devices
- and more
A design of a UI is defined by multiple frames and the transitions that link them. Frames are the different views that the UI may present to users. The Design panel in CogTool contains the frames. The designer can add widgets to the frames in the Frame panels. Besides custom widgets, CogTool has predefined widgets:
- Text Box
- Context Item
- Pull-Down (Drop-down) Lists
- Non-Interactive (used for “look at”)
The prototyping tool allows the designer to draw the widgets directly onto the Frame panels. Images from screenshots of an actual UI, or designs from other prototyping tools, can be imported as backgrounds to the frame panels. The semi-transparent CogTool widgets can then be drawn over the background images. This correctly sizes and locates the widgets within the design, and it is the preferred method, rather than drawing the design directly into the frame. Even for developing new designs, I recommend using a different prototyping tool to draw the basic UI, because CogTool does not have tools to properly locate or render widgets. The power of CogTool is the automation of the KLM-GOMS calculation, not the design of the prototype. The technique of drawing widgets on top of screenshots works well for basic widgets such as buttons, links, and text boxes, but because of the automation of CogTool’s menus, designers cannot easily import menu-hierarchy screenshots. The menu widget does allow designers to easily and quickly construct multi-level menus with simple text-labeled items.
Transitions are defined in the Design panel of CogTool. The designer selects a widget, drags the cursor to the subsequent frame, and defines the transition properties. The properties define the input device and the action of the device. For example, for clicking on a button, the designer would select the mouse device, the mouse button, the action (click, double click, etc.), and the modifiers (shift, ctrl, alt, etc.). For typing in a text box, the designer would select the keyboard device and the characters typed.
Transitions can be “self-terminating,” meaning they return to the same frame. This is useful either for expressing complex actions, such as selecting text or drag-and-drop interactions, or for quickly prototyping tasks that do not appreciably change the view.
Tasks are defined in the Project panel. The project panel is a grid that allows comparing tasks and designs: columns are the different designs (sets of frames) and rows are the tasks. Clicking on the cell at the intersection of a task and a design opens a dialogue for selecting the start frame and other initial parameters, such as the mouse hand and the starting location of the hands. The Script panel then allows the designer to select a transition with a widget source in the frame. Selecting a transition whose destination is another frame causes that frame to appear, so the task is defined simply by clicking from one transition to the next. CogTool is particularly well built here; the designer can edit transitions and add new transitions while composing the task.
After the task is defined, determining the predicted time to perform it is just a matter of right-clicking the corresponding cell in the project panel. Even on fast computers the calculation takes several seconds for simple tasks, because the ACT-R engine is making the calculations.
A particularly nice feature of CogTool is viewing the ACT-R model. This is accessed by right-clicking the calculation in the project panel and selecting “Show Model Visualization.” The visualization shows the timeline of the cognitive and physical processes. Comparisons can be made between designs and tasks. I believe this is a good tool for determining where the bulk of the time is spent and for improving the design.
A tutorial is available at:
Click on the pdf and ask for raw view.
To follow along with the tutorial you will need images/screenshots of the Palm Pilot. These are available at:
Click on tutorial-images.zip and ask for raw view.
The link below is to the project files that I made for comparing different methods of deleting and saving a Word document. Note that the frames are screenshots of the Word UI and the document.
When a saved project is first opened, only the project window shows. The only way to view the design window is by quickly double-clicking the column header for the design; there is no menu selection that can show the design. This is one of only two “design” flaws in CogTool: the interaction is hidden and can frustrate novice users. When this first happened to me, I thought that I had lost the designs or that they were misplaced; I had to reread the manual to discover how to view them. The other design flaw is that after a widget is drawn on a frame, it is not possible to change the widget type; instead the user must delete the widget and redraw it using the correct tool.
Another difficulty is viewing scripts (the sequence of transitions in a task). This can only be achieved by right-clicking the cell at the intersection of the task and the design.