
Programming safer keyed data input

Key points

  • We have created a tool that automatically creates data entry user interfaces that are safer to use than those created in ad hoc ways. It helps ensure that programmers cover all eventualities and the user interface responds to errors in a sensible way.
  • User interfaces cannot be made perfectly safe. This is a fundamental problem we define and call divergence. It means that whatever is done to prevent users making mistakes, there will always be some other mistake that can still be made.
  • Programmers have at best ad hoc solutions for what to do when people enter data incorrectly, such as when they include a second decimal point or put a leading 0 in a number, and often programmers just ignore these seemingly simple problems. Unfortunately, failing to manage user error well causes even worse problems for users, and this is what our tool helps prevent.
  • One approach we developed that the tool uses is traffic lights, so the user interface and the user know when a mistake has been made. A red traffic light — perhaps augmented with a sound or vibration (and it could be any colour, or flash for colour blind users) — alerts the user that a problem must be solved. With the tool, traffic lights are implemented flexibly, efficiently and dependably. Crucially, the programmer can then build software that does not need to handle the user errors, since the tool has done that already.

Nurses, doctors and patients using interactive systems like infusion pumps and diabetes monitors need them to be predictable if they are to be easy to use. If a nurse is trying to correctly enter a drug dose, then they need to know what sequence of key presses will achieve it. They also need to know how to interpret messages, lights and sounds from a device to know they have correctly entered the dose. Furthermore, if an error happens, nurses or other operators need to recognise this and be able to recover appropriately.

Unfortunately, programming a user interface to accept data typed in by a user is hard, but seems trivial to programmers. It is therefore rarely done with enough thought to cover all the situations that may arise, but it must be if it is to be safe. For example, it is ‘easy’ to write a program that reads numbers like 3.5. But writing a program to detect all possible errors that might be made, like someone accidentally keying two decimal points, as in 3..50, is much harder. Few medical devices even notice this error! The programmer should also decide how the program will deal with the error, which is even harder. Other problems the programmer must deal with include the user keying more than will fit in the display, entering numbers that are numerically too large or too small, and so on. Finally, all the different ways of handling errors have to be presented in the user interface in a consistent way so the user can understand what to do to recover.
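
The 3..50 example can be tried in JavaScript. The sketch below contrasts an ad hoc reader built on the standard parseFloat function, which stops at the first character it cannot use and returns what it has read so far, with a stricter check. The function names adHocDose and checkedDose and the wording of the error reasons are hypothetical, chosen only for this illustration.

```javascript
// Ad hoc approach: parseFloat reads as much as it can and ignores the rest,
// so a keying slip is silently turned into a different number.
function adHocDose(keyed) {
  return parseFloat(keyed);
}

// Stricter approach (illustrative only): accept digits with at most one
// decimal point, and say why anything else was rejected.
function checkedDose(keyed) {
  const points = (keyed.match(/\./g) || []).length;
  if (points > 1) {
    return { ok: false, reason: "more than one decimal point" };
  }
  if (!/^[0-9]+(\.[0-9]+)?$/.test(keyed)) {
    return { ok: false, reason: "not a well-formed number" };
  }
  return { ok: true, value: Number(keyed) };
}

console.log(adHocDose("3..50"));   // 3, with no indication of a problem
console.log(checkedDose("3..50")); // rejected: more than one decimal point
console.log(checkedDose("3.5"));   // accepted with value 3.5
```

Even this stricter check only names two kinds of error; deciding what the user interface should then do about them is the harder part.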

Special languages
A common way to improve dependability is to use a rigorous mathematical language to specify what a program should do. This means that the specification is precise, and ideally can even be proven to be correct. One way of doing this is to use a mathematical language called regular expressions. These are a way of rigorously defining the grammar of a special kind of language called a ‘regular language’. Just as books of grammar define languages like English or Spanish, a regular expression defines the order that letters, punctuation and words can appear to create meaningful sequences. Rather than setting out a human language, in our case the language describes things like the sequences of buttons that can be pressed to do meaningful things, such as entering correct drug doses on an infusion pump. The words of the language stand for the buttons that can be pressed and a ‘sentence’ of the language stands for a meaningful sequence of such actions, such as 50 or 50mg.

A regular expression language is not actually powerful enough to describe English grammar, but it is good enough to describe many tasks humans do, such as shampooing the dog and washing up the dishes. It is also good enough for numbers. For example, [0-9] is a regular expression in one of the world's most popular languages, JavaScript, here describing a number as a single digit chosen from 0 to 9. A ‘*’ symbol means ‘repeat any number of times’, so [0-9]* describes a number made up of a series of digits of any length. Not quite: the regular expression [0-9]* also matches nothing at all (zero digits), so a correct definition of a number made up of a series of digits would better be written [0-9][0-9]*.
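
The difference between the two expressions can be seen with a short JavaScript sketch. The ^ and $ anchors are an addition for this example: they make each test apply to the whole input rather than to any part of it. The variable names are hypothetical.

```javascript
// Zero or more digits: this also accepts an empty input.
const anyDigits = /^[0-9]*$/;

// One digit followed by zero or more digits: at least one digit is required.
const atLeastOneDigit = /^[0-9][0-9]*$/;

console.log(anyDigits.test(""));          // true: "nothing" counts as a number
console.log(atLeastOneDigit.test(""));    // false
console.log(atLeastOneDigit.test("350")); // true
```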

In general, a regular expression can be thought of as a simple program that describes what is allowed. It describes what sequences of button presses are possible to enter a legal number, for example. If what is typed in does not match the regular expression then a mistake has been made, and so should be managed.

Unfortunately, regular expressions are usually programmed in a way that itself is obscure! For example, ^[0-9]*[0-9](\.[0-9]*[0-9])?$ is a basic regular expression for a decimal number with an optional decimal point. It isn't very readable, and maybe there are subtle errors that the programmer will miss, which would defeat the whole point.

Symbols like ‘-’ have a special meaning in a regular expression, such as giving a range, as in [0-9] above. But a device being described might have a button marked ‘-’ too. If the programmer wants to use ‘-’ to describe the button being pressed, the regular expression must therefore become more complex and unreadable. An example in the regular expression just mentioned is the backslash (\) in the middle before the decimal point; the backslash is required because without it, the dot would mean something else in the regular expression. Worse, in a language like JavaScript, backslashes themselves are special characters inside text strings, so many programmers get confused about whether to write . or \. or \\. or even \\\.!
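
The backslash confusion is easy to reproduce. In the JavaScript sketch below, the same intended expression, "3, then a decimal point, then 5", is written three ways; the variable names are hypothetical. The third version looks plausible but loses its backslash when the string is read, so the dot goes back to meaning "any character".

```javascript
// Regular expression literal: one backslash escapes the dot.
const literal = /^3\.5$/;

// Built from a string: the backslash must itself be escaped.
const fromString = new RegExp("^3\\.5$");

// Built from a string with a single backslash: "\." in a string is just ".",
// so this expression accepts any character between the 3 and the 5.
const unescaped = new RegExp("^3\.5$");

console.log(literal.test("3x5"));    // false
console.log(fromString.test("3x5")); // false
console.log(unescaped.test("3x5"));  // true: the error goes unnoticed
```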

The huge range of features that are typically squeezed into regular expression languages means that almost anything is valid, and this can lead to more programmer errors. While it may make it easier to get programs to ‘start to work’, none of it is a good foundation for dependable user interfaces.

Even when a regular expression is correctly used to describe allowable numbers, the numbers accepted are usually specified independently of the actual user interface. Thus programmers fail to take into account things like the size of the display. Perhaps only 5 digits can be shown. If the user enters a larger number than the screen can show, it will not be clear what the device is taking as “the” number. It could be what the user can see, what they remember keying, or something else entirely. The programmer will not even be aware there is any uncertainty.

Writing clear regular expressions
In theory, regular expressions are well defined and need suffer none of the problems above. Programmers find regular expressions easy to use. They also find them easy to translate into a kind of diagram called a finite state machine, which is much easier to understand and extremely easy to program correctly. For example, the following is an automatically generated variation on a finite state machine diagram, called a syntax diagram, showing a number with an optional decimal point. Follow any path through the diagram and you will get a correctly formed number.

[Figure: syntax diagram for a number with an optional decimal point]


Compare this with the unreadability of the conventional regular expression shown above, which, unlike this diagram, allows numbers with unnecessary and perhaps confusing leading and trailing zeros — had you noticed that from the obscure regular expression?
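
The point about leading and trailing zeros can be checked directly in JavaScript. The sketch below tests the conventional expression from the text against a stricter alternative. The stricter expression is an assumption written for this illustration, and is not necessarily the exact language the syntax diagram defines.

```javascript
// The conventional expression quoted earlier in the text.
const conventional = /^[0-9]*[0-9](\.[0-9]*[0-9])?$/;

// An illustrative stricter expression: the whole part is either 0 or starts
// with a non-zero digit, and any fractional part ends in a non-zero digit.
const stricter = /^(0|[1-9][0-9]*)(\.[0-9]*[1-9])?$/;

console.log(conventional.test("007.500")); // true: extra zeros are accepted
console.log(stricter.test("007.500"));     // false
console.log(stricter.test("7.5"));         // true
console.log(stricter.test("0.05"));        // true
```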

Rather than an obscure regular expression (like the one shown in the text), the programmer can literally see that this does what they want, just by tracing the "railway lines" around the diagram. Thus, properly managed, regular expressions used with syntax diagrams are an ideal way of building safer user interfaces.
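
To illustrate why a finite state machine is easy to program, here is a minimal JavaScript sketch of one for a number with an optional decimal point. The state names, the keyClass helper and the accepts function are hypothetical, chosen for this example. For simplicity the machine does not exclude leading or trailing zeros, and it is not the code the tool generates.

```javascript
// Each state lists where each kind of key leads; a missing entry means
// that key is not allowed in that state.
const transitions = {
  start:      { digit: "whole" },
  whole:      { digit: "whole", point: "afterPoint" },
  afterPoint: { digit: "fraction" },
  fraction:   { digit: "fraction" },
};

// States in which the keys pressed so far form a complete number.
const accepting = new Set(["whole", "fraction"]);

// Classify a single key press.
function keyClass(key) {
  if (key.length === 1 && "0123456789".includes(key)) return "digit";
  if (key === ".") return "point";
  return "other";
}

// Follow the "railway lines": one transition per key pressed.
function accepts(keys) {
  let state = "start";
  for (const key of keys) {
    const next = transitions[state][keyClass(key)];
    if (next === undefined) return false; // no line to follow
    state = next;
  }
  return accepting.has(state);
}

console.log(accepts("3.5"));   // true
console.log(accepts("3..50")); // false
console.log(accepts("3."));    // false: a digit must follow the point
```

Each row of the table corresponds to one junction in a diagram of this kind, which is what makes the program easy to check against the picture.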

A tool to help create safe interfaces
We have built a comprehensive demonstration tool that, given a regular expression, generates a user interface that behaves in the way that it specifies. It first reads a regular expression in a specially-developed, clear language with none of the problems discussed above. The language allows programmers to document and so remember what they meant the program to do. It also has a precise and ‘tight’ syntax designed so that many programmer errors can be detected, and it has ways that expressions and concepts (like the [0-9] above) can be named and managed by the programmer, or even taken from libraries of pre-defined and tested solutions. It also automatically creates syntax diagrams from the regular expressions. The diagram above was generated by the tool, and is one way it helps the programmer be certain that they are implementing what they really want to implement.

Once a regular expression has been defined, the tool then takes several extra pieces of information, like the maximum number of keystrokes to allow. This information is used to guide how the user interface is generated. Default values can also be given. For example, on a medical device, if no new dose is provided by the user, the default number might be the previous dose entered. The tool then generates a rigorously-specified user interface that is tightly integrated with user interface widgets. For example, if a number or name is to be entered into a text field of a certain size (say, 8 characters wide), the tool ensures that a user entering more text than will fit is correctly and consistently handled.
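
As a rough illustration of how a keystroke limit and a default value might sit alongside a regular expression, here is a small JavaScript sketch. It is not the tool's interface: makeEntry and the option names pattern, maxKeys and defaultValue are all hypothetical, and refusing extra keys is only one of several consistent policies a designer might choose.

```javascript
// Create a data entry field governed by a pattern, a keystroke limit
// and a default value (all names here are illustrative assumptions).
function makeEntry({ pattern, maxKeys, defaultValue }) {
  let keyed = "";
  return {
    // Returns false, leaving the display unchanged, once the limit is reached,
    // so what the user sees is always exactly what has been accepted.
    press(key) {
      if (keyed.length >= maxKeys) return false;
      keyed += key;
      return true;
    },
    display() {
      return keyed;
    },
    // Nothing keyed: use the default. Otherwise the keyed text must match
    // the pattern; null signals an entry that still needs correcting.
    value() {
      if (keyed === "") return defaultValue;
      return pattern.test(keyed) ? keyed : null;
    },
  };
}

const dose = makeEntry({
  pattern: /^[0-9]+(\.[0-9]+)?$/,
  maxKeys: 5,
  defaultValue: "2.5", // for example, the previous dose entered
});
console.log(dose.value()); // "2.5", because nothing has been keyed yet
```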

A regular expression engine can easily be added by the tool to any program, replacing its original ad hoc user interface code. Doing so would make existing user interfaces more dependable and upwards compatible. In addition, the tool introduces traffic lights: typically, colours on the display that indicate potential errors. The colours can be amber, green and red like traffic lights. If the user has not yet finished entering their data, the display will be coloured amber; if they have finished, it is green; if they have made an error that must be corrected, it is red (and perhaps beeping and flashing). Regular expressions ensure this colour coding can be reliably implemented without any work from the application programmer. The approach reduces user error.
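
The traffic light idea can be sketched in a few lines of JavaScript for the simple case of a number with an optional decimal point. This is an illustration under stated assumptions, not the tool's implementation: the two expressions below are written by hand, whereas the point of the tool is that the programmer does not have to do this work, and the trafficLight function name is hypothetical.

```javascript
// A complete, well-formed number: digits, optionally a point and more digits.
const complete = /^[0-9]+(\.[0-9]+)?$/;

// Anything that could still become a complete number if the user keeps
// keying: empty, digits, or digits followed by a point and optional digits.
const viablePrefix = /^([0-9]+(\.[0-9]*)?)?$/;

function trafficLight(keyed) {
  if (complete.test(keyed)) return "green";     // a finished, valid entry
  if (viablePrefix.test(keyed)) return "amber"; // not finished yet
  return "red";                                 // an error that must be corrected
}

console.log(trafficLight("3"));   // green
console.log(trafficLight("3."));  // amber: a digit is still needed
console.log(trafficLight("3..")); // red: no further keys can fix this
```

In this sketch an entry such as 3 is green even though the user could go on to key 3.5; a real design would also need to decide how the user signals that they have finished.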

The theoretical basis of regular expressions allows us to prove that some possible user errors cannot be managed; that is, however they are managed, something else can still go wrong. We call this divergence. The important thing is to recognise it, and try to minimise its impact on the user's task. Our tool provides a way to do this.

The tool is available on the web at http://harold.thimbleby.net/regex

Key people
Andy Gimblett, Harold Thimbleby

H. Thimbleby & A. Gimblett, Dependable Keyed Data Entry for Interactive Systems, in Proceedings FMIS 2011, 4th International Workshop on Formal Methods for Interactive Systems, Electronic Communications of the EASST, 45:1/16–16/16, 2011. doi 10.1145/1996461.1996497

H. Thimbleby, Interactive Numbers — A Grand Challenge, Proceedings of IHCI 2011: International Conference Interfaces and Human Computer Interaction, ed. K. Blashki, pp.xxviii–xxxv, Rome, Italy, 2011.

P. Cairns, H. Thimbleby, Reducing Number Entry Errors: Solving a Widespread, Serious Problem, Journal Royal Society Interface, 7(51):1429–1439, 2010. doi 10.1098/rsif.2010.0112.