
System-class Accessibility

The architectural support for making a whole system usable by people with disabilities

Chris Fleizach and Jeffrey P. Bigham

Modern operating systems need to support extensive accessibility features that allow computer systems to be flexibly operated by people with disabilities. Content on the screen must be read aloud to someone who cannot see it. Controls must be operated by external switches or speech commands if someone cannot touch them. Aural speech must be converted to textual captions for someone who cannot hear. Apple's iOS and iPadOS, for example, include more than 50 accessibility features that change the inputs used to control user interfaces and the outputs used to perceive them, for people with different abilities.

Enabling computing systems to be operated so flexibly requires deeply technical work across the platform. This work can be categorized as follows: (1) alternative input and output modalities that let people operate the system with different abilities; (2) accessibility APIs that expose application content and controls programmatically; and (3) accessibility services that bridge between those modalities and the applications.

Together, we call these components system-class accessibility. This approach is architected at the operating-system level to support a wide range of accessibility features across every application that can run on the system. It reduces the specialized work required to support each new accessibility feature and ultimately means that the platform is more accessible to more people.

This article illustrates system-class accessibility with our work enabling iPhones to be used nonvisually with the VoiceOver screen reader. We reimagined touchscreen input for nonvisual use, introducing new gestures suitable for controlling a screen reader, and for output we added support for synthesized speech and refreshable braille displays (hardware devices that output tactile braille characters). We added new accessibility APIs that applications could adopt and made our user-interface frameworks include them by default. Finally, we added an accessibility service to bridge between these new inputs and outputs and the applications. Because we implemented support for VoiceOver at the system level, the accessibility features we have released since have directly leveraged this work to provide a consistent user experience.

 

Alternative Input and Output Modalities

The human interface to computing systems consists of inputs that a person provides to the system and outputs received from it. A key technical challenge in accessibility is building alternative ways to operate the system so that people can use different abilities for input and output. As an early adopter of new modalities, accessibility leads the way in expanding how computers can be used.

Many kinds of inputs and outputs have been used to operate computers, from early computing systems that used physical switches for input and lights for output, to modern desktop computers that use keyboards and mice for input and high-resolution graphical displays for output. New types of inputs and outputs are implemented using a combination of custom software and either existing hardware or new external hardware. External hardware devices generally communicate with existing computing systems over protocols such as USB and Bluetooth, either registering as human interface devices (HIDs) or using device-specific drivers. A common accessibility device is the refreshable braille display, which combines an alternative input (chorded braille entry over six or eight keys) with a refreshable braille character output display (typically 40 or 80 linear braille characters). Chorded input, like playing a chord on a piano, means pressing multiple keys at the same time; CTRL+C on a standard keyboard is a familiar example of a chord.

Adapting existing hardware to support alternative input and output mechanisms is often preferred because it allows people to use computers that are widely available. This work often happens when new computing devices are introduced. For example, when Apple introduced the iPhone, the multitouch screen initially required vision to operate: touch input and visual output were so closely tied together that you needed to see where to touch. To make the iPhone usable by people with vision disabilities, we decoupled the touchscreen input from the visual content displayed on it. We introduced new interactive gestures that can be performed without seeing the screen contents and, like desktop screen readers, used on-device speech synthesis to output the content in a way that is perceivable by a person without vision.

To implement these new gestures, we first intercepted touch events and changed how they were interpreted. On iOS, touch events are generated through a firmware-level framework that turns sensor data into a stream of discrete touch, move, and lift events. We repurposed some of these events to support nonvisual use. For example, a touch now causes the content beneath the finger to be announced rather than actuated, and a second tap in quick succession actuates the target. This design change allows people who are blind to explore the contents of the screen without the risk of accidentally triggering targets.
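
As a rough illustration of this interception, the sketch below decides between announcing and activating based on touch phase and timing. Every type and helper here (TouchEvent, AccessibilityElement, elementAt, announce, activate) is a hypothetical stand-in for the real iOS internals, and the 0.3-second double-tap window is an assumption of our own.

import CoreGraphics
import Foundation

// Hypothetical element model and helpers; the real VoiceOver pipeline is internal to iOS.
final class AccessibilityElement {
    let label: String
    init(label: String) { self.label = label }
}
func elementAt(_ point: CGPoint) -> AccessibilityElement? { nil }   // hit-test placeholder
func announce(_ element: AccessibilityElement) { print("Speak: \(element.label)") }
func activate(_ element: AccessibilityElement) { print("Activate: \(element.label)") }

struct TouchEvent {
    enum Phase { case touch, move, lift }
    let phase: Phase
    let location: CGPoint
    let timestamp: TimeInterval
}

final class NonvisualTouchInterpreter {
    private var lastAnnounced: AccessibilityElement?
    private var lastLiftTime: TimeInterval = -1
    private let doubleTapWindow: TimeInterval = 0.3   // assumed threshold

    func handle(_ event: TouchEvent) {
        switch event.phase {
        case .touch, .move:
            // Touching or dragging announces whatever is under the finger.
            if let element = elementAt(event.location), element !== lastAnnounced {
                announce(element)
                lastAnnounced = element
            }
        case .lift:
            // A second tap in quick succession actuates the announced element.
            if event.timestamp - lastLiftTime < doubleTapWindow, let element = lastAnnounced {
                activate(element)
            }
            lastLiftTime = event.timestamp
        }
    }
}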

Other gestures needed to be invented and implemented using multiple touch events. For example, we introduced the rotor gesture, in which a user "rotates" a virtual knob anywhere on the display to change screen reader settings (for example, the speech rate). When studying how people naturally perform the rotor gesture, we discovered differences across users. Some used their thumb and index finger; some used two thumbs close together, moving horizontally; others used two index fingers moving vertically. We therefore built a flexible gesture interpreter that handles these variations by tracking directionality and velocity per finger over a 30-event window for the duration of a touch sequence. Triggering a rotor gesture required that the fingers maintain the expected movement relative to one another over this period but did not require a fixed distance between fingers or operation at a specific position on the screen.
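
The sketch below gives a simplified flavor of such an interpreter. Rather than tracking per-finger direction and velocity as the shipping implementation does, it watches only the angle of the line between the two fingers over a sliding 30-event window; the class name, the 15-degree step threshold, and the omission of angle wraparound handling are simplifications of our own.

import CoreGraphics
import Foundation

// Simplified two-finger rotation ("rotor") detector; illustrative only.
final class RotorGestureDetector {
    private var samples: [(a: CGPoint, b: CGPoint)] = []
    private let windowSize = 30                      // recent touch events kept
    private let stepAngle: CGFloat = .pi / 12        // ~15 degrees per rotor "click" (assumed)

    /// Feed the current positions of the two fingers; returns a rotor step
    /// (+1 or -1) once enough rotation has accumulated within the window.
    func addSample(fingerA: CGPoint, fingerB: CGPoint) -> Int? {
        samples.append((a: fingerA, b: fingerB))
        if samples.count > windowSize { samples.removeFirst() }
        guard let first = samples.first, let last = samples.last else { return nil }

        let delta = angle(last) - angle(first)       // wraparound at ±π ignored for brevity
        guard abs(delta) >= stepAngle else { return nil }
        samples.removeAll()                          // reset after reporting a step
        return delta > 0 ? 1 : -1
    }

    private func angle(_ pair: (a: CGPoint, b: CGPoint)) -> CGFloat {
        atan2(pair.b.y - pair.a.y, pair.b.x - pair.a.x)
    }
}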

Finally, in some cases we replaced gestures that would be difficult to perform nonvisually with a set of discrete custom accessibility actions. Dragging a target and then dropping it elsewhere on the interface requires high-dexterity movement coordinated with the screen's visual context, which is difficult to do without sight. Our API instead allows the appropriate function to be called in response to a sequence of discrete actions that are easier to perform nonvisually: dragging a target can be replaced by first selecting the target, then indicating that the intended action is a drag, and finally selecting the location where it should be dropped. Listing 1 shows how a custom action can expose the drop step.

 

LISTING 1: Custom Action for Dropping a Target

override var accessibilityCustomActions: [UIAccessibilityCustomAction]? {
  get {
    return [
      UIAccessibilityCustomAction(name: "Drop") { _ in
        performDrop()   // app-specific drop logic
        return true     // report that the action was handled
      }
    ]
  }
  set { }               // actions are computed; ignore external assignment
}

 

This iOS VoiceOver example provides an initial solution for making a device accessible, but accessibility work must also keep pace with the rest of the system. As new interactions are added in subsequent releases, accessible alternatives need to be provided. For example, when iOS introduced cut/copy/paste, we needed to create accessible ways to perform those functions as well. The work of accessibility thus continues in concert with the development of the rest of the platform.

 

APIs for Connecting to Application User Interfaces

To enable people to access applications using a variety of input and output methods, application content and interactions need to be made available programmatically. This means that applications should be able to convert their content into a computer-readable format (for example, text and metadata in a known data structure), and they should be operable using an API that is not tied directly to any particular way of providing input.

As an example, consider a command-line user interface. Input is a sequence of characters, which are generated either directly by the keyboard or provided by a computer program (for example, a shell script). Output is the sequence of characters that results from issuing a command. The accessibility API for the command-line application would thus accept a sequence of characters as input and return the sequence of characters that results. The API's abstraction means that the command-line application itself doesn't need to do anything special to support a wide range of inputs and outputs.

For input, people can use the keyboard directly, convert aural speech to text, use an on-screen keyboard driven by eye gaze, or apply any other method that can convert a user's input to a sequence of characters. For output, people can read the text on the screen, listen to the output using text-to-speech synthesis, read by touch using a connected braille display, or use any other method that is developed that can convert a sequence of characters into something that can be perceived.
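
The command-line abstraction just described can be sketched as a tiny protocol. The names below are illustrative, not an existing API; the point is only that the interface deals purely in character sequences, regardless of how the input was produced or how the output will be perceived.

// Hypothetical character-stream abstraction for a command-line interface.
protocol CharacterStreamAccessible {
    /// Deliver input characters, however they were produced
    /// (keyboard, dictation, an eye-gaze keyboard, a shell script, ...).
    func receive(input: String)

    /// Return the characters produced since the last read, ready to be rendered
    /// visually, spoken aloud, or sent to a braille display.
    func readOutput() -> String
}

// Toy conformance: an "echo" shell that repeats whatever command it receives.
final class EchoShell: CharacterStreamAccessible {
    private var pending = ""
    func receive(input: String) { pending += "$ \(input)\n\(input)\n" }
    func readOutput() -> String {
        defer { pending = "" }
        return pending
    }
}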

Despite its simplicity, a command-line user interface also presents accessibility challenges when the output of the system is not fully represented through the API. Many command-line programs represent content visually by manipulating the character buffer to create the visual effect of, for example, a progress bar. This primarily visual representation can be difficult to understand when read aloud, and the user needs to know to go back and check the progress bar manually for updates. People are also ingenious in how they use limited modalities to achieve interesting interfaces: consider ASCII art, the arrangement of lines of ordinary text characters to form the visual effect of an image.

Similar challenges arise in graphical user interfaces (GUIs), where developers use a variety of clever approaches to achieve the visual look of the interfaces they create but often do not use the APIs available to make those interfaces accessible. Developing an accessible GUI requires the content, state, and structure of the visual interface to be made available through an API.

At the lowest level, this means exposing the content of each user interface element (for example, a button, a checkbox, or a text field) through the API. A button implementation requires that the API convey both its label (such as "Login") and its type ("button"). Sometimes elements also have associated state: a button might be disabled to indicate that it cannot currently be pressed, or a checkbox might be checked or unchecked. Finally, the API needs to capture visual structure. For example, labels need to be associated with the elements they label so that the screen reader can read them together.
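
One way to picture the information the API must carry per element is the small model below. The type is hypothetical and is not the iOS accessibility API; on Apple platforms the same ideas surface as accessibility labels, traits, values, and relationships on UIKit and SwiftUI elements.

// Hypothetical per-element model of what an accessibility API conveys.
final class AccessibleElement {
    enum Kind { case button, checkbox, textField, staticText }

    var label: String                          // content, e.g., "Login"
    let kind: Kind                             // type, e.g., .button
    var isEnabled = true                       // state: can it currently be activated?
    var isChecked: Bool? = nil                 // state for checkboxes; nil when not applicable
    weak var labelledBy: AccessibleElement?    // structure: the element that labels this one

    init(label: String, kind: Kind) {
        self.label = label
        self.kind = kind
    }
}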

An important component of system-class accessibility is thus providing the API necessary for user interface elements to make themselves accessible. In modern user interface toolkits (SwiftUI, HTML, etc.), using the standard elements generally means that developers get accessibility without additional work: the toolkit provides appropriate metadata for elements of known types, so buttons created in the standard way, for example, can provide their label and state. For a variety of reasons, developers may want to go beyond the standard components, and when they do, they need the API to support making those custom elements accessible.

Listing 2 makes a SwiftUI View into an element exposed by the accessibility API. In the original code (without the last two modifier lines), the View visually represents a "Pause" state using two vertical lines positioned next to each other. Adding the .accessibilityElement() modifier exposes that this is not just a visual grouping or effect but an element that contains content. The .accessibilityLabel("Pause") modifier provides a description that conveys its content to users who are not able to see the vertical bars.

 

LISTING 2: Making a SwiftUI View Representing a "Pause" State Accessible

var body: some View {
  HStack(alignment: .center, spacing: 0.4) {
    Image("VerticalLine")
    Image("VerticalLine")
  }
  .accessibilityElement()       // expose the stack as a single accessibility element
  .accessibilityLabel("Pause")  // describe its content for nonvisual output
}

 

As Listing 2 shows, modern UI toolkits often allow user-interface components to be made accessible with minimal code. Nevertheless, accessibility gaps often still exist. Because developers aren't typically the primary users of accessibility features, it can be challenging for them to test their implementations. To help with testing and debugging, most platforms provide accessibility-testing tools. Unfortunately, these tools cannot find every problem automatically, for the same reasons they cannot simply fix every problem automatically, so developers still need to put in some effort to understand these APIs, how they work, and what it means to use them correctly.

To address situations where developers have not implemented the accessibility APIs properly, VoiceOver users can invoke computer vision on demand with a feature called Screen Recognition, which interprets the pixels of the GUI. This feature makes usable even applications built with third-party user-interface toolkits that don't expose accessibility information.1 Because each interface element has an identifiable set of pixels, the feature can label each element, which VoiceOver can then present to the user. While computer vision is a promising tool, for now skilled developers are still needed to make applications fully accessible.

 

Accessibility Services

The final step to achieve system-class accessibility is to create a usable software bridge between new inputs and outputs (VoiceOver in our example) and the accessibility APIs of each running application. VoiceOver needs to communicate with each application's implementation of the accessibility APIs so that it has access to the content currently displayed onscreen, can control each application programmatically, and receives notifications of important events that would affect its reflection of each application's state. This communication layer is the most important function of the accessibility service.

Expanding on our example: Converting a GUI into information that can be used with a screen reader requires connecting user inputs to appropriate user interface elements displayed on the screen. For iOS, VoiceOver interprets a tap on the screen as a user request to speak the interface element under the user's finger. To accomplish this, VoiceOver sends the touch coordinates to the target application, asking for the accessibility element at those coordinates. The accessibility service routes the request to the appropriate application and returns the answer to VoiceOver. The result is an accessibility object that can be further queried for more specific information such as the element's label, value, or traits.
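
Written as ordinary Swift, the round-trip might look like the sketch below. The protocols and the speakElement function are hypothetical stand-ins for the real IPC layer; they show only the shape of the query (hit-test by coordinates, then attribute lookups on the returned element).

import CoreGraphics

// Hypothetical interfaces standing in for the IPC layer between the screen
// reader, the accessibility service, and an application.
protocol RemoteElement {
    func attribute(_ name: String) -> String?     // "label", "value", "traits", ...
}

protocol ApplicationAccessibility {
    func accessibilityElement(at point: CGPoint) -> RemoteElement?
}

struct AccessibilityServiceSketch {
    let frontmostApp: ApplicationAccessibility

    /// Conceptually what VoiceOver does when the user touches the screen.
    func speakElement(at touchPoint: CGPoint, speak: (String) -> Void) {
        guard let element = frontmostApp.accessibilityElement(at: touchPoint) else { return }
        let parts = ["label", "value", "traits"].compactMap { element.attribute($0) }
        speak(parts.joined(separator: ", "))      // e.g., "Login, button"
    }
}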

In iOS, a low-level framework provides the accessibility service's interprocess communication (IPC) mechanism for querying applications about their accessibility elements. Specifically, a Mach server connects the accessibility service and each application by allowing messages and shared memory to pass between processes. (Mach is the name for the IPC primitives on Darwin, the Unix foundation underlying Apple platforms.) Each application registers itself as a reachable server, so the accessibility service needs to know only the process identifier (pid) of an application to message it. A messaging system allows each query to reference an element, along with the attribute being queried and any parameters. On the application side, when a query is received, the element reference is decoded to point to a real object that can answer the query, assuming it correctly implements the accessibility API.
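
The shape of such a query might be modeled as below. These structures are purely illustrative; the shipping system uses Mach messages and shared memory, and none of these names correspond to the actual interface.

import Foundation

// Hypothetical shape of an accessibility query and its reply.
struct ElementToken: Codable, Hashable {
    let appPID: Int32         // which application's server to message
    let elementID: UInt64     // decoded by the application back into a live object
}

struct AccessibilityQuery: Codable {
    let element: ElementToken
    let attribute: String               // e.g., "label", "frame", "children"
    let parameters: [String: String]    // optional extra arguments for the attribute
}

struct AccessibilityReply: Codable {
    let value: String?                  // serialized attribute value, if implemented
}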

Performance is a primary consideration in accessibility services because the user can easily feel any latency in the interaction. For the interaction to feel responsive, audio output should begin less than 0.4 seconds after the user's finger touches the screen. The sequence of events from the user's touch to the beginning of synthesized speech is the following: (1) receiving and interpreting the touch event; (2) issuing a query to the system for the current application; (3) querying the application for the element at the position of the touch; (4) querying the element for its label; and (5) finally generating and outputting the synthesized speech for the label. All of these steps must fit within the latency budget, so performance requirements are deeply connected to interaction constraints.
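
The sketch below strings these five steps together against the 0.4-second budget. Every function and type in it is a placeholder of our own for work the real system performs across process boundaries.

import CoreGraphics
import Foundation

// Placeholder pieces for the five-step pipeline described above.
struct Element { let label: String }
struct App { func element(at point: CGPoint) -> Element? { Element(label: "Login, button") } }

func interpretTouch(_ point: CGPoint) -> CGPoint { point }          // (1) receive and interpret
func frontmostApplication() -> App { App() }                        // (2) find the current app
func synthesizeAndSpeak(_ text: String) { print("Speak: \(text)") } // (5) speech output

func respond(toTouchAt rawPoint: CGPoint) {
    let start = Date()
    let point = interpretTouch(rawPoint)
    let app = frontmostApplication()
    guard let element = app.element(at: point) else { return }      // (3) hit-test the application
    let label = element.label                                       // (4) fetch the element's label
    synthesizeAndSpeak(label)
    let elapsed = Date().timeIntervalSince(start)
    assert(elapsed < 0.4, "touch-to-speech exceeded the 0.4-second budget")
}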

Beyond the IPC layer, another important role of the accessibility-services layer is to provide all the other APIs and services that help enable assistive technologies: speech recognition, speech synthesis, and accommodations that adjust the behavior of the whole system, such as zoom, dynamic type, color filters, and touch accommodations.

Accessibility services are often reused for automating user interfaces, which has become increasingly popular as part of automated testing and when building agents that complete tasks on behalf of users. Automated tests and automation agents thus share a similar dependency on correct accessibility implementations.

 

Conclusion

This article has introduced system-class accessibility, which is what we call the architectural support for making a whole system usable by people with disabilities. Significant technical effort goes into achieving this: dozens of assistive technologies, hundreds of settings, and numerous customizations across applications. These are the basic table stakes for what users need and expect today. This development effort ranges from low-level message passing, to hardware connectivity, to user interfaces that provide the means of controlling the underlying system. When done correctly, it provides life-changing access to technology for many people who would otherwise be left out of computing.

 

References

1. Zhang, X., et al. 2021. Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels. Apple Machine Learning Research; https://machinelearning.apple.com/research/creating-accessibility-metadata.

 

Chris Fleizach is the Mobile Accessibility Manager at Apple and has helped ensure that iPhone, iPad, Apple Watch, Apple TV, Apple Vision Pro, and more are accessible to all users. He has helped create VoiceOver, AssistiveTouch, Switch Control, Made for iPhone (MFi) hearing aid support, Assistive Access, and more with many people and teams across Apple.

Jeffrey P. Bigham is the Director of Human-Centered Machine Learning at Apple and an Associate Professor in the Human-Computer Interaction and Language Technologies Institutes in the School of Computer Science at Carnegie Mellon University. His research and product work have brought machine learning to bear on hard problems in accessibility.

Copyright © 2024 held by owner/author. Publication rights licensed to ACM.

 


Originally published in Queue vol. 22, no. 5









