
Elevating Security with Arm CCA

Attestation and verification are integral to adopting confidential computing.

Charles Garcia-Tobin and Mark Knight

In some significant markets, such as defense, government,1 and banking,9 private data centers and on-premises computing remain widespread. While shared computing in the public cloud can bring cost and environmental benefits arising from economies of scale, it is traditionally associated with a loss of some control when compared with in-house corporate infrastructure. Public cloud data centers are managed by CSPs (cloud service providers), and servers and their administrators may reside in different legal jurisdictions from those of the customers renting the service.

A desire to move increasingly sensitive workloads to the cloud has driven the need for confidential computing.5,8 This is a model for computing where a workload can be deployed on third-party infrastructure with high confidence that no third party can compromise the workload's confidentiality or integrity. In addition to facilitating new cloud-based workloads, confidential computing represents an elevation of security for existing cloud workloads. Initially, confidential computing is typically an opt-in service, but in time it is likely to become the default with major CSPs.

Tenants and cloud operators must work together to construct a model of operation that supports confidential computing. The main functional benefit of confidential computing is that it protects data and code from certain security threats while they are being processed. This matters because most computing workloads require data to be present in unencrypted form to allow processing. Confidential computing is typically implemented as a hardware-based isolation technology that creates a secure moat around a computing workload, such as a virtual machine. The technologies that support confidential computing in a cloud environment can also be used to strengthen security in other computing environments.

 

Supporting Confidential Computing

Implementations of confidential computing seek to elevate the security that's already available in time-sharing computing systems. Traditionally, computing hardware provides mechanisms for securely isolating workloads from one another. In addition, the hardware provides a system of privilege levels that restricts the ability to configure the computing platform and access resources to supervisory code, such as operating-system and hypervisor kernels. Operating systems then typically provide additional partitioning by identifying and enforcing a system of permissions that restrict the rights of different system users.

With this combination of hardware and software control, exposure to supervisory code interfaces, and therefore the ability to influence or control resource access and configuration, is allowed only for authorized system services and administrators. Highly privileged system users, however, may not be subject to such restrictions and frequently have unrestricted access to resources such as memory, along with the ability to replace system programs.

Confidential computing removes the ability of supervisory code (and, by extension, system administrators) to read or modify customer workloads, even while they are running. However, it still allows that code to manage the platform and workloads, for example, to decide how much memory, processing time, or device access each workload receives.

In a platform that supports confidential computing, customer workloads are protected by a mixture of hardware and low-level firmware. The services provided by these components must be small and verifiable, and both must be trusted. This combination of firmware and hardware forms a TCB (trusted computing base).

Building a platform that supports confidential computing requires careful systemwide design. Platform protections must extend beyond processing elements to other subsystems, including system caches, memory, I/O, and trusted peripherals such as accelerators. Any hardware that will have access to sensitive workloads must honor the commitment to confidential computing. For example, in a system that supports confidential computing, memory controllers typically encrypt data before it is written to main system memory to mitigate the threat of reboot attacks. This or other mitigations would be needed to ensure that the residual content of a workload cannot be read by another workload after an unexpected hardware reset.

 

Attestation of Confidential Computing

The best security technologies are often virtually transparent to users while creating an insurmountable barrier to attackers. An implementation of confidential computing may be all but invisible to a user, but relying parties still need a way to remotely assess the security claims that a platform makes. Attackers could attempt to replace a TCB, or they might try to replace the entire platform with a hacked clone. These threats need to be detected before data or code can be compromised.

To allow relying parties to verify the integrity of a platform, implementations of confidential computing support a process of attestation and verification. Verification provides a means for relying parties to remotely check a platform's properties at runtime before a sensitive workload is loaded.

Attestation is a multistage process (illustrated in figure 1):


Stage 1. The creators of a trusted platform must digitally endorse each trusted hardware and software component and the associated security claims. Typically, this requires the use of cryptography and digital signatures that chain back to a trusted CA (certificate authority). The endorsement of a platform may include third-party analysis, review, and countersignature so that platform security claims are tested.6

Stage 2. A workload such as a virtual machine must be able to obtain attestation reports for the platform it's loaded on, and for itself. These reports must contain detailed information about the platform origin and configuration, and must be cryptographically signed with a secure on-platform root of trust. The reports also contain measurements of the workload itself, allowing the relying party to verify that the workload is executing the expected code and data. A workload can then present the attestation report to a relying party.

Stage 3. Any entity in possession of the attestation report is then able to verify the integrity of the platform and workload that is being attested, usually in concert with a trusted verification service. In a complex network of heterogeneous devices, a federated model of verification is likely to be necessary so that relying parties can avoid building their own relationships with every platform supplier within their network.

Stages 2 and 3 must occur on demand and usually within a few milliseconds. If attestation verification fails, workloads and relying parties can reject the platform and halt execution before any data is compromised.
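The three stages can be sketched as a toy Python model. This is a minimal illustration under loud assumptions, not how any real platform encodes its reports: HMAC-SHA256 with a shared key stands in for the asymmetric signatures and certificate chain of Stage 1 (a real verifier would hold only a public key and so could not forge reports), and the names Endorsement, Report, make_report, and verify are hypothetical. The sketch also assumes the relying party supplies a fresh nonce, a common way to keep an on-demand report from being replayed later.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


def measure(blob: bytes) -> str:
    """Hex SHA-256 digest, used here as a code/data measurement."""
    return hashlib.sha256(blob).hexdigest()


@dataclass(frozen=True)
class Endorsement:
    """Stage 1: what the platform creator endorses (hypothetical format)."""
    attestation_key: bytes          # stand-in for a CA-certified verification key
    reference_platform: str         # expected measurement of the platform TCB
    reference_workloads: frozenset  # workload measurements the relying party accepts


@dataclass(frozen=True)
class Report:
    """Stage 2: evidence produced on the platform for one challenge."""
    platform_measurement: str
    workload_measurement: str
    nonce: bytes
    signature: bytes


def _signed_payload(platform_m: str, workload_m: str, nonce: bytes) -> bytes:
    # Measurements are hex strings, so "|" cannot appear inside them.
    return platform_m.encode() + b"|" + workload_m.encode() + b"|" + nonce


def make_report(key: bytes, tcb: bytes, workload: bytes, nonce: bytes) -> Report:
    """Measure the TCB and the workload, then sign both together with the nonce."""
    p, w = measure(tcb), measure(workload)
    sig = hmac.new(key, _signed_payload(p, w, nonce), hashlib.sha256).digest()
    return Report(p, w, nonce, sig)


def verify(report: Report, endorsement: Endorsement, expected_nonce: bytes) -> bool:
    """Stage 3: signature, freshness, and measurements must all check out."""
    expected_sig = hmac.new(
        endorsement.attestation_key,
        _signed_payload(report.platform_measurement,
                        report.workload_measurement, report.nonce),
        hashlib.sha256,
    ).digest()
    return (
        hmac.compare_digest(expected_sig, report.signature)
        and hmac.compare_digest(report.nonce, expected_nonce)
        and report.platform_measurement == endorsement.reference_platform
        and report.workload_measurement in endorsement.reference_workloads
    )


if __name__ == "__main__":
    key = os.urandom(32)
    tcb, vm = b"firmware-v1", b"banking-vm-image"
    endorsement = Endorsement(key, measure(tcb), frozenset({measure(vm)}))
    challenge = os.urandom(16)
    print(verify(make_report(key, tcb, vm, challenge), endorsement, challenge))           # True
    print(verify(make_report(key, b"hacked-clone", vm, challenge), endorsement, challenge))  # False
```

A failed check simply returns False; the workload or relying party would then refuse to proceed, as described above.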

A workload would typically present attestation reports to a key management service as part of its startup sequence, ensuring that sensitive data can be decrypted only when the key management service is satisfied that the workload is running on a secure platform. Third parties can use protocols to obtain attestation reports as part of a secure network transport such as TLS (Transport Layer Security)11 and reject connections if attestation verification fails.
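The key-release step can be sketched as a policy check inside a key management service. In this minimal Python sketch the KMS receives claims that a trusted verifier has already checked; the VerifiedClaims fields and the KeyManagementService class are hypothetical, and a real KMS would have its own authentication and policy language, which the sketch does not attempt to model. The optional confidential-platform requirement reflects the choice between insisting on confidential computing and accepting platform attestation alone.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VerifiedClaims:
    """Outcome a trusted verification service hands to the KMS (hypothetical shape)."""
    verified: bool               # signature, freshness, and endorsement checks passed
    workload_measurement: str    # measurement of the code and data asking for the key
    confidential_platform: bool  # platform reports confidential-computing protection


@dataclass
class KeyManagementService:
    """Toy KMS that releases a data key only to workloads that pass attestation."""
    allowed_workloads: set[str]
    require_confidential: bool = True
    _keys: dict[str, bytes] = field(default_factory=dict)

    def create_key(self, key_id: str) -> None:
        self._keys[key_id] = secrets.token_bytes(32)

    def release(self, key_id: str, claims: VerifiedClaims) -> bytes | None:
        """Return the key, or None if the startup attestation is not acceptable."""
        if not claims.verified:
            return None  # verification failed: platform may be cloned or modified
        if claims.workload_measurement not in self.allowed_workloads:
            return None  # the workload is not running the expected code
        if self.require_confidential and not claims.confidential_platform:
            return None  # this policy demands protection from privileged software
        return self._keys.get(key_id)
```

Because the data key never leaves the KMS unless every check passes, encrypted data at rest stays unreadable to a workload started on an unverified platform.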

Platform attestation must account for every part of the TCB and can be implemented independently of confidential computing. On platforms that don't support confidential computing, the integrity of a platform can be verified with attestation, but relying parties can be less confident that privileged software on the platform cannot access the workload. Factors governing whether confidential computing is required, or platform attestation is sufficient, include the threat model and the practical availability of hardware systems that support these features.

Emerging technologies tend to diffuse across markets gradually. Technology supporting confidential computing is initially being deployed on servers in cloud computing markets, and is likely to spread to additional markets and form factors. There may still be value in performing mutual platform attestation, even if only one endpoint supports confidential computing.

An Example Use Case for Confidential Computing

Many complete computing systems are distributed across several independent computers. Consider a simple online banking service that consists of a cloud-connected transaction server and mobile app that is the primary client interface for users. A server would typically present a web API to the client, and this is protected in use by TLS and strong user authentication. Potential attack surfaces include the app UI and the server web API. Repeat authentication is often partially delegated to the client device. For example, authentication keys are typically stored on each user's device after successful initial authentication. Client-side biometric authentication may then be used to unlock these keys to allow new user sessions to be started without manually supplying credentials each time the app is run.

Security architects consider various potential attacks, including:

1. Can a user's mobile device be relied upon to store authentication keys securely and allow them to be used solely by the genuine banking app only after biometric authentication?

2. Can an attacker spoof a genuine client app and successfully call the web API directly, potentially bypassing client-side security controls or exploiting vulnerabilities in the web API?

3. Can a server be stolen or cloned, and a client be misdirected to a fake server that mounts a "man-in-the-middle" attack?

4. Can the server be attacked by those with physical or administrative access, allowing authenticated sessions to be hijacked and potentially unauthorized transactions to be injected?

Strong mutual attestation at the start of each session can help mitigate these attacks.

1. Attestation can provide evidence that a mobile device is genuine, unmodified, and configured in a secure way. When the platform supports confidential computing, relying parties can be confident that the phone's operating system has no access to sensitive financial services data.

2. Attempts to call the web API with a third-party client can be blocked if attestation and verification of the client app and mobile device fails.

3. Attestation of the server can prevent the server API from being offered on untrusted platforms and can enable clients to reject connections to unauthorized servers.

4. Server security claims can be verified by the client, providing a high level of confidence that customer data remains secure.

As this example shows, attestation techniques can provide increased confidence when deploying critical services in a third-party data center; the same techniques can also mitigate risks when relying on mobile apps as part of a distributed computing system. While confidential computing is most naturally sought when the owner and user of a platform are different, it can still reduce the amount of platform code that has access to user data, and therefore the attack surface that could be exploited, even when the owner and user of the platform are the same.

 

Arm's Realm Management Extension

The Armv9-A architecture is widely deployed across a range of computing form factors, such as smartphones, laptops, servers, and many IoT (Internet of Things) devices. Armv9-A has recently been enhanced with RME (Realm Management Extension), which provides an architecture that is designed to protect code and data using the techniques of confidential computing on different form factors.

Prior to the introduction of RME, TrustZone and virtualization were the pillars of secure compute. TrustZone divides the compute into a Secure world, for running trusted applications and trusted operating systems, and a Normal world, for running standard applications and operating systems. Secure-world software has access to the Secure physical address space, which cannot be accessed by the Normal world. This isolation protects the trusted applications and trusted operating system. Typically, the memory for the Secure physical address space is carved out statically at boot time.

TrustZone works well for platform security use cases, which are limited in number. Confidential computing, however, aims to allow any third-party developer to protect its VM or application. Therefore, it must be possible at runtime to protect any memory associated with a VM or application, without limits or carve-outs. In addition, it is still important to support TrustZone for platform security.

 

RME hardware

RME introduces a new kind of confidential compute environment called a realm. Any code or data belonging to a realm, whether in memory or in registers, cannot be accessed or modified by code or agents outside the TCB, including:

• The supervisory software that created the realm—that is, the kernel or hypervisor in the Normal world.

• Code executing within TrustZone.

• Other realms.

• Devices not trusted by the realm.

Attempts by those entities to access a realm's code, data, or register state are blocked and result in a faulting exception.

Realms run inside a newly introduced Realm world, and memory can be moved at runtime between the Normal and Realm worlds, or even between the Normal and Secure worlds. This is achieved through a new data structure added to the architecture: the GPT (Granule Protection Table). The GPT tracks whether each page is assigned to the Realm, Secure, or Normal world. The hardware checks this table on every access and enforces isolation between worlds, blocking illegal accesses such as one from a hypervisor to a Realm world page. Within a world, translation tables provide further isolation; this is how realms are isolated from each other. A hypervisor or kernel can indirectly update the GPT, allowing pages to migrate between Normal world use and Realm use, or even between Normal world use and TrustZone use. Memory is also encrypted and scrubbed to ensure that its contents cannot be accessed by successive users.
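A toy model can make the GPT's role concrete. The Python sketch below is hypothetical throughout (the class, its method names, and the 4KB granule size are assumptions chosen for illustration): each granule has one GPT entry, every access is checked against the world making it, and a transition scrubs the granule before its new owner sees it. It models the access rules between worlds but not the per-realm translation tables that separate one realm from another.

```python
from __future__ import annotations

from enum import Enum

GRANULE_SIZE = 4096  # assumption: 4KB granules; the real GPT granule size is configurable


class World(Enum):
    NORMAL = "normal"
    SECURE = "secure"
    REALM = "realm"


# Granule assignments each world may access. (RME also defines a Root
# physical address space for the monitor, omitted here for brevity.)
ACCESSIBLE = {
    World.NORMAL: {World.NORMAL},
    World.SECURE: {World.SECURE, World.NORMAL},
    World.REALM: {World.REALM, World.NORMAL},
}


class ToyGPT:
    """Hypothetical model: one GPT entry per granule, checked on every access."""

    def __init__(self, num_granules: int) -> None:
        self.entries = [World.NORMAL] * num_granules
        self.memory = [bytearray(GRANULE_SIZE) for _ in range(num_granules)]

    def allowed(self, world: World, pa: int) -> bool:
        """Granule protection check for an access by `world` to physical address `pa`."""
        return self.entries[pa // GRANULE_SIZE] in ACCESSIBLE[world]

    def read(self, world: World, pa: int) -> int | None:
        """Return the byte at pa, or None to model a granule protection fault."""
        if not self.allowed(world, pa):
            return None
        return self.memory[pa // GRANULE_SIZE][pa % GRANULE_SIZE]

    def write(self, world: World, pa: int, value: int) -> bool:
        """Store a byte at pa; False models a granule protection fault."""
        if not self.allowed(world, pa):
            return False
        self.memory[pa // GRANULE_SIZE][pa % GRANULE_SIZE] = value
        return True

    def transition(self, granule: int, target: World) -> None:
        """Firmware-only update, requested indirectly by a hypervisor or kernel.
        The granule is scrubbed so its next user cannot see the old contents."""
        self.memory[granule] = bytearray(GRANULE_SIZE)
        self.entries[granule] = target
```

In this model a hypervisor running in the Normal world that reads a granule after it has moved to the Realm world gets a fault, while the realm sees only zeros where the previous owner's data used to be.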

The hardware checks the GPT at the end of a page-table walk; the Armv9-A architecture refers to these as granule protection checks. If these checks fail, an exception is raised to the agent that made the access. To mitigate the cost of the checks, RME allows caches to record, on a per-cache-line basis, the association between an address and a world (Realm, Non-secure, or Secure). Additionally, the architecture allows TLBs (translation lookaside buffers) to cache the association of a page with a given world, so a TLB can cache both the translation of a virtual address to a physical address and the result of the GPT check.

To support these two forms of caching, RME extends Armv9-A with new instructions for maintaining this state in both the processor caches and the TLBs. The ability to dynamically move memory resources among different security environments, introduced through the GPT, is the most important change that RME contributes to the Armv9-A architecture.
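The interaction between cached check results and GPT updates can be illustrated with a small model. In this hypothetical Python sketch a TLB entry records a translation only after the granule protection check has passed, so after a GPT change the stale entry keeps granting access until it is invalidated; performing that invalidation is the role of the new maintenance instructions in real hardware. The page size, class, and method names are assumptions, the model takes the perspective of a single world, and it does not cache faults.

```python
from __future__ import annotations

from dataclasses import dataclass, field

GRANULE_SIZE = 4096  # assumption: 4KB pages and granules


@dataclass
class ToyTLB:
    """Hypothetical model: a TLB entry caches the VA->PA translation together
    with a passing granule protection check for that physical page."""
    page_table: dict[int, int]   # virtual page -> physical page
    gpt_allows: dict[int, bool]  # physical page -> does the GPT permit this world?
    entries: dict[int, int] = field(default_factory=dict)  # cached virtual -> physical page
    walks: int = 0               # number of page-table walks performed

    def access(self, va: int) -> int | None:
        """Return the physical address, or None to model a granule protection fault."""
        vpage, offset = divmod(va, GRANULE_SIZE)
        if vpage not in self.entries:
            self.walks += 1
            ppage = self.page_table[vpage]   # page-table walk
            if not self.gpt_allows[ppage]:   # check at the end of the walk
                return None                  # faults are not cached in this model
            self.entries[vpage] = ppage      # translation and check result cached together
        return self.entries[vpage] * GRANULE_SIZE + offset

    def invalidate_physical(self, ppage: int) -> None:
        """Drop entries that map ppage; required after its GPT entry changes."""
        self.entries = {v: p for v, p in self.entries.items() if p != ppage}
```

Repeated accesses to a cached page skip both the walk and the GPT lookup, which is the performance benefit; the price is that software changing the GPT must also invalidate any cached state for the affected granule.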

The GPT is checked not only by processors but also by the Arm System MMU (memory management unit),3 which is used to gate device accesses. This, combined with advances in PCIe (peripheral component interconnect express), enables trusted assignment of devices to realms. A realm can attest a device and decide whether it trusts it. If the device is trusted, it is allowed to directly access the memory of the realm. Other realms do not have to trust it.

This capability is provided by the security checks conducted by the Arm SMMU (system memory management unit). In addition to GPT checking, the SMMU supports checking for devices that support PCIe ATS (address translation services) and that send transactions using physical addresses. For such devices, the SMMU includes a Device Permission Table that ensures that such transactions are to addresses within the realms to which they are attached.
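The division of labor between the GPT check and the Device Permission Table can be sketched as follows. The point the sketch tries to capture is that the GPT records only that a granule belongs to the Realm world, not which realm owns it, so a device that supplies physical addresses directly needs an extra per-device check. Everything here (ToySMMU, its fields, the way the table is populated at assignment, and treating an unassigned device's addresses as physical) is a hypothetical simplification, not the SMMU programming interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field

GRANULE_SIZE = 4096  # assumption for illustration


@dataclass
class ToySMMU:
    """Hypothetical model of SMMU checks for devices assigned to realms."""
    gpt: dict[int, str]  # granule -> "normal", "secure", or "realm"
    realm_tables: dict[str, dict[int, int]] = field(default_factory=dict)  # realm -> {IOVA page: granule}
    attached: dict[str, str] = field(default_factory=dict)  # device -> realm it is assigned to
    dpt: dict[str, set[int]] = field(default_factory=dict)  # device -> granules reachable by PA

    def assign(self, device: str, realm: str, realm_trusts_device: bool) -> bool:
        """The realm attests the device first; only a device it trusts is attached."""
        if not realm_trusts_device:
            return False
        self.attached[device] = realm
        # Assumption: the permission table admits exactly the granules the realm maps.
        self.dpt[device] = set(self.realm_tables.get(realm, {}).values())
        return True

    def dma(self, device: str, address: int, ats_translated: bool) -> bool:
        """Return True if the device transaction is allowed."""
        realm = self.attached.get(device)
        if realm is None:
            # Simplification: unassigned devices act as Non-secure agents issuing PAs.
            granule, permitted = address // GRANULE_SIZE, {"normal"}
        elif ats_translated:
            # Device supplies a physical address: the permission table confines it.
            granule = address // GRANULE_SIZE
            if granule not in self.dpt[device]:
                return False
            permitted = {"realm", "normal"}
        else:
            # Untranslated: the realm's SMMU tables map the IOVA to a granule.
            table = self.realm_tables.get(realm, {})
            page = address // GRANULE_SIZE
            if page not in table:
                return False  # translation fault
            granule, permitted = table[page], {"realm", "normal"}
        # The granule protection check applies to every transaction.
        return self.gpt.get(granule) in permitted
```

Without the per-device check, a trusted device attached to one realm could use a physical address to reach another realm's granule, because both granules carry the same Realm assignment in the GPT.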

The isolation created through page-table and GPT checks is the main form of security provided for realms. For additional defense in depth, RME can be complemented with the Memory Encryption Contexts extension (FEAT_MEC), which allows the memory associated with a realm to be encrypted uniquely relative to other realms.

 

Standard Reference Firmware Architecture

Hypervisors and kernels manage resources—mainly processor cycles and memory—much in the same way they do for VMs and processes today. Supervisory software still needs to be able to create and destroy realms, add memory to or remove memory from realms, and schedule realm execution. Policy code aimed at deciding when to perform these operations for VMs and processes can be directly reused for realm management. The mechanics differ, however, because supervisors are prevented from accessing realm content. These operations require interaction with secure firmware components, which manage the GPT as well as realm translation tables and contexts.
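The split between host policy and firmware mechanics can be sketched with a toy firmware object. In this hypothetical Python model (ToyRMM and all of its methods are illustrative names, not the real interface between the host and firmware), the host decides which granules to donate and when to create, activate, or destroy a realm; the firmware only checks that each request is legal, records each granule's state, and extends a measurement of the realm's initial content so that it can later be attested.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Realm:
    granules: set[int] = field(default_factory=set)
    measurement: bytes = bytes(32)  # extended as initial content is added
    active: bool = False


class ToyRMM:
    """Hypothetical firmware: performs and checks operations, makes no policy choices."""

    def __init__(self, num_granules: int) -> None:
        self.state = ["normal"] * num_granules  # simplified per-granule GPT state
        self.contents: dict[int, bytes] = {}
        self.realms: dict[str, Realm] = {}

    def delegate(self, g: int) -> None:
        """Host donates a Normal-world granule for realm use; contents are scrubbed."""
        if self.state[g] != "normal":
            raise ValueError(f"granule {g} is not in the normal world")
        self.contents.pop(g, None)
        self.state[g] = "delegated"

    def undelegate(self, g: int) -> None:
        """Return an unused delegated granule to the Normal world."""
        if self.state[g] != "delegated":
            raise ValueError(f"granule {g} is still in use by a realm")
        self.state[g] = "normal"

    def create_realm(self, rid: str) -> None:
        if rid in self.realms:
            raise ValueError(f"realm {rid} already exists")
        self.realms[rid] = Realm()

    def add_data(self, rid: str, g: int, data: bytes) -> None:
        """Copy initial content into a delegated granule and extend the measurement."""
        realm = self.realms[rid]
        if realm.active or self.state[g] != "delegated":
            raise ValueError("realm already active or granule not delegated")
        self.state[g] = "realm"
        self.contents[g] = data
        realm.granules.add(g)
        realm.measurement = hashlib.sha256(
            realm.measurement + hashlib.sha256(data).digest()).digest()

    def activate(self, rid: str) -> None:
        self.realms[rid].active = True  # initial content is now fixed

    def host_read(self, g: int) -> bytes | None:
        """Host access path: None models a granule protection fault."""
        return self.contents.get(g, b"") if self.state[g] == "normal" else None

    def destroy_realm(self, rid: str) -> None:
        """Scrub every realm granule and return it to the delegated state."""
        for g in self.realms.pop(rid).granules:
            self.contents.pop(g, None)
            self.state[g] = "delegated"


if __name__ == "__main__":
    rmm = ToyRMM(num_granules=8)
    free_list = list(range(8))  # host-owned allocator: policy stays with the host
    g = free_list.pop()
    rmm.delegate(g)
    rmm.create_realm("vm1")
    rmm.add_data("vm1", g, b"guest kernel image")
    rmm.activate("vm1")
    print(rmm.host_read(g))  # None
```

Keeping the free list on the host side mirrors the design choice that resource decisions stay with the hypervisor, while the firmware's job is limited to refusing illegal transitions.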

RME defines a standard architecture that is implemented in hardware. Platforms also require software (including firmware) to configure RME and to service workloads that request attestation and verification. RME allows implementers to design their own software architecture, but Arm has also released an open reference software architecture called Arm CCA (Confidential Compute Architecture).4

Arm CCA standardizes the essential software and firmware interfaces needed to configure and use RME with the aim of making firmware, which forms part of the TCB, simple, small, and easy to audit and verify. Arm CCA is supported by open-source reference implementations of each of those components and is free to license and deploy.

The key components of Arm CCA are shown in figure 2. The RMM (Realm Management Monitor) is responsible for managing communication between a realm and the hypervisor, through the monitor. The RMM does not make resource decisions, such as which realm to run or what memory to allocate to a realm. Those decisions remain with the host hypervisor.


The second key component is the monitor, which is in charge of context switching between security states (Realm, Secure, and Non-secure) and managing the GPT.

Together with the hardware, the RMM and the monitor form the TCB of a realm. The CCA design keeps complex policy such as allocation of resources, scheduling, or device assignment in the Non-secure host hypervisor. This reduces the codebase of the RMM and monitor, which are limited to security checks and mechanics. As an example, the Arm implementation of RMM does not even include a memory allocator. Further details on code can be found in the TF-RMM open-source project.10

By adopting Arm CCA and open-source implementations such as TF-RMM, the wider Arm community shares the task of implementing confidential computing, and trust in the platform can be increased through third-party verification by the security research community.7

Standards, Certification, and Regulation

While the confidential computing industry's goal is to deliver platforms that cannot observe the workloads they host, the question remains of how potential customers can assess platform claims. Platform vendors can choose how much information to publish about the security design of their implementations, on a spectrum from high-level marketing descriptions to reproducible builds and corresponding source code. It is reasonable to assume that some implementations will be more robust than others. It is not practical or scalable, however, for every potential customer to undertake its own detailed security and risk analysis. As the technology matures, industry regulators are likely to help introduce standards and certifications.

At the simplest, this may mean that implementations can display a logo or certificate as evidence that a platform has been certified to meet specific security requirements. Regulators governing industries such as financial services and healthcare may then advise, or require, the organizations they regulate to use platforms that have been successfully certified.

Innovation in regulation tends to lag behind innovation in technology, in part because it is generally accepted by regulators that it must be possible to meet regulations soon after they are published. In time, though, standards and certifications may simplify the process of selecting a secure confidential computing platform.

 

Summary

Confidential computing has great potential to improve the security of general-purpose computing platforms by taking supervisory systems out of the TCB, thereby reducing the size of the TCB, the attack surface, and the attack vectors that security architects must consider. Confidential computing requires innovations in platform hardware and software, but these have the potential to enable greater trust in computing, especially on devices that are owned or controlled by third parties.

The use of attestation and verification is an integral part of adopting confidential computing, as it is through these mechanisms that a relying party can quickly and securely establish the authenticity of the platform they wish to use.

RME and Arm CCA provide a standard architecture for implementing confidential computing on systems that implement the Armv9-A architecture.2

Early consumers of confidential computing will need to make their own decisions about the platforms they choose to trust. As confidential computing becomes mainstream, however, it's possible that certifiers and regulators will share this burden, enabling customers to make informed choices without having to undertake their own evaluations.

 

References

1. Ali, O., Osmanaj, V. 2020. The role of government regulations in the adoption of cloud computing: a case study of local government. Computer Law & Security Review 36; https://www.sciencedirect.com/science/article/abs/pii/S0267364920300017?via%3Dihub.

2. Arm Developer. Arm Architecture Reference Manual for A-Profile Architecture; https://developer.arm.com/documentation/ddi0487/latest.

3. Arm Developer. Arm System Memory Management Unit Architecture Specification; https://developer.arm.com/documentation/ihi0070/latest/.

4. Arm Developer. Learn the architecture: introducing Arm Confidential Compute Architecture; https://developer.arm.com/documentation/den0125/0300/Arm-CCA-Software-Architecture.

5. Confidential Computing Consortium; https://confidentialcomputing.io.

6. Delignat-Lavaud, A., Fournet, C., Vaswani, K., Clebsch, S., Riechert, M., Costa, M., Russinovich, M. 2023. Why should I trust your code? acmqueue 21(4); https://queue.acm.org/detail.cfm?id=3623460.

7. Li, X., Li, X., Dall, C., Gu, R., Nieh, J., Sait, Y., Stockwell, G. 2022. Design and verification of the Arm Confidential Compute Architecture. Proceedings of the 16th Usenix Symposium on Operating Systems Design and Implementation, 465–484; https://www.usenix.org/system/files/osdi22-li.pdf.

8. Russinovich, M. 2023. Confidential computing: elevating cloud security and privacy. acmqueue 21(4); https://dl.acm.org/doi/pdf/10.1145/3623461.

9. Scott, H. S., Gulliver, J., Nadler, H. 2019. Cloud computing in the financial sector: a global perspective. Program on International Financial Systems; https://ssrn.com/abstract=3427220.

10. Trusted Firmware; https://www.trustedfirmware.org/projects/tf-rmm/ and https://www.trustedfirmware.org/.

11. Tschofenig, H., Sheffer, Y., Howard, P., Mihalcea, I., Deshpande, Y. 2023. Using attestation in Transport Layer Security (TLS) and Datagram Transport Layer Security (DTLS). IETF Datatracker; https://datatracker.ietf.org/doc/draft-fossati-tls-attestation/.

 

Charles Garcia-Tobin is an OS architect and Fellow in the Arm Architecture and Technology Group. He has worked at Arm for over 12 years, providing solutions in processor, system, and firmware architecture. His main aim is to create solutions that can be deployed easily in the ecosystem, and adopted by all major operating systems. He started this work by standardizing power management interfaces, then moved into server standardization in general, and these days focusses more on security. Today Charles leads the Arm confidential computing architecture project in Arm's architecture group.

Mark Knight is a director of product management in Arm's Architecture and Technology Group, responsible for security architectures, including the Arm Confidential Compute Architecture (Arm CCA) that was announced in March 2021 as part of Armv9-A. He has worked for more than 30 years developing products in technology and computing sectors in engineering and product management roles, with more than 20 years specializing in security and cryptography. Mark holds a BSc in computer science from the University of Hertfordshire.

Copyright © 2024 held by owner/author. Publication rights licensed to ACM.


Originally published in Queue vol. 22, no. 2










© ACM, Inc. All Rights Reserved.