The Trusted Computing Base: Why Size Matters
Security discussions often focus on individual bugs:
- a buffer overflow
- a confused-deputy flaw
- an unchecked parser
- a privilege escalation path
Those bugs matter, but they are symptoms of a deeper question:
How much code must be correct for the system to remain secure?
That code is the trusted computing base, usually shortened to TCB.
The size and shape of the TCB determine how much code must be trusted, audited, tested, and reasoned about. A system can have strong abstractions on paper, but if those abstractions depend on a huge amount of privileged code behaving perfectly, the security argument becomes much weaker.
This post explains what the TCB is, why its size affects attack surface, and how EriX tries to keep trusted code small and explicit.
What Is the TCB?
The trusted computing base is the set of components whose correctness is required for the security properties of the system to hold.
If a component in the TCB is compromised, the system’s security guarantees may no longer hold.
In an operating system, the TCB often includes:
- the bootloader
- the kernel
- privileged system services
- authentication and authorization logic
- parsers for trusted boot data
- cryptographic verification code
- code that distributes or transfers authority
The exact TCB depends on the system design.
In a traditional monolithic kernel, much of the kernel is usually part of the TCB because many subsystems run with full kernel privilege. A bug in a driver, filesystem, or network stack may become a kernel compromise.
In a microkernel system, the kernel TCB can be smaller, but the total trusted system still includes the components that distribute authority and enforce policy.
That distinction matters.
Microkernels reduce the amount of code that runs with full machine authority. They do not make all user-space services untrusted by default.
Trusted Does Not Mean Safe
The word “trusted” is easy to misunderstand.
A trusted component is not a component that is known to be correct.
It is a component that the system depends on being correct.
That is a less comfortable definition, but it is the useful one.
If a service is trusted to distribute capabilities, a bug in that service can grant authority incorrectly. If a parser is trusted to validate an executable before boot, a parser bug can undermine the boot chain. If a kernel syscall handler is trusted to validate endpoint rights, a missing check can break isolation.
Trust is not praise.
Trust is risk.
The goal is therefore not to place as much code as possible in the trusted set. The goal is to make the trusted set as small, narrow, and auditable as possible.
Why Size Matters
The size of the TCB matters for several reasons.
1. More Code Means More Bugs
All software has bugs.
As the amount of trusted code grows, the probability of security-relevant bugs also grows. This is especially true for code that:
- parses untrusted input
- manages memory
- handles concurrency
- interprets permissions
- translates one authority model into another
Operating systems contain all of these patterns.
Reducing TCB size does not eliminate bugs, but it reduces the amount of code where a bug can compromise the whole system.
2. More Interfaces Mean More Attack Surface
Attack surface is not only about lines of code.
It is also about entry points.
Each interface into trusted code is a place where an attacker may supply input:
- syscall arguments
- IPC messages
- boot image metadata
- ELF headers
- filesystem structures
- device descriptors
- interrupt events
- firmware-provided tables
Every interface needs validation.
A small trusted component with a poorly designed interface can still be dangerous. But as the number of trusted interfaces grows, the validation burden grows with it.
3. More State Means Harder Reasoning
Security failures often occur not because one check is missing in isolation, but because state changes in an unexpected order.
For example:
- a capability is copied before rights are reduced
- a service starts before its startup bundle is fully validated
- a stale local slot is treated as proof of authority
- a device is considered present before discovery is complete
- a process keeps authority after a failed launch path
The more trusted mutable state a system has, the harder it becomes to prove that every transition preserves the intended invariants.
This is why EriX emphasizes deterministic startup, explicit transfer records, and fail-closed behavior.
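The first failure in the list above, copying a capability before its rights are reduced, cannot happen if the copy is created with its final rights in a single step. The Rust sketch below assumes a hypothetical bitmask Rights type, a Cap struct, and a transfer function; they are illustrative stand-ins, not EriX's actual lib-capabi definitions.

```rust
/// Hypothetical rights bitmask; EriX's real encoding (in lib-capabi) may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Rights(u8);

impl Rights {
    const READ: Rights = Rights(0b01);
    const WRITE: Rights = Rights(0b10);

    /// True if every bit in `other` is also set in `self`.
    fn contains(self, other: Rights) -> bool {
        self.0 & other.0 == other.0
    }

    fn union(self, other: Rights) -> Rights {
        Rights(self.0 | other.0)
    }
}

/// Hypothetical capability: an object reference plus the rights it carries.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Cap {
    object: u32,
    rights: Rights,
}

#[derive(Debug, PartialEq, Eq)]
enum TransferError {
    /// The requested rights are not a subset of the source rights.
    Amplification,
}

/// Produces the transferred copy with exactly `requested` rights in one step.
/// There is no window in which a full-rights copy exists and is narrowed later.
fn transfer(src: Cap, requested: Rights) -> Result<Cap, TransferError> {
    if !src.rights.contains(requested) {
        return Err(TransferError::Amplification);
    }
    Ok(Cap { object: src.object, rights: requested })
}

fn main() {
    let src = Cap { object: 7, rights: Rights::READ.union(Rights::WRITE) };
    let read_only = transfer(src, Rights::READ).expect("subset of source rights");
    println!("transferred {:?}", read_only);

    let narrow = Cap { object: 8, rights: Rights::READ };
    println!("amplification attempt: {:?}", transfer(narrow, Rights::WRITE));
}
```

Because the reduced copy is the only copy ever produced, there is no intermediate state for a later bug or interleaving to observe.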
4. More Privilege Means Larger Blast Radius
The same bug has different consequences depending on where it occurs.
A parsing bug in an unprivileged tool may crash that tool.
A parsing bug in the bootloader may compromise the entire system before the kernel starts.
A bug in a user-space driver with access only to a specific I/O range is serious, but it is different from a bug in a driver that runs inside the kernel with full machine authority.
Reducing the TCB is partly about reducing the blast radius of individual bugs.
The Kernel Is Only Part of the TCB
It is tempting to say:
The TCB is the kernel.
That is usually too simple.
The kernel is central, but a secure system also depends on the code that prepares the kernel, starts the first user-space task, defines authority formats, and distributes capabilities.
In EriX, the kernel is explicitly in the TCB.
It owns kernel-space capability tables, scheduling state, and machine resources. It validates the bootloader-to-kernel handoff, creates the root task, manages core kernel objects, and exposes the architecture-specific trap, syscall, and interrupt entry points.
If the kernel fails to enforce capability checks, endpoint rights, address-space boundaries, or object lifetimes, the system’s isolation model fails.
But the kernel is not the whole story.
Boot Code Is Trusted Too
The bootloader runs before the kernel.
That makes it security-critical.
In EriX, the bootloader is responsible for loading and verifying a signed boot.img, parsing kernel and service images, building a deterministic handoff structure, and transferring control to the kernel.
This places the bootloader in the TCB.
It holds firmware-provided authority during boot and controls the final jump into the kernel. It must treat the boot medium as untrusted, validate image structure and cryptographic integrity, reject malformed ELF binaries, and fail closed on ambiguity.
If the bootloader accepts a tampered image or builds an inconsistent handoff, the kernel may begin execution from a compromised foundation.
That is why boot code must be small, strict, and boring.
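To make "strict" concrete, here is a minimal Rust sketch of versioned handoff validation in the spirit of lib-handoff. The header layout, the HANDOFF_MAGIC value, and all function names are assumptions made for this example, not the real format. What matters is the shape: check length, magic, version, and reserved fields, and reject anything ambiguous instead of guessing.

```rust
// Hypothetical handoff header layout (all fields little-endian):
//   offset 0: magic (u32), 4: version (u16), 6: reserved (u16, must be 0), 8: total_len (u32)
// The real lib-handoff format is not described here; these values are illustrative.
const HANDOFF_MAGIC: u32 = 0x4552_4958; // illustrative only
const SUPPORTED_VERSION: u16 = 1;
const HEADER_LEN: usize = 12;

#[derive(Debug, PartialEq, Eq)]
enum HandoffError {
    TooShort,
    BadMagic,
    UnsupportedVersion(u16),
    ReservedNonZero,
    LengthMismatch,
}

#[derive(Debug, PartialEq, Eq)]
struct HandoffHeader {
    version: u16,
    total_len: u32,
}

fn read_u16(b: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([b[off], b[off + 1]])
}

fn read_u32(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

/// Validates the header and fails closed: any doubt is an error, never a guess.
fn validate_handoff(bytes: &[u8]) -> Result<HandoffHeader, HandoffError> {
    if bytes.len() < HEADER_LEN {
        return Err(HandoffError::TooShort);
    }
    if read_u32(bytes, 0) != HANDOFF_MAGIC {
        return Err(HandoffError::BadMagic);
    }
    let version = read_u16(bytes, 4);
    if version != SUPPORTED_VERSION {
        return Err(HandoffError::UnsupportedVersion(version));
    }
    if read_u16(bytes, 6) != 0 {
        return Err(HandoffError::ReservedNonZero);
    }
    let total_len = read_u32(bytes, 8);
    // Reject both truncated and oversized buffers: trailing bytes are ambiguity.
    if total_len as usize != bytes.len() {
        return Err(HandoffError::LengthMismatch);
    }
    Ok(HandoffHeader { version, total_len })
}

/// Builds a header-only handoff buffer for demonstration.
fn build(version: u16, total_len: u32) -> Vec<u8> {
    let mut v = Vec::with_capacity(HEADER_LEN);
    v.extend_from_slice(&HANDOFF_MAGIC.to_le_bytes());
    v.extend_from_slice(&version.to_le_bytes());
    v.extend_from_slice(&0u16.to_le_bytes());
    v.extend_from_slice(&total_len.to_le_bytes());
    v
}

fn main() {
    println!("{:?}", validate_handoff(&build(1, 12)));
    println!("{:?}", validate_handoff(&build(2, 12)));
}
```

Rejecting a non-zero reserved field and any trailing bytes may look pedantic, but every ambiguity the bootloader accepts becomes behavior the kernel then has to reason about.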
Parsers Can Be TCB Boundaries
Parsers are often underestimated in system security.
They sit exactly where untrusted bytes become trusted structure.
EriX treats several parser and ABI crates as TCB or TCB-adjacent components:
- lib-bootimg parses and verifies boot.img structure, hashes, and signatures
- lib-elf validates ELF64 binaries before the bootloader trusts load segments
- lib-handoff validates versioned handoff structures between boot stages
- lib-ipc defines and validates IPC message layouts
- lib-capabi defines capability types, rights, slot constants, and transfer descriptors
These libraries may not hold runtime capabilities themselves.
That does not make them irrelevant to the TCB.
If lib-bootimg accepts a modified boot image, the bootloader may trust code it should reject. If lib-elf accepts a malformed executable, the boot chain may load the wrong bytes or trust invalid segment ranges. If lib-ipc misdecodes a message, an operation may be interpreted incorrectly. If lib-capabi defines an overly broad role policy, a service may receive authority it should never hold.
Pure code can still be trusted code.
The important property is that these libraries are narrow. They do not perform I/O, do not own system policy, and avoid ambient authority. Their job is to parse, validate, and reject.
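Here is a sketch of what "parse, validate, and reject" can look like for an ELF64 file header. It is not lib-elf's code. It is a self-contained Rust function that assumes an x86-64, little-endian, ET_EXEC target and checks only the file header plus the bounds of the program-header table. The field offsets and constants follow the ELF64 specification; the function and type names are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
enum ElfError {
    TooShort, BadMagic, NotElf64, NotLittleEndian, BadVersion,
    NotExecutable, WrongMachine, BadPhentsize, PhdrsOutOfBounds,
}

#[derive(Debug, PartialEq, Eq)]
struct ElfInfo {
    entry: u64,
    phoff: u64,
    phnum: u16,
}

const EHDR_LEN: usize = 64; // size of an ELF64 file header
const PHDR_LEN: u16 = 56; // size of one ELF64 program header
const ET_EXEC: u16 = 2;
const EM_X86_64: u16 = 62; // assumption: an x86-64 target

fn u16_at(b: &[u8], o: usize) -> u16 {
    u16::from_le_bytes([b[o], b[o + 1]])
}
fn u32_at(b: &[u8], o: usize) -> u32 {
    u32::from_le_bytes([b[o], b[o + 1], b[o + 2], b[o + 3]])
}
fn u64_at(b: &[u8], o: usize) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[o..o + 8]);
    u64::from_le_bytes(a)
}

/// Pure validation: no I/O, no global state, and every failure is an explicit error.
fn validate_elf64(file: &[u8]) -> Result<ElfInfo, ElfError> {
    if file.len() < EHDR_LEN {
        return Err(ElfError::TooShort);
    }
    if file[..4] != [0x7f, b'E', b'L', b'F'] {
        return Err(ElfError::BadMagic);
    }
    if file[4] != 2 {
        return Err(ElfError::NotElf64); // EI_CLASS must be ELFCLASS64
    }
    if file[5] != 1 {
        return Err(ElfError::NotLittleEndian); // EI_DATA must be ELFDATA2LSB
    }
    if file[6] != 1 || u32_at(file, 20) != 1 {
        return Err(ElfError::BadVersion); // EI_VERSION and e_version must be EV_CURRENT
    }
    if u16_at(file, 16) != ET_EXEC {
        return Err(ElfError::NotExecutable);
    }
    if u16_at(file, 18) != EM_X86_64 {
        return Err(ElfError::WrongMachine);
    }
    if u16_at(file, 54) != PHDR_LEN {
        return Err(ElfError::BadPhentsize); // e_phentsize
    }
    let phoff = u64_at(file, 32);
    let phnum = u16_at(file, 56);
    // Checked arithmetic: a huge e_phoff must not wrap around the bounds check.
    let table_len = u64::from(phnum) * u64::from(PHDR_LEN);
    let end = phoff.checked_add(table_len).ok_or(ElfError::PhdrsOutOfBounds)?;
    if end > file.len() as u64 {
        return Err(ElfError::PhdrsOutOfBounds);
    }
    Ok(ElfInfo { entry: u64_at(file, 24), phoff, phnum })
}

/// Builds a 64-byte header-only ELF64 image for demonstration.
fn minimal_header(phnum: u16) -> Vec<u8> {
    let mut h = vec![0u8; EHDR_LEN];
    h[..4].copy_from_slice(b"\x7fELF");
    h[4] = 2;
    h[5] = 1;
    h[6] = 1;
    h[16..18].copy_from_slice(&ET_EXEC.to_le_bytes());
    h[18..20].copy_from_slice(&EM_X86_64.to_le_bytes());
    h[20..24].copy_from_slice(&1u32.to_le_bytes());
    h[24..32].copy_from_slice(&0x40_1000u64.to_le_bytes()); // e_entry
    h[32..40].copy_from_slice(&64u64.to_le_bytes()); // e_phoff: right after the header
    h[54..56].copy_from_slice(&PHDR_LEN.to_le_bytes());
    h[56..58].copy_from_slice(&phnum.to_le_bytes());
    h
}

fn main() {
    println!("{:?}", validate_elf64(&minimal_header(0)));
    println!("{:?}", validate_elf64(&minimal_header(1)));
}
```

A header that claims one program header but ends at byte 64 is rejected, because trusting it would mean reading segment descriptors that are not in the file.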
User-Space Services Can Be Trusted Components
Moving policy out of the kernel does not make policy disappear.
It moves policy into user-space services where it can be isolated, constrained, and audited separately.
In EriX, rootd is the first policy-bearing user-space authority. It validates the kernel-to-root handoff, parses boot configuration, executes the startup DAG, and transfers least-privilege capabilities to required services.
rootd is high-privilege.
But it is not the kernel.
That distinction is important. rootd is trusted for early policy and capability distribution, but it does not implement kernel objects or own machine authority directly. Its job is to distribute authority according to explicit startup contracts.
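Executing a startup DAG deterministically mostly comes down to a topological sort with a fixed tie-break. The sketch below is a hypothetical illustration, not rootd's implementation. It uses Kahn's algorithm over a BTreeMap, so services that become ready at the same moment always start in name order, and it fails closed on unknown dependencies or cycles. The dependency edges are made up for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq, Eq)]
enum DagError {
    UnknownDependency { service: String, missing: String },
    Cycle,
}

/// Orders services so each starts after all of its dependencies.
/// BTreeMap/BTreeSet give a deterministic tie-break: ready services start in name order.
fn startup_order(services: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, DagError> {
    let mut indegree: BTreeMap<&str, usize> = BTreeMap::new();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&name, deps) in services {
        indegree.entry(name).or_insert(0);
        for &dep in deps {
            if !services.contains_key(dep) {
                return Err(DagError::UnknownDependency {
                    service: name.to_string(),
                    missing: dep.to_string(),
                });
            }
            *indegree.entry(name).or_insert(0) += 1;
            dependents.entry(dep).or_default().push(name);
        }
    }

    let mut ready: BTreeSet<&str> = BTreeSet::new();
    for (&name, &deg) in &indegree {
        if deg == 0 {
            ready.insert(name);
        }
    }

    let mut order = Vec::new();
    while let Some(next) = ready.pop_first() {
        order.push(next.to_string());
        for &d in dependents.get(next).map(Vec::as_slice).unwrap_or(&[]) {
            let deg = indegree.get_mut(d).expect("every dependent is a known service");
            *deg -= 1;
            if *deg == 0 {
                ready.insert(d);
            }
        }
    }

    // Anything left unstarted is part of a cycle: fail closed instead of guessing.
    if order.len() != services.len() {
        return Err(DagError::Cycle);
    }
    Ok(order)
}

fn main() {
    // Hypothetical dependency edges; the real rootd configuration format is not shown here.
    let mut services: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    services.insert("procd", vec![]);
    services.insert("deviced", vec!["procd"]);
    services.insert("vfsd", vec!["procd", "deviced"]);
    println!("{:?}", startup_order(&services));
}
```

Using ordered collections instead of hash maps is the design choice that matters here: the same configuration always yields the same start order, which keeps boot logs and failure reports reproducible.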
Other services also sit inside specific trusted boundaries:
- procd is trusted for process lifecycle orchestration
- deviced is trusted for driver policy and driver-capability delivery
- vfsd is trusted as the public filesystem namespace boundary
- private filesystem providers are trusted only for their provider role
This does not make those services unimportant.
It makes their authority narrower than full kernel authority.
Attack Surface in a Monolithic Design
In a monolithic kernel, many subsystems share one privileged address space.
This can make fast paths simple, but it also creates a broad attack surface.
A vulnerability in any in-kernel subsystem may become a kernel vulnerability:
- malformed filesystem metadata
- buggy network packet handling
- unsafe device driver code
- unexpected hardware descriptors
- race conditions in shared kernel state
The attacker only needs one path into privileged code.
This does not mean monolithic kernels cannot be secure. They can be engineered, hardened, fuzzed, sandboxed, and audited extensively.
But the architecture starts with a large privileged surface.
The security argument must then explain how that large surface is controlled.
Attack Surface in a Microkernel Design
A microkernel changes the shape of the attack surface.
The kernel still exposes critical interfaces:
- syscalls
- IPC delivery
- scheduling
- address-space operations
- capability operations
- interrupt handling
Those interfaces must be correct.
But higher-level services do not automatically run with full kernel privilege. If a filesystem provider handles malformed media, the goal is that the bug stays inside that provider’s authority. If a driver fails, the goal is that it fails with only the device authority it was explicitly given.
This turns one large privileged surface into several smaller authority surfaces.
That is not automatically simpler.
It only works if the boundaries are strict and the authority passed across them is narrow.
How EriX Minimizes the TCB
EriX reduces TCB size and attack surface through several design choices.
1. A Policy-Minimal Kernel
The EriX kernel is responsible for mechanisms, not system policy.
It handles:
- core kernel objects
- capability semantics
- scheduling and task execution
- address-space primitives
- IPC and endpoint dispatch
- interrupt and exception entry points
It does not own:
- service startup policy
- process orchestration policy
- filesystem namespace policy
- driver activation policy
- high-level memory allocation policy
This keeps the kernel focused on enforcing isolation rather than deciding the shape of the whole system.
2. Explicit Capabilities Instead of Ambient Authority
EriX models authority through capabilities.
A component can act only if it holds a capability with the required rights. The kernel validates capability references on use, enforces endpoint rights at syscall dispatch time, and treats transfers as explicit events.
This avoids relying on global names or conventional slot numbers as permission.
Knowing a slot number is not authority.
Holding the right capability in the local CSpace is authority.
That distinction is central to reducing the TCB: trusted code does not need to infer permissions from global state when the authority is carried explicitly.
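The difference between knowing a slot number and holding authority can be shown with a small lookup routine. This Rust sketch assumes a hypothetical CSpace of optional capabilities and a two-bit rights mask; it does not reflect EriX's actual CSpace layout. The lookup validates on every use: slot range, occupancy, capability type, then rights.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CapType {
    Endpoint,
    Frame,
}

const RIGHT_SEND: u8 = 0b01;
const RIGHT_RECV: u8 = 0b10;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Cap {
    cap_type: CapType,
    object: u32,
    rights: u8,
}

#[derive(Debug, PartialEq, Eq)]
enum LookupError {
    SlotOutOfRange,
    EmptySlot,
    WrongType,
    MissingRights,
}

/// A per-task capability space: authority is whatever occupies the task's own slots.
struct CSpace {
    slots: Vec<Option<Cap>>,
}

impl CSpace {
    fn new(size: usize) -> Self {
        CSpace { slots: vec![None; size] }
    }

    /// Validates a slot reference on use: range, occupancy, type, then rights.
    fn lookup(&self, slot: usize, want: CapType, rights: u8) -> Result<Cap, LookupError> {
        let cap = self
            .slots
            .get(slot)
            .copied()
            .ok_or(LookupError::SlotOutOfRange)?
            .ok_or(LookupError::EmptySlot)?;
        if cap.cap_type != want {
            return Err(LookupError::WrongType);
        }
        if cap.rights & rights != rights {
            return Err(LookupError::MissingRights);
        }
        Ok(cap)
    }
}

fn main() {
    let mut cs = CSpace::new(8);
    cs.slots[3] = Some(Cap { cap_type: CapType::Endpoint, object: 42, rights: RIGHT_SEND });
    println!("{:?}", cs.lookup(3, CapType::Endpoint, RIGHT_SEND));
    println!("{:?}", cs.lookup(3, CapType::Endpoint, RIGHT_RECV));
}
```

A task that guesses slot 3 gets nothing unless its own CSpace holds a capability of the right type and rights there; the number by itself confers nothing.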
3. Narrow Kernel-Control Endpoints
Earlier operating system designs often concentrated control behind broad privileged interfaces.
EriX goes the other direction.
The current runtime uses narrow kernel-control endpoint families for specific jobs:
- time
- interrupts
- hotplug
- PCI configuration
- console and framebuffer
- COM1 I/O
- i8042 I/O
- memory retyping
- VSpace mapping
- pager fault resolution
- process control
- ACPI reads
Normal runtime boot does not hand services one broad root endpoint for all kernel operations.
Instead, a service receives the specific endpoint family it needs. timed receives time control. irqd receives interrupt control. drv-serial receives COM1-specific I/O. drv-i8042 receives i8042-specific I/O. drv-acpi receives ACPI-read authority.
This narrows both the trusted interface and the damage caused by misuse.
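One way to picture narrow endpoint families is as a dispatch-time check that the invoked endpoint's family matches the requested operation, with no family that satisfies every operation. The EndpointKind and KernelOp names below are hypothetical labels for a few of the families listed above, not EriX's ABI.

```rust
/// Hypothetical subset of the endpoint families listed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EndpointKind {
    Time,
    Irq,
    Com1Io,
    I8042Io,
    AcpiRead,
}

/// Hypothetical kernel operations, each belonging to exactly one family.
#[derive(Clone, Copy, Debug)]
enum KernelOp {
    ReadClock,
    AckInterrupt,
    Com1Write,
    I8042Read,
    ReadAcpiTable,
}

impl KernelOp {
    fn family(self) -> EndpointKind {
        match self {
            KernelOp::ReadClock => EndpointKind::Time,
            KernelOp::AckInterrupt => EndpointKind::Irq,
            KernelOp::Com1Write => EndpointKind::Com1Io,
            KernelOp::I8042Read => EndpointKind::I8042Io,
            KernelOp::ReadAcpiTable => EndpointKind::AcpiRead,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
struct WrongFamily {
    held: EndpointKind,
    needed: EndpointKind,
}

/// Dispatch-time check: the invoked endpoint's family must match the operation.
/// There is deliberately no "root" family that satisfies every operation.
fn dispatch(held: EndpointKind, op: KernelOp) -> Result<(), WrongFamily> {
    let needed = op.family();
    if held != needed {
        return Err(WrongFamily { held, needed });
    }
    // A real kernel would perform the operation here.
    Ok(())
}

fn main() {
    // In this model, drv-serial holds only a Com1Io endpoint.
    println!("{:?}", dispatch(EndpointKind::Com1Io, KernelOp::Com1Write));
    println!("{:?}", dispatch(EndpointKind::Com1Io, KernelOp::I8042Read));
}
```

Because the match in family is exhaustive, adding a new operation forces its author to assign it to exactly one family at compile time.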
4. Exact Startup Contracts
EriX treats startup capability transfer as a contract, not a suggestion.
Startup envelopes describe which capabilities a service should receive, where they should appear, and which rights they should carry.
Services validate the actual local slots they received before declaring readiness. Endpoint transfers are checked by source slot, destination slot, rights, and expected endpoint kind. Unknown, wrong, or extra endpoint transfers are rejected.
This prevents a common authority bug:
“A slot is occupied, so it must be the right thing.”
In EriX, slot occupancy alone is not proof of authority.
The capability must match the declared authority shape.
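The exact-match rule can be sketched as a comparison between the declared envelope and what actually arrived. In this hypothetical Rust version, each transfer is keyed by its destination slot and must match the declaration in source slot, rights, and endpoint kind. Missing, mismatched, unexpected, and duplicate transfers are all errors. The Transfer struct and EndpointKind variants are assumptions for illustration.

```rust
/// Hypothetical endpoint kinds a service might be promised at startup.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EndpointKind {
    Log,
    Time,
    ProcControl,
}

/// One entry in a startup envelope, or one transfer actually observed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Transfer {
    src_slot: u16,
    dst_slot: u16,
    rights: u8,
    kind: EndpointKind,
}

#[derive(Debug, PartialEq, Eq)]
enum ContractError {
    Missing(u16),
    Mismatch(u16),
    Unexpected(u16),
    Duplicate(u16),
}

/// Exact match: every declared transfer present and identical, nothing extra.
fn validate_contract(declared: &[Transfer], received: &[Transfer]) -> Result<(), ContractError> {
    for got in received {
        match declared.iter().find(|d| d.dst_slot == got.dst_slot) {
            None => return Err(ContractError::Unexpected(got.dst_slot)),
            Some(want) if want != got => return Err(ContractError::Mismatch(got.dst_slot)),
            Some(_) => {}
        }
        // An occupied slot is not enough: the same slot must not be filled twice.
        if received.iter().filter(|t| t.dst_slot == got.dst_slot).count() > 1 {
            return Err(ContractError::Duplicate(got.dst_slot));
        }
    }
    for want in declared {
        if !received.iter().any(|t| t.dst_slot == want.dst_slot) {
            return Err(ContractError::Missing(want.dst_slot));
        }
    }
    Ok(())
}

fn main() {
    let log = Transfer { src_slot: 4, dst_slot: 10, rights: 0b01, kind: EndpointKind::Log };
    let time = Transfer { src_slot: 5, dst_slot: 11, rights: 0b01, kind: EndpointKind::Time };
    println!("{:?}", validate_contract(&[log, time], &[time, log]));
    println!("{:?}", validate_contract(&[log], &[Transfer { kind: EndpointKind::ProcControl, ..log }]));
}
```

A service running this check before declaring readiness never has to assume that whatever occupies slot 10 is the log endpoint it was promised.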
5. Staged Process Creation
Process creation is a high-risk operation because it combines execution, address spaces, endpoints, and initial authority.
EriX routes runtime process creation through staged child creation rather than a legacy direct-create operation.
The staged flow is explicit:
- create a staged child
- receive an install grant and child endpoint alias
- install the declared startup capability bundle
- start the process only after population is complete
The kernel denies process start while live install grants still target that child stage.
This keeps partially populated processes from becoming runnable with an ambiguous authority state.
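The staged lifecycle can be modeled as a small state machine. The sketch below is a hypothetical model, not the kernel's data structure. It tracks live install grants and installed capabilities, and start refuses while any grant is live, mirroring the rule above. The additional bundle-count check stands in for "population is complete" and is an assumption of this sketch.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Stage {
    Staged,
    Running,
}

#[derive(Debug, PartialEq, Eq)]
enum StageError {
    NotStaged,
    NoInstallGrant,
    LiveInstallGrants(usize),
    BundleMismatch { installed: usize, declared: usize },
}

/// Hypothetical model of a staged child as the kernel might track it.
struct StagedChild {
    stage: Stage,
    declared: usize,
    installed: usize,
    live_grants: usize,
}

impl StagedChild {
    fn new(declared: usize) -> Self {
        StagedChild { stage: Stage::Staged, declared, installed: 0, live_grants: 0 }
    }

    fn issue_grant(&mut self) -> Result<(), StageError> {
        if self.stage != Stage::Staged {
            return Err(StageError::NotStaged);
        }
        self.live_grants += 1;
        Ok(())
    }

    /// Installs one startup capability; requires a live install grant.
    fn install(&mut self) -> Result<(), StageError> {
        if self.stage != Stage::Staged {
            return Err(StageError::NotStaged);
        }
        if self.live_grants == 0 {
            return Err(StageError::NoInstallGrant);
        }
        self.installed += 1;
        Ok(())
    }

    /// The installer drops its grant once it has finished populating the child.
    fn drop_grant(&mut self) {
        self.live_grants = self.live_grants.saturating_sub(1);
    }

    /// Start is denied while any install grant is live or the bundle count is wrong.
    fn start(&mut self) -> Result<(), StageError> {
        if self.stage != Stage::Staged {
            return Err(StageError::NotStaged);
        }
        if self.live_grants > 0 {
            return Err(StageError::LiveInstallGrants(self.live_grants));
        }
        if self.installed != self.declared {
            return Err(StageError::BundleMismatch { installed: self.installed, declared: self.declared });
        }
        self.stage = Stage::Running;
        Ok(())
    }
}

fn main() {
    let mut child = StagedChild::new(2);
    child.issue_grant().unwrap();
    child.install().unwrap();
    println!("early start: {:?}", child.start());
    child.install().unwrap();
    child.drop_grant();
    println!("after population: {:?}", child.start());
}
```

Once the child is running, further installs fail, so the startup bundle cannot be widened after the fact.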
6. Role-Specific Driver Authority
Drivers are a major source of operating system risk.
EriX does not treat all hardware control as one broad permission.
Driver authority is role-specific:
- drv-serial receives COM1-only I/O authority
- drv-i8042 receives i8042-only I/O authority
- drv-acpi receives ACPI-read authority
- probed receives PCI configuration read authority
- drv-virtio-block receives a validated device frame rather than general memory authority
deviced manages driver policy, but it does not simply hand every driver a generic device-control capability. It uses explicit startup capability installation and role-specific authority surfaces.
This limits the TCB impact of driver bugs.
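Role-specific I/O authority can be represented as a capability that carries the exact port set for one device. This sketch assumes a hypothetical IoPortCap holding inclusive port ranges. The port numbers are the standard PC-compatible assignments for COM1 (0x3F8 to 0x3FF) and the i8042 controller (data port 0x60, status/command port 0x64).

```rust
#[derive(Debug, PartialEq, Eq)]
struct PortDenied(u16);

/// Hypothetical I/O-port capability: a fixed set of inclusive port ranges.
struct IoPortCap {
    ranges: &'static [(u16, u16)],
}

impl IoPortCap {
    /// Every port access is checked against the ranges this role was given.
    fn check(&self, port: u16) -> Result<(), PortDenied> {
        if self.ranges.iter().any(|&(lo, hi)| lo <= port && port <= hi) {
            Ok(())
        } else {
            Err(PortDenied(port))
        }
    }
}

/// COM1 occupies ports 0x3F8..=0x3FF on PC-compatible hardware.
const COM1_ONLY: IoPortCap = IoPortCap { ranges: &[(0x3F8, 0x3FF)] };

/// The i8042 uses ports 0x60 and 0x64; the ports between them are not granted.
const I8042_ONLY: IoPortCap = IoPortCap { ranges: &[(0x60, 0x60), (0x64, 0x64)] };

fn main() {
    println!("{:?}", COM1_ONLY.check(0x3FD));
    println!("{:?}", COM1_ONLY.check(0x64));
    println!("{:?}", I8042_ONLY.check(0x61));
}
```

Listing the two i8042 ports separately, rather than granting 0x60 through 0x64 as one range, keeps unrelated hardware behind the ports in between out of the driver's reach.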
7. Device Memory Is a Separate Capability Type
Device memory is dangerous because it can affect hardware state directly.
EriX distinguishes device memory from ordinary RAM with CAP_TYPE_DEVICE_FRAME.
The storage path can derive a BAR-backed MMIO frame for deviced, and deviced can install only the derived device frame into a driver’s startup bundle.
That device frame is not treated as ordinary allocatable memory.
This matters because confusing device memory with normal frames would widen the authority model and make reasoning about memory safety harder.
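A distinct type lets both the compiler and the kernel keep the two memory paths apart. In this sketch, CapType::DeviceFrame is a hypothetical stand-in for CAP_TYPE_DEVICE_FRAME, and Bar, derive_device_frame, allocate_from, and map_mmio are illustrative names. A device frame can only be derived from within a BAR, cannot be handed out as general memory, and is the only kind of frame the MMIO path accepts.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CapType {
    Frame,       // ordinary RAM
    DeviceFrame, // stand-in for CAP_TYPE_DEVICE_FRAME: MMIO, never allocatable
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FrameCap {
    cap_type: CapType,
    phys: u64,
    len: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum MemError {
    OutsideBar,
    NotOrdinaryMemory,
    NotDeviceFrame,
}

/// A PCI BAR as discovered by the storage path (hypothetical representation).
struct Bar {
    base: u64,
    len: u64,
}

/// Derives an MMIO frame that must lie entirely inside the BAR.
fn derive_device_frame(bar: &Bar, offset: u64, len: u64) -> Result<FrameCap, MemError> {
    let end = offset.checked_add(len).ok_or(MemError::OutsideBar)?;
    if len == 0 || end > bar.len {
        return Err(MemError::OutsideBar);
    }
    let phys = bar.base.checked_add(offset).ok_or(MemError::OutsideBar)?;
    Ok(FrameCap { cap_type: CapType::DeviceFrame, phys, len })
}

/// Allocator path: only ordinary RAM may be handed out as general memory.
fn allocate_from(cap: &FrameCap) -> Result<u64, MemError> {
    match cap.cap_type {
        CapType::Frame => Ok(cap.phys),
        CapType::DeviceFrame => Err(MemError::NotOrdinaryMemory),
    }
}

/// Driver path: MMIO mappings accept only device frames.
fn map_mmio(cap: &FrameCap) -> Result<(u64, u64), MemError> {
    match cap.cap_type {
        CapType::DeviceFrame => Ok((cap.phys, cap.len)),
        CapType::Frame => Err(MemError::NotDeviceFrame),
    }
}

fn main() {
    let bar = Bar { base: 0xFEB0_0000, len: 0x1000 };
    let dev = derive_device_frame(&bar, 0, 0x1000).expect("inside the BAR");
    println!("mmio: {:?}", map_mmio(&dev));
    println!("as RAM: {:?}", allocate_from(&dev));
}
```

Because both functions match exhaustively on CapType, adding a third memory kind would force every path to decide explicitly whether to accept it.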
8. Clean-Room Dependencies
Dependencies can enlarge the TCB quietly.
If trusted code depends on a general-purpose external library, the system may inherit:
- code paths it does not need
- assumptions that do not match the OS
- parser behavior that is too permissive
- update and supply-chain risk
EriX avoids third-party crates and implements its critical libraries within the project.
This increases implementation work, but it keeps the trusted boundary visible. When a parser, ABI crate, or crypto helper is part of the boot or authority path, it is part of the system’s review surface.
9. Fail-Closed Behavior
A small TCB is not enough if failures are ambiguous.
EriX tries to make security-sensitive failures explicit and terminal:
- malformed boot handoff stops boot
- unsupported versions are rejected
- invalid startup authority prevents readiness
- wrong endpoint kinds fail validation
- required-service startup failure halts progression
- post-start bootstrap failures trigger fail-closed teardown
Failing closed is important because recovery heuristics often become hidden policy.
Hidden policy expands the trusted behavior of the system.
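The fail-closed rule for required services can be sketched as a loop with no recovery branch. This Rust example assumes each hypothetical Step is a service name plus a start function. The first error halts progression and tears down services that were already started. The newest-first teardown order is an assumption of this sketch, and the service names are reused from this post only as labels.

```rust
#[derive(Debug, PartialEq, Eq)]
enum BootOutcome {
    AllReady(Vec<&'static str>),
    Halted {
        failed: &'static str,
        reason: String,
        torn_down: Vec<&'static str>,
    },
}

/// A hypothetical startup step: a service name and a function that starts it.
type Step = (&'static str, fn() -> Result<(), String>);

/// Runs required services in order. The first failure halts progression and
/// tears down everything already started, newest first. No retries, no skips.
fn run_startup(steps: &[Step]) -> BootOutcome {
    let mut started: Vec<&'static str> = Vec::new();
    for &(name, start) in steps {
        if let Err(reason) = start() {
            let mut torn_down = Vec::new();
            while let Some(service) = started.pop() {
                // A real system would revoke this service's capabilities here.
                torn_down.push(service);
            }
            return BootOutcome::Halted { failed: name, reason, torn_down };
        }
        started.push(name);
    }
    BootOutcome::AllReady(started)
}

fn ok() -> Result<(), String> {
    Ok(())
}

fn invalid_authority() -> Result<(), String> {
    Err("invalid startup authority".to_string())
}

fn main() {
    let steps: [Step; 3] = [("timed", ok), ("irqd", ok), ("deviced", invalid_authority)];
    println!("{:?}", run_startup(&steps));
}
```

There is deliberately no "optional" flag or retry counter: each such knob would be exactly the kind of hidden recovery policy described above.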
Smaller Does Not Mean Trivial
Reducing the TCB does not make system design easy.
It often makes design more explicit.
Instead of placing all logic in one privileged address space, the system must define:
- which component owns each decision
- which authority each component receives
- how authority is transferred
- how failures are reported
- how partial startup is cleaned up
- how stale capabilities are invalidated or dropped
This is more work up front.
But it produces a system where the security argument is easier to inspect.
The question becomes:
Which component must be trusted for this specific property?
That is a better question than:
Is the whole kernel correct?
A Practical TCB View of EriX
A practical view of the EriX TCB looks layered.
At the bottom is the boot chain:
- bootloader
- boot image verification
- lib-bootimg read/verify path
- ELF parsing
- handoff validation
Then the kernel:
- capability objects
- CSpace and VSpace enforcement
- IPC and endpoint rights
- scheduling and trap handling
- interrupt delivery
Then early trusted user space:
- rootd for startup policy and capability distribution
- procd for process lifecycle
- deviced for driver policy
- selected ABI and validation libraries shared across services
Then narrower trusted services:
- filesystem namespace mediation in vfsd
- private backend providers
- input, console, logging, block, time, and interrupt services
Not all of these components are equally privileged.
That is the point.
EriX tries to avoid one flat trusted world. Instead, each component should be trusted only for its documented role and only with the capabilities it was explicitly given.
Looking Ahead
The TCB discussion naturally leads to the implementation language.
If trusted code must be small, explicit, and auditable, then memory safety matters. So do unsafe boundaries, parser correctness, data layout, and the discipline required when working close to hardware.
The next post will examine why EriX is written primarily in Rust, what Rust does and does not solve for kernel development, and how it compares with the traditional C approach to systems programming.