The Trusted Computing Base: Why Size Matters

Security discussions often focus on individual bugs:

a buffer overflow
a confused-deputy flaw
an unchecked parser
a privilege escalation path

Those bugs matter, but they are symptoms of a deeper question:

How much code must be correct for the system to remain secure?

That code is the trusted computing base, usually shortened to TCB.

The size and shape of the TCB determine how much code must be trusted, audited, tested, and reasoned about. A system can have strong abstractions on paper, but if those abstractions depend on a huge amount of privileged code behaving perfectly, the security argument becomes much weaker.

This post explains what the TCB is, why its size affects attack surface, and how EriX tries to keep trusted code small and explicit.

What Is the TCB?

The trusted computing base is the set of components whose correctness is required for the security properties of the system to hold.

If a component in the TCB is compromised or simply wrong, the system’s security guarantees may no longer hold.

In an operating system, the TCB often includes:

the bootloader
the kernel
privileged system services
authentication and authorization logic
parsers for trusted boot data
cryptographic verification code
code that distributes or transfers authority

The exact TCB depends on the system design.

In a traditional monolithic kernel, much of the kernel is usually part of the TCB because many subsystems run with full kernel privilege. A bug in a driver, filesystem, or network stack may become a kernel compromise.

In a microkernel system, the kernel TCB can be smaller, but the total trusted system still includes the components that distribute authority and enforce policy.

That distinction matters.

Microkernels reduce the amount of code that runs with full machine authority. They do not make all user-space services untrusted by default.

Trusted Does Not Mean Safe

The word “trusted” is easy to misunderstand.

A trusted component is not a component that is known to be correct.

It is a component that the system depends on being correct.

That is a less comfortable definition, but it is the useful one.

If a service is trusted to distribute capabilities, a bug in that service can grant authority incorrectly. If a parser is trusted to validate an executable before boot, a parser bug can undermine the boot chain. If a kernel syscall handler is trusted to validate endpoint rights, a missing check can break isolation.

Trust is not praise.

Trust is risk.

The goal is therefore not to label as much code as trusted as possible. The goal is to make the trusted set as small, narrow, and auditable as possible.

Why Size Matters

The size of the TCB matters for several reasons.

1. More Code Means More Bugs

All software has bugs.

As the amount of trusted code grows, the expected number of security-relevant bugs grows with it. This is especially true for code that:

parses untrusted input
manages memory
handles concurrency
interprets permissions
translates one authority model into another

Operating systems contain all of these patterns.

Reducing TCB size does not eliminate bugs, but it reduces the amount of code where a bug can compromise the whole system.

2. More Interfaces Mean More Attack Surface

Attack surface is not only about lines of code.

It is also about entry points.

Each interface into trusted code is a place where an attacker may supply input:

syscall arguments
IPC messages
boot image metadata
ELF headers
filesystem structures
device descriptors
interrupt events
firmware-provided tables

Every interface needs validation.

A small trusted component with a poorly designed interface can still be dangerous. But as the number of trusted interfaces grows, the validation burden grows with it.

3. More State Means Harder Reasoning

Security failures often occur not because one check is missing in isolation, but because state changes in an unexpected order.

For example:

a capability is copied before rights are reduced
a service starts before its startup bundle is fully validated
a stale local slot is treated as proof of authority
a device is considered present before discovery is complete
a process keeps authority after a failed launch path

The more trusted mutable state a system has, the harder it becomes to prove that every transition preserves the intended invariants.

This is why EriX emphasizes deterministic startup, explicit transfer records, and fail-closed behavior.
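The first ordering bug in the list above (copying a capability before its rights are reduced) can be made structurally harder. The following is a minimal sketch in Rust, not EriX code: the `Rights` bitmask, the `Capability` struct, and `derive_for_transfer` are all hypothetical names chosen for illustration. The idea is that the only way to obtain a transferable capability is through a function that attenuates first.

```rust
/// Hypothetical rights bitmask. The bit values are illustrative, not EriX's ABI.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rights(u8);

impl Rights {
    pub const READ: Rights = Rights(0b001);
    pub const WRITE: Rights = Rights(0b010);
    pub const GRANT: Rights = Rights(0b100);

    /// Combine two rights sets.
    pub const fn union(self, other: Rights) -> Rights {
        Rights(self.0 | other.0)
    }

    /// True if every right in `other` is also present in `self`.
    pub const fn contains(self, other: Rights) -> bool {
        (self.0 & other.0) == other.0
    }
}

/// Hypothetical capability. It deliberately does not implement `Clone` or
/// `Copy`, so a full-rights duplicate cannot be made by accident.
#[derive(Debug, PartialEq, Eq)]
pub struct Capability {
    pub object_id: u32,
    pub rights: Rights,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DeriveError {
    /// The caller asked for rights the source capability does not carry.
    RightsEscalation,
}

/// Build the capability that will be handed to a receiver.
/// The rights are reduced as part of derivation, so there is no intermediate
/// state in which a full-rights copy exists outside the source.
pub fn derive_for_transfer(
    source: &Capability,
    wanted: Rights,
) -> Result<Capability, DeriveError> {
    if !source.rights.contains(wanted) {
        return Err(DeriveError::RightsEscalation);
    }
    Ok(Capability {
        object_id: source.object_id,
        rights: wanted,
    })
}
```

With this shape, "copy, then reduce" is not an available sequence of operations. A request for rights the source lacks is rejected instead of being silently clamped.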

4. More Privilege Means Larger Blast Radius

The same bug has different consequences depending on where it occurs.

A parsing bug in an unprivileged tool may crash that tool.

A parsing bug in the bootloader may compromise the entire system before the kernel starts.

A bug in a user-space driver with access only to a specific I/O range is serious, but it is different from a bug in a driver that runs inside the kernel with full machine authority.

Reducing the TCB is partly about reducing the blast radius of individual bugs.

The Kernel Is Only Part of the TCB

It is tempting to say:

The TCB is the kernel.

That is usually too simple.

The kernel is central, but a secure system also depends on the code that prepares the kernel, starts the first user-space task, defines authority formats, and distributes capabilities.

In EriX, the kernel is explicitly in the TCB.

It owns kernel-space capability tables, scheduling state, and machine resources. It validates the bootloader-to-kernel handoff, creates the root task, manages core kernel objects, and exposes the architecture-specific trap, syscall, and interrupt entry points.

If the kernel fails to enforce capability checks, endpoint rights, address-space boundaries, or object lifetimes, the system’s isolation model fails.

But the kernel is not the whole story.

Boot Code Is Trusted Too

The bootloader runs before the kernel.

That makes it security-critical.

In EriX, the bootloader is responsible for loading and verifying a signed boot.img, parsing kernel and service images, building a deterministic handoff structure, and transferring control to the kernel.

This places the bootloader in the TCB.

It holds firmware-provided authority during boot and controls the final jump into the kernel. It must treat the boot medium as untrusted, validate image structure and cryptographic integrity, reject malformed ELF binaries, and fail closed on ambiguity.

If the bootloader accepts a tampered image or builds an inconsistent handoff, the kernel may begin execution from a compromised foundation.

That is why boot code must be small, strict, and boring.

Parsers Can Be TCB Boundaries

Parsers are often underestimated in system security.

They sit exactly where untrusted bytes become trusted structure.

EriX treats several parser and ABI crates as TCB or TCB-adjacent components:

lib-bootimg parses and verifies boot.img structure, hashes, and signatures
lib-elf validates ELF64 binaries before the bootloader trusts load segments
lib-handoff validates versioned handoff structures between boot stages
lib-ipc defines and validates IPC message layouts
lib-capabi defines capability types, rights, slot constants, and transfer descriptors

These libraries may not hold runtime capabilities themselves.

That does not make them irrelevant to the TCB.

If lib-bootimg accepts a modified boot image, the bootloader may trust code it should reject. If lib-elf accepts a malformed executable, the boot chain may load the wrong bytes or trust invalid segment ranges. If lib-ipc misdecodes a message, an operation may be interpreted incorrectly. If lib-capabi defines an overly broad role policy, a service may receive authority it should never hold.

Pure code can still be trusted code.

The important property is that these libraries are narrow. They do not perform I/O, do not own system policy, and avoid ambient authority. Their job is to parse, validate, and reject.
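As an illustration of "parse, validate, and reject", here is a small sketch of the kind of check an ELF loader needs before it trusts a load segment. This is not the lib-elf API. The `Segment` struct and `segment_bytes` function are hypothetical, and the fields are a simplified subset of an ELF64 program header. The function is pure: it takes bytes and returns either a validated slice or an error.

```rust
/// Simplified view of one loadable segment. The fields mirror the ELF64
/// program header (p_offset, p_filesz, p_memsz), but this struct is a
/// hypothetical stand-in for illustration.
#[derive(Debug, Clone, Copy)]
pub struct Segment {
    pub file_offset: u64,
    pub file_size: u64,
    pub mem_size: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SegmentError {
    /// file_offset + file_size does not fit in a u64.
    OffsetOverflow,
    /// The segment extends past the end of the image.
    OutOfBounds,
    /// A loadable segment may not have more file bytes than memory bytes.
    FileLargerThanMemory,
}

/// Return the file-backed bytes of `seg`, or reject the segment.
/// No I/O, no allocation, no global state.
pub fn segment_bytes<'a>(image: &'a [u8], seg: &Segment) -> Result<&'a [u8], SegmentError> {
    if seg.file_size > seg.mem_size {
        return Err(SegmentError::FileLargerThanMemory);
    }
    // Checked arithmetic: an attacker-chosen offset must not wrap around.
    let end = seg
        .file_offset
        .checked_add(seg.file_size)
        .ok_or(SegmentError::OffsetOverflow)?;
    if end > image.len() as u64 {
        return Err(SegmentError::OutOfBounds);
    }
    // Both bounds are <= image.len(), so the casts to usize are lossless.
    Ok(&image[seg.file_offset as usize..end as usize])
}
```

Every failure has its own error variant and none of them is recoverable inside the parser. The caller decides what to do, which for boot code usually means stopping.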

User-Space Services Can Be Trusted Components

Moving policy out of the kernel does not make policy disappear.

It moves policy into user-space services where it can be isolated, constrained, and audited separately.

In EriX, rootd is the first policy-bearing user-space authority. It validates the kernel-to-root handoff, parses boot configuration, executes the startup DAG, and transfers least-privilege capabilities to required services.

rootd is high-privilege.

But it is not the kernel.

That distinction is important. rootd is trusted for early policy and capability distribution, but it does not implement kernel objects or own machine authority directly. Its job is to distribute authority according to explicit startup contracts.
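To show what "executes the startup DAG" can mean in practice, here is a sketch of a deterministic startup planner. It is an assumption about the general technique, not a description of how rootd is implemented: the `startup_order` function, the `PlanError` type, and the map-of-dependencies input format are all hypothetical. The sketch uses a `BTreeMap` so that iteration order, and therefore the resulting plan, is the same on every boot.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq, Eq)]
pub enum PlanError {
    /// A service depends on something that was never declared.
    UnknownDependency { service: String, dependency: String },
    /// No remaining service can start: the graph contains a cycle.
    Cycle,
}

/// `deps` maps each service to the services that must be ready before it.
/// Returns a start order, or an error. There is no best-effort fallback.
pub fn startup_order(deps: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, PlanError> {
    // Reject references to undeclared services before planning anything.
    for (service, needs) in deps {
        for dep in needs {
            if !deps.contains_key(dep) {
                return Err(PlanError::UnknownDependency {
                    service: service.to_string(),
                    dependency: dep.to_string(),
                });
            }
        }
    }

    let mut started: BTreeSet<&str> = BTreeSet::new();
    let mut order = Vec::new();

    while started.len() < deps.len() {
        // Pick the first service, in sorted key order, whose dependencies
        // have all been started. Sorted order keeps the plan deterministic.
        let mut next = None;
        for (name, needs) in deps {
            if !started.contains(name) && needs.iter().all(|d| started.contains(d)) {
                next = Some(*name);
                break;
            }
        }
        match next {
            Some(name) => {
                started.insert(name);
                order.push(name.to_string());
            }
            // Nothing is startable but services remain: fail closed.
            None => return Err(PlanError::Cycle),
        }
    }
    Ok(order)
}
```

The quadratic loop is acceptable for a handful of early services and keeps the logic easy to audit. A cycle or an undeclared dependency produces an error rather than a partial plan.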

Other services also sit inside specific trusted boundaries:

procd is trusted for process lifecycle orchestration
deviced is trusted for driver policy and driver-capability delivery
vfsd is trusted as the public filesystem namespace boundary
private filesystem providers are trusted only for their provider role

This does not make those services unimportant.

It makes their authority narrower than full kernel authority.

Attack Surface in a Monolithic Design

In a monolithic kernel, many subsystems share one privileged address space.

This can make fast paths simple, but it also creates a broad attack surface.

A vulnerability in any in-kernel subsystem may become a kernel vulnerability:

malformed filesystem metadata
buggy network packet handling
unsafe device driver code
unexpected hardware descriptors
race conditions in shared kernel state

The attacker only needs one path into privileged code.

This does not mean monolithic kernels cannot be secure. They can be engineered, hardened, fuzzed, sandboxed, and audited extensively.

But the architecture starts with a large privileged surface.

The security argument must then explain how that large surface is controlled.

Attack Surface in a Microkernel Design

A microkernel changes the shape of the attack surface.

The kernel still exposes critical interfaces:

syscalls
IPC delivery
scheduling
address-space operations
capability operations
interrupt handling

Those interfaces must be correct.

But higher-level services do not automatically run with full kernel privilege. If a filesystem provider handles malformed media, the goal is that the bug stays inside that provider’s authority. If a driver fails, the goal is that it fails with only the device authority it was explicitly given.

This turns one large privileged surface into several smaller authority surfaces.

That is not automatically simpler.

It only works if the boundaries are strict and the authority passed across them is narrow.

How EriX Minimizes the TCB

EriX reduces TCB size and attack surface through several design choices.

1. A Policy-Minimal Kernel

The EriX kernel is responsible for mechanisms, not system policy.

It handles:

core kernel objects
capability semantics
scheduling and task execution
address-space primitives
IPC and endpoint dispatch
interrupt and exception entry points

It does not own:

service startup policy
process orchestration policy
filesystem namespace policy
driver activation policy
high-level memory allocation policy

This keeps the kernel focused on enforcing isolation rather than deciding the shape of the whole system.

2. Explicit Capabilities Instead of Ambient Authority

EriX models authority through capabilities.

A component can act only if it holds a capability with the required rights. The kernel validates capability references on use, enforces endpoint rights at syscall dispatch time, and treats transfers as explicit events.

This avoids relying on global names or conventional slot numbers as permission.

Knowing a slot number is not authority.

Holding the right capability in the local CSpace is authority.

That distinction is central to reducing the TCB: trusted code does not need to infer permissions from global state when the authority is carried explicitly.
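The difference between "knowing a slot number" and "holding a capability" can be sketched in a few lines. The types below are hypothetical and much simpler than a real CSpace: `CSpace`, `EndpointCap`, and the `RIGHT_*` constants are illustrative names, not EriX definitions. The point is that a slot index is only an address, and every use goes through a lookup that checks what is actually stored there.

```rust
// Hypothetical right bits for illustration only.
pub const RIGHT_SEND: u8 = 1 << 0;
pub const RIGHT_RECV: u8 = 1 << 1;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EndpointCap {
    pub endpoint_id: u32,
    pub rights: u8,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CapError {
    SlotOutOfRange,
    EmptySlot,
    MissingRights,
}

/// A per-task capability space: a fixed table of optional capabilities.
pub struct CSpace {
    slots: Vec<Option<EndpointCap>>,
}

impl CSpace {
    pub fn with_slots(count: usize) -> Self {
        CSpace { slots: vec![None; count] }
    }

    /// Explicit transfer: the only way a slot becomes populated.
    pub fn install(&mut self, slot: usize, cap: EndpointCap) -> Result<(), CapError> {
        let entry = self.slots.get_mut(slot).ok_or(CapError::SlotOutOfRange)?;
        *entry = Some(cap);
        Ok(())
    }

    /// Validate on use: the slot must exist, be populated, and carry
    /// every right the operation requires.
    pub fn lookup(&self, slot: usize, required: u8) -> Result<EndpointCap, CapError> {
        let entry = self.slots.get(slot).ok_or(CapError::SlotOutOfRange)?;
        let cap = (*entry).ok_or(CapError::EmptySlot)?;
        if (cap.rights & required) != required {
            return Err(CapError::MissingRights);
        }
        Ok(cap)
    }
}
```

A task that guesses slot 1 gets `EmptySlot`, and a task that holds a send-only capability cannot use it to receive. Neither check depends on any global naming convention.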

3. Narrow Kernel-Control Endpoints

Earlier operating system designs often concentrated control behind broad privileged interfaces.

EriX goes in the other direction.

The current runtime uses narrow kernel-control endpoint families for specific jobs:

time
interrupts
hotplug
PCI configuration
console and framebuffer
COM1 I/O
i8042 I/O
memory retyping
VSpace mapping
pager fault resolution
process control
ACPI reads

Normal runtime boot does not hand services one broad root endpoint for all kernel operations.

Instead, a service receives the specific endpoint family it needs. timed receives time control. irqd receives interrupt control. drv-serial receives COM1-specific I/O. drv-i8042 receives i8042-specific I/O. drv-acpi receives ACPI-read authority.

This narrows both the trusted interface and the damage caused by misuse.
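One way to picture endpoint families is as a closed enum, with each request type accepted by exactly one family. The sketch below is an illustration under assumptions: the service-to-family pairs come from the text above, but the `EndpointFamily` and `Request` enums, the specific request variants, and the `authorize` function are hypothetical and cover only a subset of the families listed.

```rust
/// A subset of the endpoint families named above, as a hypothetical enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EndpointFamily {
    Time,
    Interrupts,
    Com1Io,
    I8042Io,
    AcpiRead,
}

/// Service-to-family assignment as described in the text.
/// Unknown services receive nothing.
pub fn family_for_service(name: &str) -> Option<EndpointFamily> {
    match name {
        "timed" => Some(EndpointFamily::Time),
        "irqd" => Some(EndpointFamily::Interrupts),
        "drv-serial" => Some(EndpointFamily::Com1Io),
        "drv-i8042" => Some(EndpointFamily::I8042Io),
        "drv-acpi" => Some(EndpointFamily::AcpiRead),
        _ => None,
    }
}

/// Hypothetical request shapes, one per family, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Request {
    ReadClock,
    AckInterrupt(u8),
    WriteCom1(u8),
    ReadI8042,
    ReadAcpiTable,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Denied;

/// A request is accepted only on the family it belongs to.
/// Everything else is denied by the default arm.
pub fn authorize(held: EndpointFamily, request: Request) -> Result<(), Denied> {
    match (held, request) {
        (EndpointFamily::Time, Request::ReadClock)
        | (EndpointFamily::Interrupts, Request::AckInterrupt(_))
        | (EndpointFamily::Com1Io, Request::WriteCom1(_))
        | (EndpointFamily::I8042Io, Request::ReadI8042)
        | (EndpointFamily::AcpiRead, Request::ReadAcpiTable) => Ok(()),
        _ => Err(Denied),
    }
}
```

The allow-list is short enough to read in one sitting, and adding a new family forces an explicit decision about which requests it accepts.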

4. Exact Startup Contracts

EriX treats startup capability transfer as a contract, not a suggestion.

Startup envelopes describe which capabilities a service should receive, where they should appear, and which rights they should carry.

Services validate the actual local slots they received before declaring readiness. Endpoint transfers are checked by source slot, destination slot, rights, and expected endpoint kind. Unknown, wrong, or extra endpoint transfers are rejected.

This prevents a common authority bug:

“A slot is occupied, so it must be the right thing.”

In EriX, slot occupancy alone is not proof of authority.

The capability must match the declared authority shape.
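A sketch of that check, under stated assumptions: the `Transfer` struct, the `EndpointKind` variants, and `validate_bundle` are hypothetical names, and the real startup envelope format is not shown here. The function compares what a service was told to expect with what it actually received, and rejects missing, mismatched, extra, and duplicated transfers.

```rust
/// Hypothetical endpoint kinds for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EndpointKind {
    Time,
    Interrupt,
    Log,
}

/// One declared or received endpoint transfer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Transfer {
    pub source_slot: u32,
    pub dest_slot: u32,
    pub rights: u8,
    pub kind: EndpointKind,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ContractError {
    /// An expected destination slot was not populated.
    Missing(u32),
    /// The slot is occupied, but not by the declared transfer.
    Mismatch(u32),
    /// A transfer arrived that the contract never declared.
    Unexpected(u32),
    /// The same destination slot was delivered more than once.
    Duplicate,
}

/// Assumes `expected` lists each destination slot at most once.
pub fn validate_bundle(expected: &[Transfer], received: &[Transfer]) -> Result<(), ContractError> {
    for want in expected {
        match received.iter().find(|got| got.dest_slot == want.dest_slot) {
            None => return Err(ContractError::Missing(want.dest_slot)),
            // Occupancy is not enough: source, rights, and kind must match.
            Some(got) if got != want => return Err(ContractError::Mismatch(want.dest_slot)),
            Some(_) => {}
        }
    }
    for got in received {
        if !expected.iter().any(|want| want.dest_slot == got.dest_slot) {
            return Err(ContractError::Unexpected(got.dest_slot));
        }
    }
    // Every received slot is declared and every declared slot is present,
    // so a length difference can only come from repeated deliveries.
    if received.len() != expected.len() {
        return Err(ContractError::Duplicate);
    }
    Ok(())
}
```

A service would call something like this before declaring readiness. Any error means it never reports ready, which ties into the fail-closed behavior described later.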

5. Staged Process Creation

Process creation is a high-risk operation because it combines execution, address spaces, endpoints, and initial authority.

EriX routes runtime process creation through staged child creation rather than a legacy direct-create operation.

The staged flow is explicit:

create a staged child
receive an install grant and child endpoint alias
install the declared startup capability bundle
start the process only after population is complete

The kernel denies process start while live install grants still target that child stage.

This keeps partially populated processes from becoming runnable with an ambiguous authority state.
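The staged flow can be modeled as a small state machine. The sketch below is a user-space illustration, not the kernel's implementation: `StagedChild`, `InstallGrant`, and the method names are hypothetical, and a real grant would be bound to one specific child stage, which this simplified version does not enforce.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum StageError {
    /// Start was requested while this many install grants were still live.
    LiveInstallGrants(usize),
    /// The child is already running, so it can no longer be populated.
    AlreadyStarted,
}

/// Proof that the holder may install capabilities into a staged child.
/// It cannot be cloned, so retiring it by value ends that authority.
#[derive(Debug)]
pub struct InstallGrant {
    _private: (),
}

#[derive(Debug, Default)]
pub struct StagedChild {
    live_grants: usize,
    installed_caps: Vec<u32>,
    started: bool,
}

impl StagedChild {
    /// Step 1: create a staged child that is not yet runnable.
    pub fn create() -> Self {
        StagedChild::default()
    }

    /// Step 2: hand out an install grant. Denied once the child runs.
    pub fn issue_install_grant(&mut self) -> Result<InstallGrant, StageError> {
        if self.started {
            return Err(StageError::AlreadyStarted);
        }
        self.live_grants += 1;
        Ok(InstallGrant { _private: () })
    }

    /// Step 3: install one capability from the declared startup bundle.
    pub fn install(&mut self, _grant: &InstallGrant, cap_id: u32) {
        self.installed_caps.push(cap_id);
    }

    /// Consume a grant once population through it is complete.
    pub fn retire_grant(&mut self, _grant: InstallGrant) {
        self.live_grants = self.live_grants.saturating_sub(1);
    }

    /// Step 4: start only when no install grant is still live.
    pub fn start(&mut self) -> Result<(), StageError> {
        if self.started {
            return Err(StageError::AlreadyStarted);
        }
        if self.live_grants > 0 {
            return Err(StageError::LiveInstallGrants(self.live_grants));
        }
        self.started = true;
        Ok(())
    }

    pub fn installed(&self) -> &[u32] {
        &self.installed_caps
    }
}
```

Calling `start` while a grant is outstanding returns `LiveInstallGrants`, so "runnable" and "still being populated" are never true at the same time.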

6. Role-Specific Driver Authority

Drivers are a major source of operating system risk.

EriX does not treat all hardware control as one broad permission.

Driver authority is role-specific:

drv-serial receives COM1-only I/O authority
drv-i8042 receives i8042-only I/O authority
drv-acpi receives ACPI-read authority
probed receives PCI configuration read authority
drv-virtio-block receives a validated device frame rather than general memory authority

deviced manages driver policy, but it does not simply hand every driver a generic device-control capability. It uses explicit startup capability installation and role-specific authority surfaces.

This limits the TCB impact of driver bugs.

7. Device Memory Is a Separate Capability Type

Device memory is dangerous because it can affect hardware state directly.

EriX distinguishes device memory from ordinary RAM with CAP_TYPE_DEVICE_FRAME.

The storage path can derive a BAR-backed MMIO frame for deviced, and deviced can install only that derived device frame into a driver’s startup bundle.

That device frame is not treated as ordinary allocatable memory.

This matters because confusing device memory with normal frames would widen the authority model and make reasoning about memory safety harder.
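Here is a sketch of why a distinct type helps. Only the name CAP_TYPE_DEVICE_FRAME comes from EriX. The `CapType` enum, `FrameCap`, and the `RamPool` allocator below are hypothetical, and the addresses in the usage example are made up. With the type carried in the capability, the allocator can refuse device memory with a single match arm.

```rust
/// Hypothetical capability type tags. `DeviceFrame` stands in for
/// CAP_TYPE_DEVICE_FRAME. The other variant name is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CapType {
    Frame,
    DeviceFrame,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FrameCap {
    pub cap_type: CapType,
    pub base: u64,
    pub len: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PoolError {
    DeviceMemoryNotAllocatable,
}

/// A pool of ordinary RAM frames available for general allocation.
#[derive(Default)]
pub struct RamPool {
    frames: Vec<FrameCap>,
}

impl RamPool {
    /// Accept ordinary RAM only. MMIO must never become allocatable memory.
    pub fn donate(&mut self, cap: FrameCap) -> Result<(), PoolError> {
        match cap.cap_type {
            CapType::Frame => {
                self.frames.push(cap);
                Ok(())
            }
            CapType::DeviceFrame => Err(PoolError::DeviceMemoryNotAllocatable),
        }
    }

    pub fn frame_count(&self) -> usize {
        self.frames.len()
    }
}
```

For example, donating `FrameCap { cap_type: CapType::DeviceFrame, base: 0xfebc_0000, len: 4096 }` is rejected, while the same range tagged `CapType::Frame` would be accepted. The safety argument therefore rests on the tag being assigned correctly at derivation time, which is a much smaller thing to audit than every consumer of memory.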

8. Clean-Room Dependencies

Dependencies can enlarge the TCB quietly.

If trusted code depends on a general-purpose external library, the system may inherit:

code paths it does not need
assumptions that do not match the OS
parser behavior that is too permissive
update and supply-chain risk

EriX avoids third-party crates and implements its critical libraries within the project.

This increases implementation work, but it keeps the trusted boundary visible. When a parser, ABI crate, or crypto helper is part of the boot or authority path, it is part of the system’s review surface.

9. Fail-Closed Behavior

A small TCB is not enough if failures are ambiguous.

EriX tries to make security-sensitive failures explicit and terminal:

malformed boot handoff stops boot
unsupported versions are rejected
invalid startup authority prevents readiness
wrong endpoint kinds fail validation
required-service startup failure halts progression
post-start bootstrap failures trigger fail-closed teardown

Failing closed is important because recovery heuristics often become hidden policy.

Hidden policy expands the trusted behavior of the system.
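In code, failing closed mostly means refusing to guess. The sketch below checks a handoff-style header and returns a distinct error for each problem. It is an illustration only: the magic bytes, the 6-byte layout, the supported version number, and the names `check_handoff_header` and `HandoffError` are all assumptions, not the lib-handoff format.

```rust
/// Hypothetical header layout: 4 magic bytes, then a little-endian u16 version.
pub const MAGIC: [u8; 4] = *b"HNDF";
pub const SUPPORTED_VERSION: u16 = 1;

#[derive(Debug, PartialEq, Eq)]
pub enum HandoffError {
    TooShort,
    BadMagic,
    UnsupportedVersion(u16),
}

/// Accept exactly the supported version. There is no "close enough" path:
/// a newer or older version is rejected rather than interpreted loosely.
pub fn check_handoff_header(bytes: &[u8]) -> Result<(), HandoffError> {
    let header = bytes.get(..6).ok_or(HandoffError::TooShort)?;
    if header[..4] != MAGIC {
        return Err(HandoffError::BadMagic);
    }
    let version = u16::from_le_bytes([header[4], header[5]]);
    if version != SUPPORTED_VERSION {
        return Err(HandoffError::UnsupportedVersion(version));
    }
    Ok(())
}
```

The tempting alternative is to accept version 2 and "read what we understand". That is exactly the kind of recovery heuristic that turns into hidden policy, because the set of inputs the system trusts is then defined by parser leniency instead of by a documented contract.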

Smaller Does Not Mean Trivial

Reducing the TCB does not make system design easy.

It often makes design more explicit.

Instead of placing all logic in one privileged address space, the system must define:

which component owns each decision
which authority each component receives
how authority is transferred
how failures are reported
how partial startup is cleaned up
how stale capabilities are invalidated or dropped

This is more work up front.

But it produces a system where the security argument is easier to inspect.

The question becomes:

Which component must be trusted for this specific property?

That is a better question than:

Is the whole kernel correct?

A Practical TCB View of EriX

A practical view of the EriX TCB looks layered.

At the bottom is the boot chain:

bootloader
boot image verification
lib-bootimg read/verify path
ELF parsing
handoff validation

Then the kernel:

capability objects
CSpace and VSpace enforcement
IPC and endpoint rights
scheduling and trap handling
interrupt delivery

Then early trusted user space:

rootd for startup policy and capability distribution
procd for process lifecycle
deviced for driver policy
selected ABI and validation libraries shared across services

Then narrower trusted services:

filesystem namespace mediation in vfsd
private backend providers
input, console, logging, block, time, and interrupt services

Not all of these components are equally privileged.

That is the point.

EriX tries to avoid one flat trusted world. Instead, each component should be trusted only for its documented role and only with the capabilities it was explicitly given.

Looking Ahead

The TCB discussion naturally leads to the implementation language.

If trusted code must be small, explicit, and auditable, then memory safety matters. So do unsafe boundaries, parser correctness, data layout, and the discipline required when working close to hardware.

The next post will examine why EriX is written primarily in Rust, what Rust does and does not solve for kernel development, and how it compares with the traditional C approach to systems programming.