Microkernels vs Monolithic Kernels: Trade-offs Revisited
Few operating system design debates have lasted as long as the one between microkernels and monolithic kernels.
At a surface level, the distinction seems simple:
- monolithic kernels keep most operating system services inside the kernel
- microkernels move most services into user space
In practice, the trade-off is more subtle.
The real question is not whether one structure is universally faster, cleaner, or more secure. The real question is where authority, complexity, failure, and performance costs should live.
This post revisits that trade-off, explains why many old microkernel arguments became oversimplified, and shows why modern systems like EriX make the microkernel model practical again.
The Historical Shape of the Debate⌗
Early operating systems were built under severe hardware constraints.
Memory was limited. CPUs were slower. Context switches were expensive. Caches, TLBs, multiprocessor systems, and fast syscall paths were far less capable than they are today.
Under those constraints, monolithic kernels were a natural fit.
Unix-like systems placed filesystems, device drivers, networking, process management, and many other services inside one privileged kernel address space. That design made many operations cheap:
- a filesystem could call the block layer directly
- a network stack could access driver structures directly
- kernel subsystems could share data without IPC
The result was efficient and pragmatic.
It also meant that large amounts of code ran with full kernel privilege.
Why Microkernels Appeared⌗
Microkernels came from a different observation:
Most operating system code does not need full machine authority.
A filesystem does not need to modify arbitrary page tables. A keyboard driver does not need to access every process. A network stack does not need to control the scheduler.
Microkernels keep only the smallest necessary mechanisms in the kernel, usually including:
- scheduling
- address space management
- inter-process communication
- capability or handle management
- interrupt and exception delivery
Higher-level services run as ordinary user-space processes.
This gives the system stronger isolation. A driver crash does not have to be a kernel crash. A filesystem bug does not automatically become arbitrary kernel memory corruption. Authority can be distributed more precisely.
The idea was compelling, but early implementations often struggled with performance and compatibility.
The First Performance Problem⌗
The classic criticism of microkernels is that they are slow.
That criticism did not appear out of nowhere.
Some early microkernel systems placed traditional operating system services behind many separate user-space servers, then tried to preserve familiar Unix interfaces on top. A simple operation could become a chain of messages:
- application to file server
- file server to memory manager
- memory manager to pager
- pager to block service
- block service to driver
Each step could involve a context switch, message validation, scheduling decision, and sometimes copying.
If interfaces are too chatty, the cost adds up.
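To put illustrative (not measured) numbers on it: if each boundary crossing costs on the order of a microsecond, the five-hop path above spends roughly 5 µs on protocol overhead alone before any real work happens, while the same operation inside one address space pays only the cost of a few function calls.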
The mistake was turning this into a universal rule:
Microkernels are slow.
A more accurate rule is:
Poorly designed IPC paths and overly chatty service boundaries are slow.
That distinction matters.
The Monolithic Fast Path⌗
Monolithic kernels can be extremely fast because they avoid many protection boundaries.
An in-kernel filesystem can call an in-kernel block layer with a normal function call. A driver can share memory directly with another subsystem. There is no need to serialize every request into a message format.
This is a real advantage.
But it is not free.
The monolithic fast path often comes with:
- larger privileged code size
- more shared mutable state
- more kernel-internal locking complexity
- more ways for one subsystem to corrupt another
- a larger trusted computing base
Performance is not only about instruction count. It is also about cache behavior, lock contention, fault containment, recovery, and the cost of maintaining correctness over time.
A monolithic kernel may win a raw microbenchmark while still making isolation and auditability harder.
Performance Myth: Every Boundary Is Fatal⌗
A common myth is that each microkernel boundary is so expensive that the design cannot compete.
That view is outdated.
A boundary has a cost, but modern systems can make that cost manageable:
- fast syscall and return paths
- better scheduling heuristics
- shared-memory data paths
- page mapping instead of bulk copying
- batched requests
- asynchronous event delivery
- carefully designed IPC ABIs
The important design goal is to keep policy out of the kernel without forcing every byte of data through the kernel.
The kernel should mediate authority. It should not necessarily move all data.
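As a sketch of that split between authority and data movement, the following Rust example models a batched request ring over shared memory; the types, names, and layout here are invented for this post, not an actual EriX interface.

```rust
/// Hypothetical sketch (not EriX's ABI): a batched request ring over
/// shared memory. The client fills several small descriptors, then
/// signals the server once, so one boundary crossing amortizes over
/// the whole batch while the payloads stay in a shared data region.

const RING_SLOTS: usize = 64;

/// A small control descriptor; only (opcode, offset, len) travel
/// through the ring, never the payload bytes themselves.
#[derive(Clone, Copy, Default, Debug)]
struct ReqDesc {
    opcode: u16, // e.g. read = 1, write = 2
    offset: u64, // payload location in the shared data region
    len: u32,
}

struct RequestRing {
    slots: [ReqDesc; RING_SLOTS],
    head: usize, // next slot the server consumes
    tail: usize, // next slot the client fills
}

impl RequestRing {
    fn new() -> Self {
        Self { slots: [ReqDesc::default(); RING_SLOTS], head: 0, tail: 0 }
    }

    /// Client side: enqueue with no kernel involvement at all.
    fn push(&mut self, d: ReqDesc) -> Result<(), ReqDesc> {
        let next = (self.tail + 1) % RING_SLOTS;
        if next == self.head {
            return Err(d); // ring full; caller must flush first
        }
        self.slots[self.tail] = d;
        self.tail = next;
        Ok(())
    }

    /// Server side: drain everything queued since the last notify.
    fn drain(&mut self, mut handle: impl FnMut(ReqDesc)) {
        while self.head != self.tail {
            handle(self.slots[self.head]);
            self.head = (self.head + 1) % RING_SLOTS;
        }
    }
}

fn main() {
    let mut ring = RequestRing::new();
    // Queue three reads; in a real system, one IPC "notify" call would
    // then wake the server for all of them at once.
    for i in 0..3u64 {
        ring.push(ReqDesc { opcode: 1, offset: i * 4096, len: 4096 }).unwrap();
    }
    ring.drain(|d| println!("serve op={} off={} len={}", d.opcode, d.offset, d.len));
}
```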
Performance Myth: IPC Means Copying Everything⌗
IPC is often imagined as “copy this entire buffer from process A to process B”.
That is only one possible design.
A microkernel can pass small control messages while transferring authority to shared memory, frames, endpoints, or device objects. The expensive data path can remain mapped, while the kernel only validates who is allowed to access it.
This is central to capability-based design.
Instead of copying large data structures through a privileged subsystem, a process can receive a capability that authorizes access to a specific object with specific rights.
The kernel remains responsible for enforcing the transfer. It does not need to understand every high-level protocol built on top of that transfer.
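Here is a minimal Rust sketch of the "pass authority, not bytes" pattern, assuming hypothetical `FrameCap` and rights encodings; none of these names come from EriX.

```rust
/// Hypothetical sketch (names invented, not EriX's API): a message
/// carries a small control header plus a capability naming a shared
/// frame. Only the capability moves through the kernel; the receiver
/// maps the frame instead of receiving a copy of its contents.

const RIGHT_READ: u32 = 1 << 0;

#[derive(Clone, Copy, Debug)]
struct FrameCap {
    frame_id: u64, // which frame this capability names
    rights: u32,   // e.g. READ, but deliberately not GRANT
}

/// The control message stays small and fixed-size; the bulk data
/// never leaves the frame the capability points at.
struct ControlMsg {
    opcode: u16,
    len: u32,      // how many bytes in the frame are valid
    cap: FrameCap, // the authority being handed over
}

/// Stand-in for a real VSpace-mapping syscall.
fn map_frame(cap: FrameCap) -> usize {
    cap.frame_id as usize
}

/// Receiver side: validate the rights, then work on the data in place.
fn handle(msg: &ControlMsg) {
    assert!(msg.cap.rights & RIGHT_READ != 0, "need READ to map");
    let _mapping = map_frame(msg.cap); // zero-copy: bytes never moved
    println!("op {}: mapped frame {} covering {} bytes",
             msg.opcode, msg.cap.frame_id, msg.len);
}

fn main() {
    let msg = ControlMsg {
        opcode: 7,
        len: 4096,
        cap: FrameCap { frame_id: 42, rights: RIGHT_READ },
    };
    handle(&msg);
}
```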
Performance Myth: User-Space Drivers Are Not Practical⌗
User-space drivers are often treated as a research idea.
The concern is understandable. Hardware access is sensitive, interrupts are timing-sensitive, and drivers often sit on hot paths.
But most drivers do not need full kernel authority.
A driver usually needs access to:
- a specific I/O port range
- a specific MMIO region
- a specific interrupt line
- a specific DMA or buffer arrangement
Those are narrower forms of authority than “the whole kernel”.
If the kernel can delegate exactly those resources, a driver can run outside the kernel while still performing useful work. If it fails, the system has a chance to stop, restart, or replace that driver without treating the failure as kernel memory corruption.
The trade-off is real: user-space drivers need good IPC, careful interrupt delivery, and explicit resource ownership. But the model is not inherently impractical.
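To make "narrower forms of authority" concrete, here is a hypothetical sketch of the complete resource bundle a user-space serial driver might be handed; the types and values are illustrative assumptions, not EriX's actual startup format.

```rust
/// Hypothetical sketch (invented names, not EriX's startup format):
/// the complete authority a user-space serial driver needs, expressed
/// as explicit, narrow resources rather than kernel privilege.

struct IoPortRange { base: u16, len: u16 } // e.g. COM1 at 0x3F8..0x400
struct IrqLine { line: u8 }                // e.g. IRQ 4

/// Everything this driver is allowed to touch. Nothing else. A NIC
/// driver's bundle would add an MMIO region and DMA-capable frames.
struct SerialDriverBundle {
    ports: IoPortRange,
    irq: IrqLine,
}

fn main() {
    let bundle = SerialDriverBundle {
        ports: IoPortRange { base: 0x3F8, len: 8 },
        irq: IrqLine { line: 4 },
    };
    // If this process is compromised, the damage is bounded by the
    // bundle: eight I/O ports and one interrupt line.
    println!(
        "driver authority: ports {:#X}..{:#X}, irq {}",
        bundle.ports.base,
        bundle.ports.base + bundle.ports.len,
        bundle.irq.line
    );
}
```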
What EriX Puts in the Kernel⌗
EriX is designed as a capability microkernel.
The EriX kernel is intentionally policy-minimal. Its architecture documents define the kernel as responsible for:
- validating the bootloader-to-kernel handoff
- managing core kernel objects and capability semantics
- creating the root task
- exposing trap, syscall, and interrupt entry points
The kernel is explicitly not responsible for:
- system policy
- process orchestration policy
- high-level memory policy
- service lifecycle policy
This is the microkernel line in practice.
The kernel starts with machine authority, but it must convert that authority into explicit kernel objects and capability references. No ambient authority is supposed to leak into user space.
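As a hedged illustration of that conversion, here is a sketch of boot-time authority being turned into explicit capability objects for the root task; the types and the memory-map input are invented for this post.

```rust
/// Hypothetical sketch (types invented for this post): at boot the
/// kernel holds raw machine authority, e.g. the physical memory map,
/// and converts it into explicit, enumerable capability objects for
/// the root task. After this point nothing is reachable "ambiently";
/// every resource is a held capability.

#[derive(Clone, Copy)]
struct PhysRegion { base: u64, len: u64, usable: bool }

#[derive(Debug)]
enum KernelObject {
    /// Memory the root task may later retype into concrete objects.
    Untyped { base: u64, len: u64 },
}

/// Build the root task's initial capability set from the memory map.
fn seed_root_caps(map: &[PhysRegion]) -> Vec<KernelObject> {
    map.iter()
        .filter(|r| r.usable)
        .map(|r| KernelObject::Untyped { base: r.base, len: r.len })
        .collect()
}

fn main() {
    // A toy bootloader-provided map; a real handoff carries much more.
    let map = [
        PhysRegion { base: 0x0010_0000, len: 0x0F00_0000, usable: true },
        PhysRegion { base: 0xFEB0_0000, len: 0x0010_0000, usable: false }, // MMIO
    ];
    // The root task receives exactly these objects and nothing more.
    for cap in seed_root_caps(&map) {
        println!("root cap: {:?}", cap);
    }
}
```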
What EriX Moves Out of the Kernel⌗
EriX places policy-bearing functionality into user-space services.
For example:
- `rootd` is the first policy-bearing user-space authority
- `procd` owns process lifecycle management
- `deviced` owns driver policy and driver startup orchestration
- `vfsd` owns the public filesystem namespace
- filesystem providers such as `ramfsd`, `e2fsd`, and `fatd` remain private backend peers behind `vfsd`
This is not just “move code out of the kernel” as an aesthetic choice.
Each service boundary defines an authority boundary.
`rootd` distributes least-privilege startup capabilities. `procd` creates and starts processes through staged child creation and install grants. `deviced` does not become a second kernel; it asks `procd` to manage driver processes and passes along only the driver authority required for each role.
That structure is more verbose than a monolithic kernel call graph, but it makes authority flow visible.
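As a hedged sketch of that flow, here is what staged child creation with install grants could look like; every name below is invented for illustration, not `procd`'s real interface.

```rust
/// Hypothetical sketch: staged child creation. The process service
/// builds the child in a "staged" state, installs exactly the granted
/// capabilities, and only then starts it. All names are invented.

#[derive(Debug)]
enum Stage { Created, CapsInstalled, Running }

#[derive(Debug)]
struct StagedChild {
    name: String,
    stage: Stage,
    caps: Vec<String>, // stand-in for real capability references
}

impl StagedChild {
    fn create(name: &str) -> Self {
        Self { name: name.into(), stage: Stage::Created, caps: Vec::new() }
    }

    /// Install one declared grant. Only legal before start.
    fn install(&mut self, cap: &str) -> Result<(), &'static str> {
        match self.stage {
            Stage::Running => Err("cannot install after start"),
            _ => {
                self.caps.push(cap.into());
                self.stage = Stage::CapsInstalled;
                Ok(())
            }
        }
    }

    /// Start only once the declared authority is fully in place.
    fn start(&mut self) -> Result<(), &'static str> {
        match self.stage {
            Stage::CapsInstalled => { self.stage = Stage::Running; Ok(()) }
            _ => Err("no declared authority installed"),
        }
    }
}

fn main() {
    let mut drv = StagedChild::create("drv-serial");
    drv.install("com1-io-endpoint").unwrap();
    drv.install("irq4-line").unwrap();
    drv.start().unwrap();
    println!("{:?}", drv);
}
```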
Narrow Authority Instead of Broad Privilege⌗
One of the most important EriX implementation details is the move away from a broad root endpoint as the normal runtime control surface.
The current kernel exposes narrow kernel-control endpoint families for specific jobs:
- time control
- interrupt control
- hotplug events
- PCI configuration reads
- console and framebuffer access
- COM1 I/O
- i8042 I/O
- memory retyping
- VSpace mapping
- pager fault resolution
- process control
- ACPI reads
Runtime dispatch is keyed by the endpoint object and its kind, not by a privileged global slot number.
This matters because a task does not gain authority merely by knowing a conventional slot value. It must actually hold the right capability in its own local capability space.
For example, `drv-serial` receives COM1-specific I/O authority, `drv-i8042` receives i8042-specific I/O authority, `drv-acpi` receives ACPI-read authority, and `probed` receives PCI configuration read authority.
That is a different security shape from placing all of those operations behind one broad kernel handle.
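To illustrate endpoint-kind dispatch, here is a hedged Rust sketch; the enum, fields, and handler stubs are invented for this post, not the EriX kernel's actual code.

```rust
/// Hypothetical sketch (not the EriX kernel's actual code): runtime
/// dispatch keyed by the kind of the endpoint object the caller holds.
/// Knowing a conventional slot number grants nothing; the capability
/// itself selects which narrow handler can run.

#[allow(dead_code)] // all kinds listed for illustration
#[derive(Debug, Clone, Copy)]
enum EndpointKind {
    TimeControl,
    IrqControl,
    PciConfigRead,
    Com1Io,
    MemRetype,
    // ... one narrow family per job, as in the list above
}

struct Endpoint {
    kind: EndpointKind,
    owner_task: u32, // whose capability space this endpoint lives in
}

fn dispatch(ep: &Endpoint, _opcode: u16) -> Result<(), &'static str> {
    // The match is on what the held endpoint *is*, not on a global
    // slot index the caller happened to guess.
    match ep.kind {
        EndpointKind::TimeControl => Ok(()),   // read/program timers only
        EndpointKind::IrqControl => Ok(()),    // ack/mask interrupt lines only
        EndpointKind::PciConfigRead => Ok(()), // config-space reads only
        EndpointKind::Com1Io => Ok(()),        // COM1 port I/O only
        EndpointKind::MemRetype => Ok(()),     // retype untyped memory only
    }
}

fn main() {
    // drv-serial holds a COM1 endpoint, so only COM1 operations succeed.
    let com1 = Endpoint { kind: EndpointKind::Com1Io, owner_task: 7 };
    println!("task {} invokes a {:?} endpoint", com1.owner_task, com1.kind);
    dispatch(&com1, 1).unwrap();
}
```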
Device Memory as an Explicit Object⌗
EriX also treats device memory authority as explicit and typed.
The kernel has a distinct `CAP_TYPE_DEVICE_FRAME` for validated device memory.
In the storage path, a BAR-backed MMIO frame can be derived for `deviced`, and `deviced` can then install only the derived device frame into the staged driver startup bundle.
The point is not that device drivers become simple.
The point is that MMIO authority is not confused with ordinary RAM frames and is not exposed through a generic “do anything with device memory” escape hatch.
This is exactly the kind of detail that makes modern microkernels viable: hardware access is delegated as a precise object with precise rights.
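A hedged sketch of the typed-device-memory idea follows. Only `CAP_TYPE_DEVICE_FRAME` is named in the EriX docs; the surrounding types and the derivation function are invented for this post.

```rust
/// Hedged sketch: device MMIO gets its own capability type, so mapping
/// code can refuse to treat it as ordinary RAM, and a derived sub-range
/// can never "become" an ordinary frame.

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum CapType {
    RamFrame,
    DeviceFrame, // mirrors the spirit of CAP_TYPE_DEVICE_FRAME
}

#[derive(Debug, Clone, Copy)]
struct FrameCap {
    cap_type: CapType,
    phys_base: u64,
    len: usize,
}

/// Derive a sub-range of a BAR for one driver. The derived capability
/// keeps the DeviceFrame type and stays within the parent's bounds.
fn derive_device_frame(parent: FrameCap, off: u64, len: usize) -> Result<FrameCap, &'static str> {
    if parent.cap_type != CapType::DeviceFrame {
        return Err("not device memory");
    }
    let end = off.checked_add(len as u64).ok_or("length overflow")?;
    if end > parent.len as u64 {
        return Err("out of parent range");
    }
    Ok(FrameCap { cap_type: CapType::DeviceFrame, phys_base: parent.phys_base + off, len })
}

fn main() {
    // A BAR-backed MMIO frame, e.g. for a storage controller.
    let bar = FrameCap { cap_type: CapType::DeviceFrame, phys_base: 0xFEB0_0000, len: 0x4000 };
    // deviced derives only the register window this driver needs.
    let regs = derive_device_frame(bar, 0, 0x1000).unwrap();
    println!("driver receives {:?}", regs);
}
```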
IPC as an ABI, Not an Accident⌗
In a monolithic kernel, many internal interfaces are ordinary function calls.
In a microkernel, IPC becomes part of the system ABI. That makes it more important, not less.
EriX treats IPC as a shared contract:
- message headers are versioned
- layouts are fixed
- parsing uses checked arithmetic
- malformed payloads fail closed
- capability transfers are explicit
- transfer-bearing runtime messages require `GRANT`
This is the opposite of treating IPC as an afterthought.
The cost of IPC is controlled partly by implementation, but also by interface design. A carefully designed ABI avoids unnecessary round trips, keeps messages bounded, and separates control transfer from data movement.
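As an illustration of those properties, here is a small Rust sketch of fail-closed parsing of a versioned header with checked arithmetic; the wire layout is invented for this post, not EriX's actual message format.

```rust
/// Hypothetical sketch (the wire layout is invented for this post):
/// fail-closed parsing of a versioned, fixed-layout IPC header using
/// checked arithmetic. Anything unexpected is an error, never a guess.

const ABI_VERSION: u16 = 1;
const HEADER_LEN: usize = 8; // version(2) + opcode(2) + payload_len(4)

struct Header {
    version: u16,
    opcode: u16,
    payload_len: u32,
}

fn parse(buf: &[u8]) -> Result<(Header, &[u8]), &'static str> {
    // Fixed layout: reject short buffers outright.
    if buf.len() < HEADER_LEN {
        return Err("truncated header");
    }
    let version = u16::from_le_bytes([buf[0], buf[1]]);
    let opcode = u16::from_le_bytes([buf[2], buf[3]]);
    let payload_len = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);

    // Versioned: unknown versions fail closed.
    if version != ABI_VERSION {
        return Err("unsupported ABI version");
    }
    // Checked arithmetic: a huge payload_len cannot wrap the total.
    let total = HEADER_LEN
        .checked_add(payload_len as usize)
        .ok_or("length overflow")?;
    if buf.len() < total {
        return Err("payload shorter than declared");
    }
    Ok((Header { version, opcode, payload_len }, &buf[HEADER_LEN..total]))
}

fn main() {
    let mut msg = vec![1, 0, 42, 0, 3, 0, 0, 0]; // v1, opcode 42, 3-byte payload
    msg.extend_from_slice(b"abc");
    match parse(&msg) {
        Ok((h, p)) => println!("ok: v{} op {} ({} bytes): {:?}", h.version, h.opcode, h.payload_len, p),
        Err(e) => println!("rejected: {e}"),
    }
}
```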
Why Microkernels Are Viable Again⌗
Microkernels are more viable today for several reasons.
1. Hardware Changed⌗
The relative cost of a protection boundary has changed.
Context switches and syscalls are still not free, but modern CPUs, memory systems, and interrupt mechanisms make the raw cost less decisive than it was when early microkernel experiments were judged.
At the same time, modern systems are more complex and more exposed. The cost of kernel compromise has gone up.
Isolation is more valuable now.
2. We Understand IPC Better⌗
The lesson from earlier systems is not “avoid IPC”.
The lesson is:
- avoid needless IPC
- avoid chatty protocols
- avoid copying large data when authority transfer is enough
- design service boundaries around real ownership
Microkernels are viable when IPC is treated as a first-class design problem.
3. Capabilities Make Boundaries Useful⌗
Moving code into user space is only half the story.
If every user-space server still receives broad implicit privilege, the system has mostly recreated a monolith with extra context switches.
Capabilities make the boundary meaningful.
In EriX, authority is represented by typed capabilities with explicit rights. Services validate the capabilities they receive. Startup bundles describe declared authority. Kernel and service code avoid treating canonical slot numbers as ambient permission.
That makes decomposition more than modularity. It makes decomposition part of the security model.
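A sketch of what "services validate the capabilities they receive" can mean in practice, assuming hypothetical type and rights encodings that are not EriX's real ones:

```rust
/// Hypothetical sketch (invented encodings): a service validates both
/// the type and the rights of a received capability before using it,
/// instead of trusting that a well-known slot "must" hold the right thing.

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum CapType { Endpoint, Frame, DeviceFrame }

const RIGHT_READ: u32 = 1 << 0;
const RIGHT_WRITE: u32 = 1 << 1;
const RIGHT_GRANT: u32 = 1 << 2;

#[derive(Debug, Clone, Copy)]
struct Cap { cap_type: CapType, rights: u32 }

/// Accept exactly what this code path needs: a writable frame.
/// Broader rights than declared (like GRANT) are treated as a bug.
fn expect_writable_frame(c: Cap) -> Result<Cap, &'static str> {
    if c.cap_type != CapType::Frame {
        return Err("wrong capability type");
    }
    if c.rights & RIGHT_WRITE == 0 {
        return Err("missing WRITE right");
    }
    if c.rights & RIGHT_GRANT != 0 {
        return Err("broader authority than declared");
    }
    Ok(c)
}

fn main() {
    let good = Cap { cap_type: CapType::Frame, rights: RIGHT_READ | RIGHT_WRITE };
    let over = Cap { cap_type: CapType::Frame, rights: RIGHT_WRITE | RIGHT_GRANT };
    assert!(expect_writable_frame(good).is_ok());
    assert!(expect_writable_frame(over).is_err());
    println!("capabilities validated against declared authority");
}
```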
4. Language and Tooling Improved⌗
Modern implementation languages and tooling also change the trade-off.
Rust does not eliminate operating system bugs, but it does make many memory safety mistakes harder to write accidentally. It also makes unsafe boundaries visible during review.
For a microkernel system, this is especially useful. The kernel can remain small and auditable, while user-space services can still be written with stronger safety guarantees than traditional C-heavy system components.
EriX combines this with a clean-room approach and no third-party crates, which keeps the system easier to audit even though it increases implementation work.
The Remaining Costs⌗
Microkernels still have real costs.
They require:
- more explicit startup logic
- carefully versioned IPC contracts
- robust service supervision
- more thought about batching and data movement
- clear ownership of every capability
- good tracing and performance measurement
They also move some complexity out of the kernel rather than deleting it.
`rootd`, `procd`, `deviced`, and the filesystem services still need careful design. They may live outside the kernel, but they can still be trusted components for specific parts of the system.
The difference is that their authority can be narrower than kernel authority, and their failures can be contained more deliberately.
The Trade-Off Revisited⌗
The old framing was often:
- monolithic kernels are fast
- microkernels are clean but slow
That framing is too simple.
A better framing is:
- monolithic kernels optimize for direct in-kernel cooperation
- microkernels optimize for explicit authority and fault isolation
- either design can be fast or slow depending on implementation
- either design can become complex if boundaries are poorly chosen
For EriX, the microkernel choice follows from the system goals:
- minimal trusted computing base
- explicit authority through capabilities
- strict separation between kernel and user space
- auditable service boundaries
- deterministic bootstrap and failure behavior
Those goals do not make performance irrelevant.
They define where performance work should happen: fast IPC, careful service interfaces, shared-memory data paths, narrow endpoint families, and explicit capability transfer.
Looking Ahead⌗
Microkernels are not a shortcut.
They demand more up-front design discipline than a simple in-kernel call graph. They force the system to define authority, ownership, and failure behavior early.
That is exactly why they are interesting.
EriX uses the microkernel model not because it is fashionable, but because it matches the architecture: a small kernel, capability-mediated authority, and policy implemented by explicit user-space services.
The next post will examine the idea that motivates much of this structure: the trusted computing base.
We will look at what the TCB actually includes, why its size affects attack surface, and how EriX tries to keep trusted code small by moving policy into explicit, capability-constrained user-space services.