Julian Gonzalez

Recap — I/O Software Meets Hardware

Last lecture we moved from the physical reality of I/O hardware to the software architecture that makes it usable:

I/O Software Principles — Device independence, uniform naming, ASAP error handling, synchronous appearance, buffering, and shareable vs. dedicated access define what good I/O software must accomplish
I/O Programming Techniques — Programmed I/O, interrupt-driven I/O, and DMA-driven I/O answer two core questions: who waits for readiness, and who moves the data?
Drivers — Device drivers were introduced as the kernel-level programs that translate between the OS's generalized interface and specific hardware controllers
Linux Practice — We saw how Linux makes these abstractions concrete through file_operations, major/minor numbers, wait queues, double buffering, and spooling

That left us with one enormous unresolved question: if the driver is the bridge between the clean software world and the messy hardware world, where does that bridge actually live? What privileges does it have? What rules apply when execution crosses into it?

Today we answer those questions by following one I/O request all the way down and back up again.

Today's Agenda

The Request Path — Framing a single read() call as a U-shaped journey through the system
The Trap — Crossing from user space into kernel space through a controlled syscall boundary
Dispatch — How the kernel routes a generic request to the right device driver
The Hardware Interface — What a driver does once it has control, and why user code cannot do it itself
The Fork — The moment process time and device time split apart
The Return — Interrupt arrival, deferred work, and rejoining the two timelines
The Code Is Lava — Why driver bugs are uniquely dangerous and how reentrancy keeps us honest
Looking Forward — From process-to-device communication to process-to-process communication

The Request Path: One `read()`, Two Worlds

To keep today's lecture concrete, we will follow one toy example all the way through the system: a user program calls read() on a character device at /dev/tempsensor. Imagine a simple sensor driver that returns the latest temperature reading as bytes.

The Process's Story

From the program's perspective, the story is almost insultingly simple.

In C, the way most Linux device interaction is written today:

int fd = open("/dev/tempsensor", O_RDONLY);
char buf[32];
ssize_t n = read(fd, buf, sizeof(buf));

In Rust, the way this course thinks about systems code:

use std::fs::File;
use std::io::Read;

let mut f = File::open("/dev/tempsensor")?;
let mut buf = [0u8; 32];
let n = f.read(&mut buf)?;

Both programs believe they asked a straightforward question: "Give me some bytes." If read() blocks, the program merely experiences waiting. When it returns, the program sees a result and continues. The entire process-side story is four steps:

1. Call read()
2. Wait if necessary
3. Receive bytes
4. Continue execution

That is the entire abstraction the OS is trying to preserve.
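The four steps can be packaged into a complete program. This is a minimal sketch: `read_sample` is an illustrative helper name, and a byte slice stands in for the device so the example runs on a machine that has no /dev/tempsensor.

```rust
use std::io::{self, Read};

/// Reads one sample from anything that implements `Read`.
/// `read_sample` and the 32-byte buffer are illustrative choices.
fn read_sample<R: Read>(source: &mut R) -> io::Result<Vec<u8>> {
    let mut buf = [0u8; 32];
    // Steps 1 to 3: call read(), block if the source is not ready, receive bytes.
    let n = source.read(&mut buf)?;
    // Step 4: continue with however many bytes actually arrived.
    Ok(buf[..n].to_vec())
}

fn main() -> io::Result<()> {
    // A byte slice stands in for the device. With the (hypothetical) driver
    // loaded, this would be std::fs::File::open("/dev/tempsensor")? instead.
    let mut fake_device: &[u8] = &[0x17, 0x00, 0x00, 0x00];
    let bytes = read_sample(&mut fake_device)?;
    println!("read {} bytes: {:?}", bytes.len(), bytes);
    Ok(())
}
```

Because `read_sample` is generic over `Read`, the calling code is identical whether the source is a file, a device node, or a test fixture, which is the device-independence principle seen from the process's side.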

The System's Story

Underneath that neat sequential experience, the system does something far more involved:

The kernel receives the request through a syscall
The kernel dispatches to the correct driver
The driver programs the device
The process goes to sleep
The CPU runs other work
The device completes on its own schedule
An interrupt arrives
The kernel finishes the request and wakes the process

Those two stories are not merely different descriptions of the same event. They are genuinely different control flows that must be reconciled.

The U-Bend

Notice the shape. The process's steps 1 and 2 — calling read() and beginning to wait — sit at the left tip of the journey. The process's steps 3 and 4 — receiving bytes and continuing — sit at the right tip. Everything in between belongs to the system: kernel dispatch, driver code, hardware interaction, interrupt handling, and wakeup. That middle arc bends downward through layers of increasing privilege and hardware proximity, then curves back up to deliver the result.

The shape is a U — or a horseshoe, if you prefer. The two tips are the process's view of the world in user space. The interior bend is the system's view, entirely in kernel space.

The Request Path

Overview

Follow one read() call through the full U-shaped journey — from user space down into the kernel, through the driver to hardware, and back up again.

A horizontal line cuts across the top of the U, separating user space from kernel space. Our read() call descends through that line on the left, and the result ascends back through it on the right. Every section of today's lecture will advance us one step further along this path.

📌 Key Framing: A driver developer is always solving the same problem: how do I preserve a simple process-facing abstraction while cooperating with asynchronous hardware that does not care about my process's timeline?

💡 Key Insight: This U-bend shape is not unique to I/O. It appears whenever execution crosses an abstraction boundary, delegates work it cannot perform itself, and waits for a result to return. A syscall is a U-bend. A network request is a U-bend. Even the driver's own interaction with the hardware controller — issue a command, wait for an interrupt — is a smaller U-bend nested inside the larger one. The pattern is fractal: any section of the path that must delegate through a boundary spawns its own miniature U-bend within the larger arc. As the course continues, watch for this shape in IPC, networking, and beyond.

The Trap — Entering Kernel Space

Our read() call has begun. The process wants bytes from a device. But the very first thing it discovers is that it cannot satisfy its own request.

Recall: Two Privilege Worlds

We established this boundary earlier in the course: user-space code runs at a restricted privilege level (Ring 3 on x86), while kernel-space code runs with full machine access (Ring 0). The CPU enforces this distinction in hardware — it is not a suggestion.

The yellow syscall arrow in the diagram is the path our read() request must take. In the U-bend, this is the left descent: the moment execution crosses from user space into kernel space.

Why the Trap Exists

User-space code cannot write to memory-mapped device registers, acknowledge interrupts, allocate DMA buffers, or manipulate kernel memory. These are not arbitrary prohibitions — they are consequences of hardware protection. If any program could write to any hardware register, a single bug in a text editor could corrupt every device on the bus.

💀 Historical Disaster (MS-DOS and Flat Privilege): This is not hypothetical. MS-DOS ran on x86 processors in real mode: no rings, no user/kernel boundary, no memory protection at all. Every program had full access to every I/O port, every memory address, and every device register. A buggy game could corrupt the disk controller. A crashing word processor could wedge the keyboard. Often the only recovery was a hard reboot. Hardware privilege rings predate the PC, but mainstream desktop operating systems adopted them in earnest only after years of learning that flat privilege is unworkable at scale. The U-bend's left descent, the trap, is the engineering response to exactly that class of failure.

When the process calls read(fd, buf, n), it does not jump directly to the driver. It executes a trap instruction that says: "switch to the kernel's entry path, save my state, and let the OS handle this." The kernel is now running, with full privilege, on behalf of the process that asked.
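A toy model can make the "controlled entry" idea concrete. The sketch below is a user-space simulation, not how a CPU implements traps: `Machine`, `Mode`, `Fault`, `write_device_reg`, and `trap` are all hypothetical names, and the privilege check is an ordinary `if` standing in for what the hardware enforces.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    User,
    Kernel,
}

#[derive(Debug, PartialEq)]
enum Fault {
    Privilege,
}

/// Toy machine: one privilege bit and one device register.
struct Machine {
    mode: Mode,
    device_reg: u32,
}

impl Machine {
    fn new() -> Self {
        Machine { mode: Mode::User, device_reg: 0 }
    }

    /// Privileged operation: only succeeds in kernel mode.
    fn write_device_reg(&mut self, value: u32) -> Result<(), Fault> {
        if self.mode != Mode::Kernel {
            return Err(Fault::Privilege);
        }
        self.device_reg = value;
        Ok(())
    }

    /// The single controlled entry point: remember the caller's mode, raise
    /// privilege, run the kernel's handler, then restore the mode on the way out.
    fn trap(&mut self, value: u32) -> Result<(), Fault> {
        let saved = self.mode;
        self.mode = Mode::Kernel;
        let result = self.write_device_reg(value);
        self.mode = saved;
        result
    }
}

fn main() {
    let mut m = Machine::new();
    println!("direct write from user mode: {:?}", m.write_device_reg(7));
    println!("write via trap: {:?}", m.trap(7));
    println!("mode after return: {:?}, register: {}", m.mode, m.device_reg);
}
```

The point of the model is that user code never gets to choose what runs with privilege. It can only ask through `trap`, and the handler behind that entry point is the kernel's code, not the caller's.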

💡 Connection to LN8: This is the same ring-based privilege model from our CPU architecture lectures. What is new today is seeing it from the I/O side — the trap is not abstract anymore. It is the first step of every device interaction.

The trap also sets up the next problem. We are inside the kernel now, but the request is still generic. The kernel knows a process wants to read something. It does not yet know which device should answer.

Dispatch — Finding the Driver

We are inside the kernel. But being in kernel space does not immediately tell the machine which hardware should respond. The request is still generic:

read(fd, buf, n)

The kernel must answer a second question: what does this file descriptor actually refer to?

Generic Request, Specific Destination

In Unix-like systems, file descriptors are intentionally general. The process does not say "invoke the temp-sensor driver." It just passes a number and asks to read. The kernel consults its internal tables:

Lookup Step	What the Kernel Checks
Process table	Which process is making this request?
File descriptor table	Which open-file entry does `fd` refer to?
Inode / device node	What kind of object is this — file, pipe, device?
Driver dispatch	Which driver registered operations for this device?

Only after that chain of lookups can the request become device-specific.
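The lookup chain in the table can be sketched in user-space Rust. This is a simplified model under stated assumptions: the type names (`DeviceOps`, `OpenFile`, `Process`, `Kernel`, `Errno`), the major numbers 240 and 999, and the fixed reading 2150 are all invented for illustration, and the inode step is collapsed into a single `major` field.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Operations a driver registers with the kernel (a stand-in for file_operations).
trait DeviceOps {
    fn read(&self, buf: &mut [u8]) -> usize;
}

/// Hypothetical sensor driver: always reports the same fake reading.
struct TempSensor;

impl DeviceOps for TempSensor {
    fn read(&self, buf: &mut [u8]) -> usize {
        let reading = 2150u32.to_ne_bytes();
        let n = reading.len().min(buf.len());
        buf[..n].copy_from_slice(&reading[..n]);
        n
    }
}

/// System-wide open-file entry: records which device the file refers to.
struct OpenFile {
    major: u32,
}

/// Per-process state: the fd is just an index into this table.
struct Process {
    fd_table: Vec<Option<Arc<OpenFile>>>,
}

#[derive(Debug, PartialEq)]
enum Errno {
    BadFd,
    NoDevice,
}

struct Kernel {
    drivers: HashMap<u32, Arc<dyn DeviceOps>>,
}

impl Kernel {
    fn sys_read(&self, process: &Process, fd: usize, buf: &mut [u8]) -> Result<usize, Errno> {
        // fd -> open-file entry (fails for closed or out-of-range descriptors).
        let file = process
            .fd_table
            .get(fd)
            .and_then(|slot| slot.as_ref())
            .ok_or(Errno::BadFd)?;
        // open-file entry -> driver registered for that device.
        let ops = self.drivers.get(&file.major).ok_or(Errno::NoDevice)?;
        // Only now does the request become device-specific.
        Ok(ops.read(buf))
    }
}

/// Builds a toy system: fd 0 is the sensor, fd 1 is closed,
/// fd 2 names a device with no registered driver.
fn boot() -> (Kernel, Process) {
    let mut drivers: HashMap<u32, Arc<dyn DeviceOps>> = HashMap::new();
    drivers.insert(240, Arc::new(TempSensor));
    let process = Process {
        fd_table: vec![
            Some(Arc::new(OpenFile { major: 240 })),
            None,
            Some(Arc::new(OpenFile { major: 999 })),
        ],
    };
    (Kernel { drivers }, process)
}

fn main() {
    let (kernel, process) = boot();
    let mut buf = [0u8; 32];
    println!("fd 0 -> {:?}", kernel.sys_read(&process, 0, &mut buf));
    println!("fd 1 -> {:?}", kernel.sys_read(&process, 1, &mut buf));
    println!("fd 2 -> {:?}", kernel.sys_read(&process, 2, &mut buf));
}
```

Note that the process only ever supplies an integer. Every step that turns that integer into a driver call happens inside `sys_read`, on the kernel's side of the boundary.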

📌 Callback to LN9: The process's open-file table, which we met during process management, is the same structure the kernel is consulting here. The file descriptor is a process-local index into a system-wide object — and for devices, that object ultimately points to a driver.

The Driver's Contract

This is where last lecture's file_operations discussion becomes operational. For a character device, the kernel reaches a function pointer table registered by the driver and calls its .read implementation.

In C, the actual Linux interface:

struct file_operations sensor_fops = {
    .owner   = THIS_MODULE,
    .open    = sensor_open,
    .release = sensor_release,
    .read    = sensor_read,
};

Conceptually, the same contract expressed as a Rust trait:

trait CharDevice {
    fn open(&mut self) -> Result<()>;
    fn release(&mut self) -> Result<()>;
    fn read(&mut self, buf: &mut UserSlice) -> Result<usize>;
}

Both say the same thing: "if you want to be a character device driver, you must implement these operations." The kernel calls them; the process never names them directly.

🤔 Notice the dual interface: The driver must satisfy the kernel's dispatch contract above — the function signatures, registration rules, and lifetime expectations — and the hardware protocol below — registers, timing, and interrupts. No user-space program lives under that kind of dual burden.

Why Not Just Do It Yourself?

A reasonable question: why can't the process skip all this and talk to the hardware directly?

The answer is the security model we just recalled. The user/kernel boundary exists precisely to prevent arbitrary user code from:

writing to device registers (which could corrupt the device or others on the bus)
masking or acknowledging interrupts (which could wedge the system)
accessing kernel memory (which could compromise every process on the machine)

The process cannot be its own driver. The privilege architecture physically prevents it. The dispatch system is not overhead — it is the enforcement mechanism that keeps every other process safe while one process uses a device.

This answers the question the U-bend poses visually: why does the path descend into the kernel at all? Because the work that needs to happen at the bottom of the bend is inherently privileged. There is no shortcut that stays at the user-space tips.

⚠️ Gotcha (iopl() and the Deliberate Exception): Linux on x86 does have a mechanism for user-space processes to access I/O ports directly: the iopl() and ioperm() syscalls. They require the CAP_SYS_RAWIO capability (in practice, root), are architecture-specific, and hand the process raw port access with none of a driver's mediation. They do not expose kernel memory or interrupt handling, but for the ports they cover the kernel is essentially saying "fine, but you lose every safety guarantee." The fact that this escape hatch exists, yet is deliberately obscure and terrifying to use, is itself evidence of how important the normal protection path is.

The Hardware Interface

The driver has been found. Its .read function has been called. Now the abstraction stops being uniform, and the driver must speak the device's own language.

What Drivers Actually Do

For our toy temperature sensor, a simplified .read path does something like this:

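static ssize_t sensor_read(struct file *f, char __user *buf,
                           size_t n, loff_t *off) {
    struct sensor_dev *dev = f->private_data;
    u32 status;

    if (n < sizeof(u32))
        return -EINVAL;             /* caller's buffer cannot hold one reading */

    status = ioread32(dev->status_reg);
    if (!(status & DATA_READY)) {
        iowrite32(START_MEASUREMENT, dev->cmd_reg);
        /* Sleep until the IRQ path sets data_ready; nonzero means a signal arrived. */
        if (wait_event_interruptible(dev->wq, dev->data_ready))
            return -ERESTARTSYS;
    }

    /* copy_to_user returns the number of bytes NOT copied, so 0 means success. */
    if (copy_to_user(buf, &dev->last_reading, sizeof(u32)))
        return -EFAULT;
    return sizeof(u32);             /* bytes delivered to the caller */
}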

The same logic, imagined in Rust:

fn read(&mut self, buf: &mut UserSlice) -> Result<usize> {
    let status = self.status_reg.read();

    if !status.contains(Status::DATA_READY) {
        self.cmd_reg.write(Command::START_MEASUREMENT);
        self.wait_queue.wait_interruptible(|| self.data_ready)?;
    }

    buf.write(&self.last_reading.to_ne_bytes())
}

Both versions do the same work:

Read a hardware status register to check if data is already available
If not, write a command to tell the device to begin measuring
Wait until the device signals completion
Copy the result to the user's buffer

Every one of these operations requires kernel privilege. ioread32 and iowrite32 touch memory-mapped device registers. wait_event_interruptible puts the calling process to sleep through the scheduler. copy_to_user crosses the kernel/user boundary to deliver the result. No user-space program has access to any of these.
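Since none of those kernel calls can run in an ordinary program, the register-level logic is easiest to try against a simulated controller. The following is a sketch only: `MockSensor`, its three-tick measurement delay, the reading 2150, and `ReadError` are all invented, and the "wait" step is modelled by stepping the fake device forward rather than by sleeping.

```rust
const DATA_READY: u32 = 0x1;
const START_MEASUREMENT: u32 = 0x1;

#[derive(Debug, PartialEq)]
enum ReadError {
    BufferTooSmall,
}

/// Simulated controller: status and data registers plus a countdown that
/// models how long a measurement takes.
struct MockSensor {
    status_reg: u32,
    data_reg: u32,
    ticks_left: u32,
}

impl MockSensor {
    fn new() -> Self {
        MockSensor { status_reg: 0, data_reg: 0, ticks_left: 0 }
    }

    /// Models a write to the command register.
    fn write_cmd(&mut self, cmd: u32) {
        if cmd == START_MEASUREMENT {
            self.status_reg &= !DATA_READY;
            self.ticks_left = 3;
        }
    }

    /// Advances simulated device time by one step.
    fn tick(&mut self) {
        if self.ticks_left > 0 {
            self.ticks_left -= 1;
            if self.ticks_left == 0 {
                self.data_reg = 2150; // pretend reading
                self.status_reg |= DATA_READY;
            }
        }
    }
}

fn sensor_read(dev: &mut MockSensor, user_buf: &mut [u8]) -> Result<usize, ReadError> {
    let reading_len = std::mem::size_of::<u32>();
    if user_buf.len() < reading_len {
        return Err(ReadError::BufferTooSmall);
    }
    // 1. Read the status register to see whether data is already available.
    if (dev.status_reg & DATA_READY) == 0 {
        // 2. Tell the device to begin measuring.
        dev.write_cmd(START_MEASUREMENT);
        // 3. Wait for completion. A real driver would sleep here; the
        //    simulation just steps the fake device until it reports ready.
        while (dev.status_reg & DATA_READY) == 0 {
            dev.tick();
        }
    }
    // 4. Copy the result into the caller's buffer and report the byte count.
    user_buf[..reading_len].copy_from_slice(&dev.data_reg.to_ne_bytes());
    Ok(reading_len)
}

fn main() {
    let mut dev = MockSensor::new();
    let mut buf = [0u8; 32];
    let n = sensor_read(&mut dev, &mut buf).expect("buffer is large enough");
    println!("copied {} bytes: {:?}", n, &buf[..n]);
}
```

The loop in step 3 is effectively programmed I/O. Replacing it with a real sleep is exactly the change that makes the process and device timelines diverge.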

💡 Connection to LN17: The controller/functional-hardware split matters here. The driver never tells the sensor crystal how to vibrate. It talks to the controller — the register-level interface the hardware exposes to the host.

The Translation Layer

That translation happens in both directions at once:

Direction	What the Driver Translates
Upward (toward kernel and process)	A clean `read()` result with a byte count and error status
Downward (toward hardware)	Register writes, status polling, DMA descriptor setup, interrupt configuration

This is the bottom of the U-bend — the furthest point from user space. From here, the path must curve back upward. But the return trip will not be as simple as the descent, because something is about to change.

💡 Fun Fact — FUSE and the Nested U-Bend: Not all drivers live in the kernel. FUSE (Filesystem in Userspace) lets a driver run as an ordinary user-space process — but it does so by adding a second U-bend inside the first. The kernel receives a filesystem request, realizes the handler is in user space, and sends the request back up through the boundary via FUSE. The user-space driver processes it, and the result descends back into the kernel, then ascends again to the original process. Two extra boundary crossings, slower performance, but vastly safer — the driver code runs without kernel privilege. This is the fractal U-bend made literal: the system trades performance for safety by nesting one horseshoe inside another.

The Fork — When Time Splits

Look at the driver code above. Line by line, it reaches wait_event_interruptible — and at that moment, two things that were traveling together split apart.

The Process Sleeps

The driver has told the scheduler: "this process cannot continue until the device has data." The scheduler marks the process as blocked and switches the CPU to something else.

Process State Diagram

That running → blocked transition is not theoretical anymore. Our read() call just triggered it. The process will stay blocked until the device completes and the kernel moves it back to ready.

The Device Runs Alone

Meanwhile, the temperature sensor is measuring. It has no idea that the process is sleeping, or that the CPU is running entirely different code. The device operates on its own clock, at its own speed, finishing whenever it finishes.

This is the moment the U-bend forks. The process's path and the device's path are no longer the same execution. They will need to rejoin later — but for now, they are independent.
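The split can be imitated with two threads. In this sketch one thread plays the blocked process and another plays the device. `WaitQueue`, `blocking_read`, `device_completes`, the 20 ms delay, and the value 2150 are illustrative assumptions, and a `Condvar` stands in for the kernel's wait queue.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Stand-in for the driver's wait queue plus the data it guards.
struct WaitQueue {
    reading: Mutex<Option<u32>>,
    ready: Condvar,
}

/// Process side: sleep until a reading exists, then take it.
fn blocking_read(wq: &WaitQueue) -> u32 {
    let mut slot = wq.reading.lock().unwrap();
    // Loop guards against spurious wakeups: only proceed once data is present.
    while slot.is_none() {
        slot = wq.ready.wait(slot).unwrap();
    }
    slot.take().unwrap()
}

/// Device side: publish the reading and wake any sleeper.
fn device_completes(wq: &WaitQueue, value: u32) {
    *wq.reading.lock().unwrap() = Some(value);
    wq.ready.notify_all();
}

fn sleep_until_device_done() -> u32 {
    let wq = Arc::new(WaitQueue {
        reading: Mutex::new(None),
        ready: Condvar::new(),
    });
    let device_side = Arc::clone(&wq);
    // The "device" runs on its own timeline and finishes whenever it finishes.
    let device = thread::spawn(move || {
        thread::sleep(Duration::from_millis(20));
        device_completes(&device_side, 2150);
    });
    // The "process" blocks here; the OS is free to run other threads meanwhile.
    let value = blocking_read(&wq);
    device.join().unwrap();
    value
}

fn main() {
    println!("woke up with reading {}", sleep_until_device_done());
}
```

The waiting thread never polls. It is parked until the other timeline publishes a result, which is the behaviour the running → blocked → ready transitions describe.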

🤔 The Real Problem: Submission happened in one execution context. Completion will arrive in a different one. The process that started this request is asleep and may not even have its address space mapped when the device finishes. How does the kernel complete the job?

Process Context: What Makes It Special

Everything the driver did so far — reading registers, writing commands, putting the process to sleep — happened in process context. The rules of process context are not arbitrary conventions. They follow directly from the fact that the scheduler has a valid current process and full control over its lifecycle:

Capability	Why It Works in Process Context
Sleep / block	The scheduler owns a valid process to mark as blocked. It can switch to another process and return later.
Access user buffers	The requesting process's virtual address space is mapped. `copy_to_user` can safely write to the process's `buf` pointer.
Use the process's credentials	Permission checks work because the kernel knows who is asking.
Take slow paths	Since we can sleep, we can wait for locks, allocate memory that might page-fault, or retry operations.

📌 Key Point: Every rule of process context traces back to one fact: the scheduler has a known, owned process in the picture. When that fact stops being true, the rules change.

The Return — Interrupt, Defer, Rejoin

The device finishes. It does not politely wait for the process to wake up and check. It fires an interrupt — on its schedule, not ours.

Interrupt Context: A Different World

The interrupt could arrive while any process is running — or while the kernel is in the middle of something entirely unrelated. This is not process context with fewer features. It is a fundamentally different execution environment, and the differences are causal, not arbitrary:

Restriction	Causal Reason
Cannot sleep	There is no meaningful process to block. The interrupt handler does not "own" whatever process happened to be running when the interrupt arrived. If it called `sleep()`, it would block that innocent bystander — which has nothing to do with our sensor request.
Cannot access user buffers	The requesting process's address space is probably not mapped. The CPU was running a different process (or no process at all) when the interrupt fired. Attempting `copy_to_user` with the wrong address space would corrupt memory.
Must be fast	While the interrupt handler runs, interrupts at the same or lower priority level are blocked. The longer the handler takes, the longer other devices are ignored. A slow handler degrades the entire machine.
Cannot take sleeping locks	For the same reason as sleeping — any operation that might block would trap the wrong process's execution flow.

🤔 The Core Tension: The interrupt handler has the urgent information — "the device is done" — but lacks the process-specific context needed to finish the job. It knows the data is ready, but it cannot safely deliver it to the right user buffer. Something has to bridge that gap.

Deferred Work: Bridging the Gap

The classic textbook terminology is top half and bottom half:

The top half runs immediately in interrupt context: acknowledge the device, capture the completion status, record that data is ready
The bottom half runs later in a safer context: complete the data transfer, clean up state, wake the sleeping process

The "later" in bottom half is not vague. It has a specific target: the moment when the original process can be woken and its execution context restored. Deferred work is the mechanism by which the two forked paths of the U-bend rejoin into a single flow.

💡 Key Insight: Deferred work is not just "doing less in the interrupt handler." It is the kernel's strategy for moving execution from a context that lacks process identity back to one that has it — so the job can be completed with the right address space, the right credentials, and the right sleeping permissions.

Where Does the Bottom Half Run?

A natural question: if the bottom half does not run in interrupt context and the sleeping process has not woken up yet, then who runs it and where?

The bottom half is not a new process. Creating a process is far too expensive for something that happens on every device completion. It is also not code that gets passed to the sleeping process to run when it wakes — the sleeping process is still asleep when the bottom half executes.

Instead, the kernel provides lightweight mechanisms specifically designed for this kind of schedulable-but-not-a-process work:

Mechanism	Where It Runs	Can It Sleep?	Typical Use
Softirqs	Right after interrupt handlers return, in kernel context with interrupts re-enabled	No	High-frequency, performance-critical work (networking, block I/O)
Tasklets	Built on softirqs — a lighter-weight callback interface	No	Per-device interrupt completion
Workqueues	In the context of a kernel worker thread managed by the kernel itself	Yes	Work that may need to sleep (e.g., allocating memory, taking sleeping locks)

The important pattern is the same across all three: the interrupt handler schedules the bottom half (think of it as leaving a note that says "run this function when you can"), and the kernel runs it shortly after — in a context that is safer than interrupt context but is still kernel infrastructure, not the original process.

The bottom half does the bookkeeping: it records that data is ready, possibly moves data from device buffers into kernel buffers, and then wakes the sleeping process by marking it as ready in the scheduler. Only after the scheduler dispatches the process does the original code path resume. The process never "runs" the deferred work — it just benefits from the state the deferred work set up.

📌 Key Point: The bottom half is the bridge between the two forked timelines of the U-bend. It runs in kernel space on behalf of the system, not on behalf of any particular process. Only after it finishes its work and wakes the sleeping process do the two paths rejoin into a single flow again.
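The "leave a note, finish later" pattern can be sketched with a channel and a worker thread, loosely in the spirit of a workqueue. Everything named here is hypothetical: `top_half`, `bottom_half`, `Shared`, the raw value 215, and the multiply-by-ten "processing" are placeholders, and ordinary threads stand in for interrupt and kernel-worker contexts.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// State shared between the deferred work and the sleeping "process".
struct Shared {
    reading: Mutex<Option<u32>>,
    wq: Condvar,
}

/// Top half: capture the raw value and queue it. Sending on an unbounded
/// channel does not block, which mirrors the "must not sleep" rule.
fn top_half(raw: u32, work: &mpsc::Sender<u32>) {
    work.send(raw).expect("worker thread is alive");
}

/// Bottom half: do the heavier processing, publish the result, wake the sleeper.
fn bottom_half(raw: u32, shared: &Shared) {
    let processed = raw * 10; // placeholder for real conversion work
    *shared.reading.lock().unwrap() = Some(processed);
    shared.wq.notify_all();
}

fn interrupt_to_wakeup() -> u32 {
    let shared = Arc::new(Shared {
        reading: Mutex::new(None),
        wq: Condvar::new(),
    });
    let (tx, rx) = mpsc::channel::<u32>();

    // Worker thread: runs queued bottom halves one at a time.
    let worker_shared = Arc::clone(&shared);
    let worker = thread::spawn(move || {
        for raw in rx {
            bottom_half(raw, &worker_shared);
        }
    });

    // The "interrupt" arrives: only the minimum is done before returning.
    top_half(215, &tx);

    // The sleeping "process" waits for the state the bottom half sets up.
    let mut slot = shared.reading.lock().unwrap();
    while slot.is_none() {
        slot = shared.wq.wait(slot).unwrap();
    }
    let value = slot.take().unwrap();
    drop(slot);

    drop(tx); // closing the channel lets the worker loop end
    worker.join().unwrap();
    value
}

fn main() {
    println!("process woke with reading {}", interrupt_to_wakeup());
}
```

Three parties are involved and none of them runs another's code: the top half only enqueues, the worker only processes and wakes, and the waiter only consumes the published state.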

The Process Wakes

Once deferred work runs and the completion is recorded:

The kernel marks the process as ready (the blocked → ready transition in the state diagram)
The scheduler eventually dispatches the process again
The process resumes inside wait_event_interruptible, which now returns
The driver calls copy_to_user to deliver the bytes — safe again, because we are back in process context
The read() syscall returns to user space
The process receives its bytes and continues

The U-bend is complete. The process experienced one continuous call. The system experienced at least two different execution contexts, with other work interleaved in between, and a hardware event arriving on a completely separate timeline.

📌 Key Point: The hardest part of driver development is not programming hardware registers correctly. It is preserving one coherent process-facing story across multiple execution contexts and multiple moments in time.

The Code Is Lava

We have now followed the full U-bend. Along the way, we accumulated every reason driver code is uniquely dangerous. The danger does not come from one source — it comes from the combination of everything the U-bend revealed.

Privilege

The driver runs in kernel space with full machine access. A bug does not just crash the driver — it can corrupt memory, wedge hardware, or panic the entire machine. There is no safety net. The process isolation that protects user-space programs from each other does not protect the kernel from itself.

Consider the same two lines of code in each environment:

char *p = NULL;
*p = 'x';

In user space, the MMU catches the null dereference, the kernel delivers SIGSEGV, and the process dies. Every other process is fine. The OS cleans up.

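In kernel space, the same dereference triggers a kernel oops, and often a panic. There is no higher authority to deliver a signal to. At best the kernel kills the offending task and limps on with possibly corrupted state; at worst (for example, when the fault happens in interrupt context) the machine halts, takes every running process with it, and leaves any in-flight I/O (disk writes, network packets, device transactions) in an inconsistent state.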

💡 Connection to Rust: Safe Rust references can never be null (raw pointers can, but dereferencing one requires an unsafe block). The equivalent situation uses Option<T>: accessing the inner value requires an explicit .unwrap() or pattern match, which makes the risk visible in source code rather than hidden behind a bare pointer. The bug above would require the programmer to write unwrap(), a deliberate opt-in to the danger rather than an invisible default.
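A small sketch of what "visible in source code" looks like. `SensorDev`, `latest`, and the error string are hypothetical names chosen for this example.

```rust
/// Hypothetical device state: there may be no reading yet.
struct SensorDev {
    last_reading: Option<u32>,
}

/// The "might be absent" case has to be handled before the value can be used.
fn latest(dev: &SensorDev) -> Result<u32, &'static str> {
    match dev.last_reading {
        Some(value) => Ok(value),
        None => Err("no reading available yet"),
    }
}

fn main() {
    let fresh = SensorDev { last_reading: None };
    let warmed_up = SensorDev { last_reading: Some(2150) };
    println!("{:?}", latest(&fresh));
    println!("{:?}", latest(&warmed_up));
}
```

Writing `dev.last_reading.unwrap()` instead would compile, but the word `unwrap` sits in the source as a marker that the author chose to accept a possible panic at that line.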

Dual Interface

As we saw at the dispatch step, the driver must satisfy two contracts simultaneously:

The kernel contract above: function signatures, lifetime rules, locking discipline, return conventions
The hardware contract below: register protocols, timing requirements, interrupt acknowledgment, DMA descriptor formats

Violating either one breaks the driver, even if the other is perfectly correct.

Correct kernel contract, broken hardware contract:

static ssize_t sensor_read(struct file *f, char __user *buf,
                           size_t n, loff_t *off) {
    struct sensor_dev *dev = f->private_data;
    iowrite32(READ_DATA, dev->cmd_reg);       /* ✗ device expects START first */
    iowrite32(START_MEASUREMENT, dev->cmd_reg);
    wait_event_interruptible(dev->wq, dev->data_ready);
    if (copy_to_user(buf, &dev->last_reading, sizeof(u32)))
        return -EFAULT;                       /* copy_to_user returns bytes NOT copied */
    return sizeof(u32);
}

The function signature is correct, the return convention is correct, the locking is fine — but the device ignores READ_DATA because no measurement was started. The hardware contract was violated. The driver returns garbage.

Correct hardware protocol, broken kernel contract:

static ssize_t sensor_read(struct file *f, char __user *buf,
                           size_t n, loff_t *off) {
    struct sensor_dev *dev = f->private_data;
    iowrite32(START_MEASUREMENT, dev->cmd_reg);
    wait_event_interruptible(dev->wq, dev->data_ready);
    if (copy_to_user(buf, &dev->last_reading, sizeof(u32)))
        return -EFAULT;
    return 0;  /* ✗ should return bytes read: the caller sees 0 bytes transferred */
}

The hardware interaction is perfect. But the kernel contract says .read must return the number of bytes transferred. Returning 0 tells the kernel "end of file" — the calling process receives no data even though the device produced it.

Both contracts satisfied:

static ssize_t sensor_read(struct file *f, char __user *buf,
                           size_t n, loff_t *off) {
    struct sensor_dev *dev = f->private_data;
    iowrite32(START_MEASUREMENT, dev->cmd_reg);       /* ✓ hardware: start first */
    wait_event_interruptible(dev->wq, dev->data_ready);
    iowrite32(READ_DATA, dev->cmd_reg);               /* ✓ hardware: then read */
    if (copy_to_user(buf, &dev->last_reading, sizeof(u32)))
        return -EFAULT;                               /* ✓ kernel: error code on failure */
    return sizeof(u32);                               /* ✓ kernel: bytes transferred on success */
}

The hardware sees the correct register sequence. The kernel receives a proper byte count on success and a negative error code on failure. Both interfaces are honored simultaneously.

💡 Connection to Rust: Rust does not catch either broken version above by default. The wrong register order is a semantic constraint that lives in the device's datasheet, not in the type system: unless the driver author deliberately encodes the sequence in types (the typestate pattern), the wrong-order version compiles and runs fine, it just produces garbage. Similarly, returning Ok(0) instead of Ok(4) is valid Rust. These are logic bugs that the compiler will not flag on its own. The dual interface is primarily a design discipline, not a language feature.
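As a partial counterpoint, an author who opts in can push the ordering half of the hardware contract into types, at the cost of extra design work. The sketch below uses the typestate pattern. `Idle`, `Measuring`, the command values, and the fixed reading 2150 are assumptions made for illustration, and a `Vec` records the commands that a real driver would write to the command register.

```rust
const START_MEASUREMENT: u32 = 1;
const READ_DATA: u32 = 2;

/// Device with no measurement in progress. There is deliberately no
/// `read_data` method on this type.
struct Idle {
    log: Vec<u32>,
}

/// Device after START_MEASUREMENT has been issued.
struct Measuring {
    log: Vec<u32>,
}

impl Idle {
    fn new() -> Self {
        Idle { log: Vec::new() }
    }

    /// Consumes the idle handle, so it cannot be reused after starting.
    fn start(mut self) -> Measuring {
        self.log.push(START_MEASUREMENT);
        Measuring { log: self.log }
    }

    /// Commands issued so far, in order.
    fn commands(&self) -> &[u32] {
        &self.log
    }
}

impl Measuring {
    /// Only reachable through `start`, so READ_DATA always follows START.
    fn read_data(mut self) -> (u32, Idle) {
        self.log.push(READ_DATA);
        (2150, Idle { log: self.log }) // pretend reading
    }
}

fn main() {
    let idle = Idle::new();
    // idle.read_data();  // would not compile: no such method on `Idle`
    let measuring = idle.start();
    let (value, idle) = measuring.read_data();
    println!("reading {}, commands issued: {:?}", value, idle.commands());
}
```

This only enforces what the author chose to model. Timing requirements, register addresses, and the meaning of the returned bytes still live in the datasheet, so the discipline argument stands.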

Multiple Execution Contexts

The fork showed us that driver code does not run in one clean, predictable flow. The same driver's code paths may be entered from:

Entry Path	Context	Example
A user calls `read()`	Process context	Our sensor example
The device fires an interrupt	Interrupt context	Measurement complete
The kernel runs deferred work	Softirq / tasklet context	Bottom-half processing
A second user opens the device	Process context (different process)	Another program reads the sensor

Each of these has different rules about what is safe, and the driver must handle all of them correctly.

Here is a top half that does too much — it does the bottom half's job inline, blocking interrupts the entire time:

/* ✗ BROKEN: top half does all the work — no deferred work, blocks interrupts too long */
static irqreturn_t sensor_irq(int irq, void *data) {
    struct sensor_dev *dev = data;
    dev->last_reading = ioread32(dev->data_reg);
    dev->data_ready = true;
    wake_up_interruptible(&dev->wq);
    return IRQ_HANDLED;
}

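This works, and for a handler this small many real drivers would stop here: wake_up_interruptible is safe to call from interrupt context. Treat it as a stand-in for a handler that does its real processing inline, which is where the top-half/bottom-half contract gets violated. While the handler runs, interrupts at the same or lower priority are blocked, so any data processing, shared-state updates, and wakeups done here lengthen that window. On a busy system with many devices, a top half that takes too long starves every other interrupt source.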

The correct pattern captures the minimum urgent state and schedules the rest for a bottom half:

/* ✓ CORRECT: top half captures urgent state, defers the rest */
static irqreturn_t sensor_irq(int irq, void *data) {
    struct sensor_dev *dev = data;
    dev->raw_value = ioread32(dev->data_reg);
    tasklet_schedule(&dev->bh_tasklet);
    return IRQ_HANDLED;
}

The top half reads the device register — the one thing that truly cannot wait, since the device may overwrite its data register on the next measurement — and immediately schedules a tasklet. The tasklet (the bottom half) will process the data, set the ready flag, and wake the sleeping process in a context where interrupts are re-enabled and the system can breathe.

💡 Connection to Deferred Work: This is the top-half/bottom-half split in action. The tasklet_schedule call is the interrupt handler "leaving a note" — exactly the pattern we described in the deferred work section. The bottom half does the heavier work; the top half just captures what would be lost if it waited.

💡 Connection to Rust: Rust cannot prevent this mistake either. Doing too much work inside an interrupt handler is a design error — the compiler has no concept of "interrupt context" or "this function is a top half." Both the broken and correct versions would compile identically in Rust. The discipline of knowing what belongs in a top half versus a bottom half is an engineering judgment that no type system can enforce.

Reentrancy

Our toy driver might be touched simultaneously by:

one process submitting a new read()
another process trying to open() the same device
an interrupt handler reporting completion of a previous request
deferred work cleaning up state from an earlier operation

Shared state must be protected explicitly. Assumptions about "who is running now" must be spelled out in code. Reentrancy is especially valuable here because it guards against the most insidious class of driver bugs: the ones where everything works until two events happen to overlap.

Without protection, two concurrent reads corrupt each other:

/* ✗ BROKEN: no lock, so two concurrent reads share dev->buf */
static ssize_t sensor_read(struct file *f, char __user *buf,
                           size_t n, loff_t *off) {
    struct sensor_dev *dev = f->private_data;
    dev->buf = dev->last_reading;  /* Process A writes here... */
    /* ...Process B calls sensor_read on another CPU and overwrites dev->buf... */
    if (copy_to_user(buf, &dev->buf, sizeof(u32)))  /* ...so A copies out B's data */
        return -EFAULT;
    return sizeof(u32);
}

The fix is explicit serialization:

/* ✓ CORRECT: mutex serializes access to shared state */
static ssize_t sensor_read(struct file *f, char __user *buf,
                           size_t n, loff_t *off) {
    struct sensor_dev *dev = f->private_data;
    ssize_t ret = sizeof(u32);      /* bytes transferred on success */
    mutex_lock(&dev->lock);
    dev->buf = dev->last_reading;
    if (copy_to_user(buf, &dev->buf, sizeof(u32)))
        ret = -EFAULT;              /* copy_to_user returns bytes NOT copied */
    mutex_unlock(&dev->lock);
    return ret;
}

💡 Connection to Rust: In the Rust equivalent, the buffer would be wrapped in Mutex<T>. The unprotected version would not compile — you cannot access the data without calling .lock(). The compiler enforces what C leaves to discipline.
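A user-space sketch of that claim, with threads standing in for concurrent readers. `SensorState`, `staged_copy`, and `all_reads_consistent` are hypothetical names. Each thread stages its own id in the shared buffer, deliberately yields, and then checks that the buffer still holds its value.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Shared staging buffer, reachable only through the Mutex that wraps it.
struct SensorState {
    buf: u32,
}

/// Stage a value and read it back while holding the lock.
fn staged_copy(state: &Mutex<SensorState>, value: u32) -> u32 {
    let mut guard = state.lock().unwrap();
    guard.buf = value;   // stage into the shared buffer
    thread::yield_now(); // invite another thread to run in the gap
    guard.buf            // still ours: nobody else can hold the lock
}

/// Returns true if every thread always read back the value it staged.
fn all_reads_consistent(threads: u32, rounds: u32) -> bool {
    let state = Arc::new(Mutex::new(SensorState { buf: 0 }));
    let handles: Vec<_> = (0..threads)
        .map(|id| {
            let state = Arc::clone(&state);
            thread::spawn(move || (0..rounds).all(|_| staged_copy(&state, id) == id))
        })
        .collect();
    // Join every thread, then combine the per-thread results.
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .fold(true, |acc, ok| acc && ok)
}

fn main() {
    println!("consistent: {}", all_reads_consistent(4, 1000));
}
```

There is no way to write `state.buf = value` here without first calling `lock()`, because the field is only reachable through the guard. That is the compile-time half of the guarantee. Choosing what to put under which lock is still a design decision.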

⚠️ Gotcha: In user space, a race condition might corrupt one program's output. In a driver, a race condition can corrupt the device's state, the kernel's internal structures, and every process currently using that device — all at once.
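
The same serialization idea can be tried safely in user space. The sketch below is an analogue, not driver code: `pthread_mutex_t` stands in for the kernel mutex, and the names `toy_sensor`, `toy_update`, `toy_read`, and `toy_demo` are hypothetical. The reading carries a second field that must always be the bitwise complement of the first, so any snapshot taken while an update is half finished would be detectable as torn.

```c
/* User-space analogue of the locked read path; all names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define TOY_ITERS 100000u

struct toy_sensor {
    pthread_mutex_t lock;
    uint32_t value;             /* latest reading */
    uint32_t check;             /* invariant: check == ~value */
};

static struct toy_sensor sensor = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .value = 0,
    .check = ~(uint32_t)0,
};

/* Publish a new reading, updating both fields under the lock. */
static void toy_update(struct toy_sensor *s, uint32_t v) {
    pthread_mutex_lock(&s->lock);
    s->value = v;
    s->check = (uint32_t)~v;
    pthread_mutex_unlock(&s->lock);
}

/* Snapshot both fields under the lock.
 * Returns 1 if the snapshot is consistent, 0 if it is torn. */
static int toy_read(struct toy_sensor *s, uint32_t *out) {
    uint32_t v, c;
    pthread_mutex_lock(&s->lock);
    v = s->value;
    c = s->check;
    pthread_mutex_unlock(&s->lock);
    *out = v;
    return c == (uint32_t)~v;
}

static void *writer_thread(void *arg) {
    (void)arg;
    for (uint32_t i = 1; i <= TOY_ITERS; i++)
        toy_update(&sensor, i);
    return NULL;
}

static void *reader_thread(void *arg) {
    long *torn = arg;           /* per-thread count of inconsistent snapshots */
    uint32_t v;
    for (uint32_t i = 0; i < TOY_ITERS; i++)
        if (!toy_read(&sensor, &v))
            (*torn)++;
    return NULL;
}

/* Runs one writer against two readers; returns the number of torn snapshots. */
static long toy_demo(void) {
    pthread_t w, r1, r2;
    long torn1 = 0, torn2 = 0;
    int rc = pthread_create(&w, NULL, writer_thread, NULL);
    assert(rc == 0);
    rc = pthread_create(&r1, NULL, reader_thread, &torn1);
    assert(rc == 0);
    rc = pthread_create(&r2, NULL, reader_thread, &torn2);
    assert(rc == 0);
    pthread_join(w, NULL);
    pthread_join(r1, NULL);
    pthread_join(r2, NULL);
    return torn1 + torn2;
}
```

With the lock in place, `toy_demo()` returns 0 torn snapshots. On most toolchains this needs to be built with thread support (for example the `-pthread` flag).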

The Cost Gradient

The same conceptual mistake — say, an unprotected shared variable — has radically different costs depending on where it occurs:

Location	Consequence of a Bug
User-space application	One process crashes; the OS cleans up
Kernel subsystem	Machine may panic or become unstable
Device driver	Machine, device, and pending process state may all become inconsistent simultaneously

To make this concrete, consider the same unprotected increment in all three locations:

count++;  /* no lock, no atomic — just a bare read-modify-write */

User-space application: Two threads race on a counter. One update is lost. The final count is off by one. The program logs a wrong number. Nobody else is affected.
Kernel subsystem: Two CPUs race on a scheduler queue length. The scheduler believes there are fewer runnable processes than there actually are. A process starves. The machine becomes sluggish or unresponsive, and the cause is nearly impossible to reproduce.
Device driver: The interrupt handler and a process-context path race on dev->pending_requests. The driver thinks a request completed that has not. It reuses a DMA buffer that the device is still writing to. The device overwrites kernel memory. The corruption may not manifest until minutes later, in a completely unrelated subsystem.

💡 Connection to Rust: Safe Rust rules out the data race, because two threads cannot mutably access the same count without going through Mutex, AtomicUsize, or another synchronization primitive. An unsynchronized increment like the bare count++ above would not compile. But if you wrap it in an AtomicUsize and use the wrong memory ordering, or increment the wrong counter, the logic bug persists. Rust removes the mechanical failure; the semantic failure remains yours.
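
For comparison, here is a minimal user-space sketch of the mechanical fix in C11, using `<stdatomic.h>` and POSIX threads. It is an analogue of the first row of the gradient only: the names `bump` and `run_counter_demo` and the thread and iteration counts are assumptions chosen for illustration, and kernel code would use the kernel's own atomic helpers rather than this header.

```c
/* User-space illustration only; names and sizes are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define N_THREADS    4
#define N_INCREMENTS 100000

static atomic_long count;       /* static storage, so it starts at zero */

/* Each thread performs N_INCREMENTS atomic read-modify-write steps. */
static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < N_INCREMENTS; i++) {
        /* Relaxed ordering is enough for a pure counter: only the
         * atomicity of the increment matters, not ordering with other data. */
        atomic_fetch_add_explicit(&count, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Starts the threads, waits for them, and returns the final count. */
static long run_counter_demo(void) {
    pthread_t t[N_THREADS];
    atomic_store(&count, 0);
    for (int i = 0; i < N_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, bump, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&count);
}
```

Because every increment is a single indivisible step, `run_counter_demo()` returns exactly `N_THREADS * N_INCREMENTS`. With a plain `long` and `count++` no such guarantee exists, and in C the race is undefined behavior rather than merely a lost update.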

This cost gradient is why the industry cares so much about safer implementation techniques in kernel code — and why Rust's ownership model is attracting serious interest for driver development.

📚 Historical Note: This is not just theory. Microsoft crash data from the Windows XP era attributed roughly 85% of reported crashes (blue screens of death) to device drivers. Meanwhile, estimates for the Linux kernel put driver code at around 70% of the total codebase by volume. The most dangerous category of kernel code is also, by a wide margin, the largest. The cost gradient is not an edge case; it is the dominant reality of kernel engineering.

📌 Key Point: Driver code is not "harder C." It is code that runs with maximum privilege, across multiple execution contexts, while satisfying two independent contracts. The danger comes from the environment, not the language — though the language can help.

Looking Forward

This lecture followed one process asking one device for data. The kernel stood in the middle and mediated communication between two endpoints:

one endpoint was a process
the other endpoint was a device

Once you see that structure clearly, the next question arrives naturally: what happens when both endpoints are processes?

That is the doorway into IPC — inter-process communication. The kernel will still mediate, but the contracts, tradeoffs, and mechanisms change. In the next lecture we shift from process-to-device communication to process-to-process communication inside one machine.

💡 Preview: If processes are virtual hardware, and drivers mediate real hardware access, then IPC is the same mediation pattern — just between two virtual machines instead of a virtual machine and a physical device.

Summary

A single read() request follows a U-shaped path: the process's steps sit at the tips in user space, and the system's steps form the interior bend in kernel space
The trap is the controlled descent from user space into kernel space — a consequence of hardware privilege, not convention
Dispatch routes a generic request through file descriptor tables and driver registrations to the correct device-specific code
The driver operates at the bottom of the U-bend, translating between the kernel's uniform interface above and the device's register-level protocol below
When the device is not immediately ready, time forks: the process sleeps and the device runs independently, creating two parallel execution paths
Process context allows sleeping, user-buffer access, and scheduler interaction because there is a known, owned process in the picture
Interrupt context prohibits sleeping and user-buffer access because there is no owned process: sleeping would stall whichever unrelated process happened to be running when the interrupt fired
Deferred work bridges the gap between interrupt context and process context, allowing the two forked paths to rejoin safely
The combined pressure of privilege, dual interfaces, multiple execution contexts, and reentrancy makes driver code uniquely dangerous — the code is lava
This process-to-device mediation pattern naturally leads to the next lecture's question: how does the kernel mediate process-to-process communication?

📝 Lecture Notes

Key Definitions:

Term	Definition
System Call (Trap)	A controlled transfer of execution from user space into kernel space so privileged OS code can run on behalf of a process
Process Context	Kernel execution on behalf of a specific process; the scheduler owns a valid `current` process, so sleeping, user-buffer access, and blocking are all safe
Interrupt Context	Kernel execution entered because a hardware interrupt fired; no owned process exists, so sleeping and user-buffer access are unsafe
Deferred Work	A strategy for splitting interrupt handling into an urgent immediate portion (top half) and a later portion (bottom half) that runs where process identity is available
Reentrancy	The property that code behaves correctly even when entered again before a previous invocation completes

U-Bend Phase Summary:

Phase	Location	What Happens
Left tip	User space	Process calls `read()`, begins to wait
Left descent	Boundary	Syscall/trap transfers execution into kernel space
Dispatch	Kernel space	Kernel routes the generic request to the correct driver
Hardware interface	Kernel space	Driver programs the device using privileged operations
Fork	Kernel space	Process sleeps; device runs independently on its own timeline
Return	Kernel space	Interrupt arrives; deferred work completes the request; process wakes
Right tip	User space	Process receives bytes and continues execution

Context Comparison:

Property	Process Context	Interrupt Context
Trigger	A process called into the kernel	A hardware interrupt fired
Owns a process?	Yes — the scheduler has a valid `current`	No — the handler reacts to a device event
May sleep?	Yes — the scheduler can block and reschedule	No — sleeping would trap an uninvolved process
User-buffer access?	Yes — the requesting process's address space is mapped	No — the wrong address space may be active
Typical work	Submission, blocking, data copying, coordination	Acknowledge device, capture status, schedule deferred work
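
The "Typical work" row can be sketched as a small single-threaded simulation. This is a model of the division of labor only, under loud assumptions: `top_half`, `bottom_half`, `event_queue`, and the 0.125-degree scale factor are hypothetical, and there is no real interrupt or locking here. A real Linux driver would hand work off through mechanisms such as threaded IRQs or workqueues and would need proper synchronization between the two halves.

```c
/* Single-threaded model of a top-half/bottom-half split; names are hypothetical. */
#include <assert.h>
#include <stdint.h>

#define QUEUE_SLOTS 8u          /* power of two so the index mask works */

struct event_queue {
    uint32_t slots[QUEUE_SLOTS];
    unsigned head;              /* next slot the bottom half reads */
    unsigned tail;              /* next slot the top half writes */
};

/* "Top half": capture the raw status and return immediately.
 * Returns 0 on success, -1 if the queue is full and the event is dropped. */
static int top_half(struct event_queue *q, uint32_t raw_status) {
    if (q->tail - q->head == QUEUE_SLOTS)
        return -1;
    q->slots[q->tail++ & (QUEUE_SLOTS - 1)] = raw_status;
    return 0;
}

/* "Bottom half": runs later and does the slower conversion work.
 * Adds each reading in millidegrees to *sum and returns how many it handled. */
static int bottom_half(struct event_queue *q, long *sum_millideg) {
    int handled = 0;
    while (q->head != q->tail) {
        uint32_t raw = q->slots[q->head++ & (QUEUE_SLOTS - 1)];
        *sum_millideg += (long)raw * 125;   /* assumed scale: 1 LSB = 0.125 degrees C */
        handled++;
    }
    return handled;
}

/* Three "interrupts" arrive back to back, then the deferred work runs once. */
static long deferred_demo(void) {
    struct event_queue q = {0};
    long total = 0;
    top_half(&q, 160);
    top_half(&q, 168);
    top_half(&q, 176);
    int n = bottom_half(&q, &total);
    assert(n == 3);
    return total;
}
```

`deferred_demo()` returns 63000, the three raw readings (160, 168, 176) converted to millidegrees and summed. The top half never loops, converts, or waits; everything that can be postponed is left for the bottom half.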

📚 Additional Resources

Linux Kernel References

Linux Kernel Documentation: Driver Basics — Conventions around driver lifetimes, sleeping rules, and execution context
Linux Kernel Documentation: Generic IRQ — The interrupt infrastructure that real Linux drivers plug into
Linux Device Drivers, 3rd Edition (LWN) — The classic free reference on Linux driver architecture and implementation

Historical Context

Multics Protection Rings — The classic historical root of ring-based privilege separation
Unix Device Files — The historical Unix approach to exposing device endpoints through filesystem names
Intel 8259 PIC — The classic interrupt controller that shaped generations of PC interrupt handling

Rust Systems Programming

The Rustonomicon — Why low-level unsafe code needs explicit discipline even in Rust
Rust for Linux — The project bringing Rust into the Linux kernel, directly relevant to upcoming lectures


