In this lecture we cover the software fundamentals of developing I/O systems: the critical software principles we follow, the approaches used to put those principles into practice, and how those principles are implemented on modern systems. We cover device independence, buffering, spooling, and more.
Last lecture we opened the I/O chapter and surveyed the hardware landscape that the operating system must manage:
Everything is I/O: Nearly every component beyond the CPU's ALU requires a managed communication channel, including storage, GPUs, network adapters, USB controllers, sensors, and displays
I/O Standards: Interfaces evolved from parallel buses (ISA, PCI) to serial connections (USB, SATA, PCIe), increasing bandwidth while simplifying hardware
Communication Patterns: Devices are classified by how they transfer data, as block (fixed chunks, random access), character (continuous stream, sequential), or network (discrete packets with metadata)
Device Anatomy: Every I/O device splits into a controller (host-facing protocol adapter) and functional hardware (device-specific internals), and the two are independent
Polling vs. Interrupts: The CPU detects I/O completion by either busy-waiting on a status register (polling) or receiving an asynchronous hardware signal (interrupts), each with distinct tradeoffs
Bus Architecture: Chipsets evolved from centralized (Northbridge/Southbridge) to integrated (PCH + on-die memory controller in the SoC), improving performance but complicating data flow
PMIO vs. MMIO: Software addresses devices through a separate port space with special instructions (PMIO) or through the normal address space via page tables with uncacheable markings (MMIO)
DMA: Dedicated transfer hardware moves data between devices and memory without per-word CPU involvement; modern devices use bus-mastering, descriptor rings, scatter/gather, and MSI/MSI-X
I/O Coherence: When DMA and CPU caches disagree, coherence is maintained through cache-coherent interconnects, explicit flushes, and the IOMMU
That is a lot of moving parts. But one unifying thread runs through every decision we studied. At each hop along the data path, from the physical device through its controller to the mainboard, engineers face the same fundamental question: is the data best treated as discrete units or as a continuous flow?
The answer at one hop does not dictate the answer at the next. The controller can adapt between them, and the system can listen differently depending on the workload. This single recurring question, discrete or continuous, is the skeleton that holds all of LN17's topics together. The visual below distills it into a reasoning chain you can apply to any I/O device.
I/O Reasoning Chain
At each stage along the data path, the same question recurs: discrete or continuous? The dashed cross-lines show where the controller or system adapts between paradigms.
Walking Through the Chain
At Stage 1, we ask what the physical hardware naturally produces. An HDD platter stores data in independent sectors: discrete, self-contained units that align with the block paradigm. A temperature sensor, by contrast, reports a continuously changing value, a flowing state that aligns with the character paradigm. The hardware's physical nature suggests a starting point, but it does not lock in every downstream decision.
At Stage 2, the device controller decides how to package that data for the wire. If the consumer benefits from random access and interleaving, a block-oriented protocol makes sense. If the consumer needs low-overhead, sequential delivery, a stream protocol wins. When the hardware's nature mismatches the chosen protocol, the controller adapts. A microphone's continuous analog waveform gets sampled into discrete digital packets by the ADC. A keyboard's discrete keypress events get serialized into a sequential byte stream over USB.
At Stage 3, the system asks how to detect when data is ready. If data arrives in unpredictable bursts with idle gaps between them, interrupts free the CPU to do other work. If data must be consumed with the absolute lowest latency or arrives near-continuously, polling eliminates the interrupt-handling overhead. And again, the system can adapt: high-frequency interrupts approximate polling behavior, and periodic polling approximates interrupt timing.
The Mouse: Both Paths at Once
A mouse is the perfect device to test this reasoning chain, because it breaks any simplistic "one device = one paradigm" assumption. A mouse click is a discrete event at the hardware level; it naturally fits a discrete protocol and interrupt-driven notification. But mouse movement is a continuous position state that changes smoothly over time. The USB HID controller must adapt: it samples the continuous position into periodic discrete reports (typically at 125 Hz to 8 kHz). The host then polls for those reports at a fixed interval. One device, both paradigms, handled simultaneously by the controller.
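A minimal C sketch of the controller's sampling step, assuming a report shaped like the 3-byte USB HID boot-protocol mouse report (a button byte plus signed 8-bit X and Y deltas). The struct and function names are hypothetical, and real firmware reads positions from a sensor rather than from function arguments. The point is that the button state passes through as a discrete value, while the continuous position is turned into a discrete delta once per report interval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical report modeled on the 3-byte USB HID boot-protocol mouse:
 * byte 0 = button bitmask, bytes 1-2 = signed X/Y movement since the last report. */
struct mouse_report {
    uint8_t buttons;
    int8_t dx;
    int8_t dy;
};

/* Clamp a pending delta to the signed 8-bit range one report can carry. */
static int8_t clamp8(long v) {
    if (v > 127) return 127;
    if (v < -127) return -127;
    return (int8_t)v;
}

/* Called once per report interval (e.g. every 1 ms at 1 kHz).
 * (x, y) is the current continuous position; *last_x / *last_y track the
 * position already reported to the host, so movement that did not fit in
 * this report is carried into the next one instead of being lost. */
static struct mouse_report sample_report(long x, long y, uint8_t buttons,
                                         long *last_x, long *last_y) {
    struct mouse_report r;
    assert(last_x != NULL && last_y != NULL);
    r.buttons = buttons;            /* discrete state passes straight through */
    r.dx = clamp8(x - *last_x);     /* continuous state becomes a discrete delta */
    r.dy = clamp8(y - *last_y);
    *last_x += r.dx;                /* advance only by what was actually sent */
    *last_y += r.dy;
    return r;
}
```

Movement larger than one report can carry is clamped, and the unsent remainder rolls into the following reports rather than disappearing.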
This chain is not just a summary tool; it is a design tool. When you encounter a new I/O device, walk it through these three stages. The answers will tell you what communication pattern to expect, what protocol tradeoffs to consider, and how the system should listen. With this intuition from the hardware in hand, we are ready for the next question: how do we program for it?
Today's Agenda
The I/O Decision Chain: Unifying LN17's hardware topics into a three-stage reasoning tool (physical nature, controller protocol, notification strategy)
Major I/O Software Principles: Six design constraints that shape every I/O subsystem (device independence, uniform naming, ASAP error handling, synchronous appearance, buffering, shareable vs. dedicated)
I/O Programming Techniques: Three programming models defined by how much the hardware does for us (programmed I/O, interrupt-driven I/O, DMA-driven I/O)
The Hybrid Reality: Why modern systems use all three techniques simultaneously, and how to reason about which to apply
Device Drivers: The kernel-level programs that bridge generalized OS interfaces with specific device hardware
Principles in Practice: How Linux drivers implement each of the six I/O principles, from file_operations to double buffering to user-space spooling
Major I/O Software Principles
We now understand the hardware landscape: controllers, buses, DMA, interrupts, and coherence. But hardware only provides the raw mechanism. The operating system must build a software architecture on top of it that is maintainable, portable, and correct. Over decades of OS development, six principles have emerged as the foundation of that architecture. Each one addresses a different challenge that arises when software meets I/O hardware.
These principles are not implementation details; they are design constraints that shape every I/O subsystem from Linux to Windows to embedded RTOS kernels. We will introduce each one here at a conceptual level, then revisit them in code throughout the rest of this unit.
Device Independence
Device independence is achieved through software layering. The OS provides a uniform syscall API (open, read, write, close, ioctl) that all applications use. Below that API, device-specific drivers handle the translation between the generic interface and the actual hardware registers, protocols, and timing requirements.
The visual below shows this layering on top of the PCH hardware diagram from LN17. Everything above the purple boundary uses the same API regardless of which device is being accessed. Everything below is device-specific, with different drivers for different hardware.
Device Independence
Device independence is a software layering principle. The purple boundary separates the uniform syscall API from device-specific drivers and hardware. Toggle visibility to see what applications cannot see.
Toggle the hardware visibility. When the hardware is hidden, you see the world from the application's perspective: a uniform interface with no visible device differences. When the hardware is shown, the diverse reality underneath is revealed: five completely different devices with different controllers, protocols, and functional hardware, all hidden behind the same read() call.
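From the application's side, device independence means one code path for every device. This sketch assumes a Linux or other POSIX system where /dev/zero, /dev/null, and /dev/urandom exist; it reads from whatever path it is given using only open(), read(), and close(). The helper name read_up_to is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Read up to `want` bytes from any path using only the generic syscall API.
 * Nothing here knows whether `path` is a regular file, /dev/zero, or a disk.
 * Returns the number of bytes read, or -1 on error. */
static long read_up_to(const char *path, char *buf, size_t want) {
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n < 0) {          /* a production version would retry on EINTR */
            close(fd);
            return -1;
        }
        if (n == 0)           /* end of file: no more data from this device */
            break;
        got += (size_t)n;
    }
    close(fd);
    return (long)got;
}
```

The same function returns 4096 zero bytes from /dev/zero, immediate end-of-file from /dev/null, and random bytes from /dev/urandom, without a single device-specific branch.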
💡 Key Insight: The controller/functional-hardware split we studied in LN17 is what makes this possible. Because every device presents a standard controller interface, the OS can write a driver for each controller type and expose a uniform API above it. The application never touches the functional hardware.
Uniform Naming
On a Linux system, your SATA SSD is /dev/sda. Your NVMe drive is /dev/nvme0n1. A USB flash drive plugged into any port appears under the next free /dev/sdX name, such as /dev/sdb. A program that operates on block devices does not need to know which type of storage it is addressing; the path is the only identifier it uses.
Uniform Naming
Every device maps to a filesystem path β programs use names like /dev/sda without knowing the hardware. Network devices are the exception: they use the socket API instead.
Notice how every device wrapper collapses into a simple path label. The complex internals (controllers, functional hardware, protocol differences) disappear behind a name. A program that copies data from /dev/sda to /dev/sdb does not care that one is a SATA SSD and the other is a USB flash drive.
🤔 The Network Exception: Network devices break this pattern. In Linux, network interfaces are not accessed through /dev/ paths. Instead, they use the socket API: socket(), bind(), sendto(), recvfrom(). The orange-highlighted WiFi adapter in the diagram shows this exception. The socket API exists because networking did not fit the file model cleanly: connections are stateful, bidirectional, and multiplexed across many remote endpoints. This is a real limit of "everything is a file," and understanding where abstractions break down is just as valuable as understanding where they hold.
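For contrast, here is a sketch of the socket path on Linux/POSIX. It sends a UDP datagram to itself over the loopback interface using socket(), bind(), sendto(), and recvfrom(); no /dev/ path appears anywhere. The helper udp_loopback_echo is hypothetical, and the example assumes the loopback interface is up.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send `msg` to ourselves over UDP on loopback and receive it into `out`.
 * Returns the number of bytes received, or -1 on error. */
static long udp_loopback_echo(const char *msg, char *out, size_t outlen) {
    assert(msg != NULL && out != NULL);
    int s = socket(AF_INET, SOCK_DGRAM, 0);   /* a socket, not a /dev/ node */
    if (s < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                         /* let the kernel pick a free port */

    socklen_t alen = sizeof addr;
    if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(s, (struct sockaddr *)&addr, &alen) < 0) {
        close(s);
        return -1;
    }

    /* Address the datagram to the port we were just bound to. */
    if (sendto(s, msg, strlen(msg), 0, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(s);
        return -1;
    }
    ssize_t n = recvfrom(s, out, outlen, 0, NULL, NULL);
    close(s);
    return (long)n;
}
```

The descriptor returned by socket() is still closed with close(), but everything about addressing and delivery goes through calls that have no counterpart in the plain file API.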
ASAP Error Handling
The reasoning behind this principle is subtle but critical. Modern error protocols (SCSI sense data, NVMe completion status codes) propagate detailed, structured error information all the way to the kernel driver, so the CPU is not blind to what went wrong. What narrows at each hop is the set of available physical actions to fix it.
Consider a read error on an HDD. At the platter/head level, the firmware can retry the read, attempt a head offset retry at a slightly different track position, apply multi-level ECC (inner code, then outer Reed-Solomon), or remap the sector to a spare area. At the drive controller, the options shrink: retry with ECC correction, remap the sector, or adjust read timing. At the PCH's SATA controller, all that remains is reissuing the command or resetting the link. By the time the error reaches the CPU driver, the only options are "retry the I/O request," "reset the device," or "fail to the filesystem." The driver can say "retry," but it cannot say "recalibrate the read head." That action only existed at the functional hardware level, hidden behind the controller abstraction boundary.
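The shrinking menu of fixes can be modeled as a toy escalation ladder. In this C sketch (all names hypothetical, action lists taken from the HDD example), each hop lists the remediation actions it physically has, and a fault is resolved at the lowest hop that offers the cure it needs; if no hop does, it escapes upward as a plain error.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative remediation actions, not a real firmware or driver API. */
enum action {
    RETRY_READ, HEAD_OFFSET_RETRY, MULTI_LEVEL_ECC, REMAP_SECTOR,
    ADJUST_READ_TIMING, REISSUE_COMMAND, RESET_LINK
};

struct hop {
    const char *name;
    const enum action *actions;   /* what this hop can physically do */
    size_t n_actions;
};

/* hops[0] is closest to the media. Returns the index of the hop that fixed
 * the fault, or -1 if it escaped every hop and must be reported upward
 * (e.g. the driver fails the request to the filesystem). */
static int handle_error(const struct hop *hops, size_t n_hops, enum action cure) {
    assert(hops != NULL);
    for (size_t h = 0; h < n_hops; h++)
        for (size_t a = 0; a < hops[h].n_actions; a++)
            if (hops[h].actions[a] == cure)
                return (int)h;    /* fixed here; upper layers never see it */
    return -1;                    /* escalated past the last hop */
}
```

Starting the same search at the drive controller rather than at the media would never find a head-offset retry, because that action is not on the controller's menu.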
ASAP Error Handling
Handle errors as close to the source as possible, because that is where the richest set of remediation actions exists. Step through each hop to watch the available physical fixes shrink.
Available at Hop 1 (NAND / Platters): retry read, head offset retry, multi-level ECC, remap sector, report upward
Step through the hops and watch the action list shrink. The struck-through items in the "no longer available" panel show the physical remediation tools that have been lost, not because the information disappeared, but because those tools only exist at lower levels.
💡 Connection to Device Independence: This ties directly to Principle 1. The same controller abstraction that enables device independence is the boundary that hides device-specific remediation actions from the host. The OS gains generality but loses the ability to command physical fixes. That tradeoff is exactly why errors should be handled as close to the source as possible.
Synchronous Appearance, Asynchronous Operation
When you call read(fd, buf, 4096) in a C program, the call blocks: your code does not continue until the data is ready. From the programmer's perspective, this is a synchronous, sequential operation. But at the hardware level, the situation is very different: the CPU issues a command to the device, the OS suspends your process, the CPU switches to running a completely different process, and the device operates independently for potentially millions of clock cycles. When the device finally fires an interrupt, the OS wakes your process and the read() call returns as if nothing happened.
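The pattern can be imitated in user space. In this sketch, assuming POSIX threads, a helper thread plays the device: it works on its own for about 10 ms, fills the caller's buffer, then signals a condition variable in place of an interrupt. sync_read() hides all of that behind a call that simply blocks. All names are hypothetical; the kernel uses its own wait queues rather than pthreads.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

struct request {
    pthread_mutex_t lock;
    pthread_cond_t done_cv;
    int done;            /* set by the "device" on completion */
    char *buf;           /* caller's destination buffer */
    size_t len;
};

/* Plays the device: works independently, fills the buffer, then "interrupts". */
static void *fake_device(void *arg) {
    struct request *rq = arg;
    struct timespec delay = { 0, 10 * 1000 * 1000 };  /* ~10 ms of device work */
    nanosleep(&delay, NULL);
    memset(rq->buf, 'D', rq->len);       /* data lands in the caller's buffer */
    pthread_mutex_lock(&rq->lock);
    rq->done = 1;                        /* completion status... */
    pthread_cond_signal(&rq->done_cv);   /* ...and the wakeup, our "interrupt" */
    pthread_mutex_unlock(&rq->lock);
    return NULL;
}

/* Looks synchronous to its caller: returns only once the data is in buf.
 * Returns the number of bytes read, or 0 if the "device" could not start. */
static size_t sync_read(char *buf, size_t len) {
    struct request rq;
    pthread_t dev;
    size_t result = 0;
    assert(buf != NULL);
    pthread_mutex_init(&rq.lock, NULL);
    pthread_cond_init(&rq.done_cv, NULL);
    rq.done = 0;
    rq.buf = buf;
    rq.len = len;
    if (pthread_create(&dev, NULL, fake_device, &rq) == 0) {
        pthread_mutex_lock(&rq.lock);
        while (!rq.done)                 /* caller sleeps; loop guards spurious wakeups */
            pthread_cond_wait(&rq.done_cv, &rq.lock);
        pthread_mutex_unlock(&rq.lock);
        pthread_join(dev, NULL);
        result = len;
    }
    pthread_cond_destroy(&rq.done_cv);
    pthread_mutex_destroy(&rq.lock);
    return result;
}
```

A caller just writes `size_t n = sync_read(buf, sizeof buf);` and never sees the second thread, the sleep, or the signal.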
Synchronous Appearance, Async Reality
I/O is fundamentally asynchronous at the hardware level: the device operates independently while the CPU runs other work. The OS wraps this in blocking system calls so programmers write sequential code.
Panels: Hardware Reality (left) and Program View (right). At the first step, the program view shows > read(fd, buf, 4096) while the hardware timeline shows the CPU issuing an I/O command to the SATA controller.
I/O is asynchronous. The OS makes it look synchronous.
Watch the hardware timeline on the left and the terminal on the right. The key moment is steps 2-3: the device operates on its own while the CPU runs entirely different code. Your program sees none of this; it just sees read() block and then return. The OS has hidden the asynchronous hardware reality behind a synchronous programming model.
🤔 Why Not Just Expose the Asynchrony? You can; that is exactly what select(), poll(), epoll, io_uring, and async/await are for. But those APIs are significantly more complex. The synchronous model is the default because it is simple, composable, and sufficient for most programs. Advanced I/O concurrency is an opt-in complexity, not a baseline requirement.
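As a taste of the opt-in model, this sketch uses POSIX poll() to ask whether a descriptor has data, waiting at most a given timeout, instead of blocking inside read(). The helper name wait_readable is hypothetical, and a pipe stands in for the data source.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <poll.h>
#include <unistd.h>   /* pipe(), write(), close() for the usage below */

/* Ask the kernel whether `fd` is readable, waiting at most `timeout_ms`
 * (0 = just check, -1 = wait forever). Returns 1 if readable, 0 on
 * timeout, -1 on error. The caller decides what to do while it waits. */
static int wait_readable(int fd, int timeout_ms) {
    assert(fd >= 0);
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int r = poll(&p, 1, timeout_ms);
    if (r < 0)
        return -1;
    return (r > 0 && (p.revents & POLLIN)) ? 1 : 0;
}
```

On an empty pipe, `wait_readable(fd, 0)` returns 0 immediately; after one byte is written to the other end, it returns 1 and a following read() will not block.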
Buffering
Consider the SATA SSD path in the PCH diagram. The SSD's internal flash reads at roughly 600 MB/s. The SATA cable also runs at roughly 600 MB/s. But the DMI link between the PCH and the SoC runs at roughly 4 GB/s, and the memory bus between the memory controller and DRAM runs at roughly 50 GB/s. These wildly different speeds mean that data arriving from the device must be buffered at each hop before it can be forwarded; otherwise the slow end starves the fast end, or the fast end overwhelms the slow end.
Buffering
Buffers at each hop absorb speed differences between adjacent stages. Hardware controllers use on-chip FIFOs; the OS adds software buffers (page cache, user-space) above the abstraction boundary.
Press Play and watch the buffer fill levels animate. Each buffer fills as data arrives from the previous hop, then drains as data moves to the next. The rate labels between hops show the speed mismatches that the buffers absorb.
💡 Key Insight: Buffering is not just a performance optimization; it is a correctness requirement. Without buffers, a character device producing data continuously would have no place to store bytes between the moment they arrive and the moment the CPU reads them. The bytes would simply be lost. Buffers decouple the producer's timing from the consumer's timing.
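A fixed-size FIFO is the basic data structure behind both controller FIFOs and many software buffers. This C sketch (hypothetical names, a deliberately tiny 8-byte capacity, single-threaded) shows the producer and consumer advancing independently, and what happens when the consumer falls behind: once the buffer is full, new bytes are refused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RB_CAP 8  /* capacity in bytes; kept tiny so overflow is easy to see */

struct ring {
    uint8_t data[RB_CAP];
    size_t head;   /* next slot to write (producer side) */
    size_t tail;   /* next slot to read (consumer side) */
    size_t count;  /* bytes currently stored */
};

/* Producer (device side): returns 0 if stored, -1 if the buffer is full.
 * A real character device would then drop the byte or apply flow control. */
static int rb_put(struct ring *rb, uint8_t byte) {
    assert(rb != NULL);
    if (rb->count == RB_CAP)
        return -1;
    rb->data[rb->head] = byte;
    rb->head = (rb->head + 1) % RB_CAP;
    rb->count++;
    return 0;
}

/* Consumer (CPU side): stores the oldest byte in *out and returns 0,
 * or returns -1 if there is nothing to read yet. */
static int rb_get(struct ring *rb, uint8_t *out) {
    assert(rb != NULL && out != NULL);
    if (rb->count == 0)
        return -1;
    *out = rb->data[rb->tail];
    rb->tail = (rb->tail + 1) % RB_CAP;
    rb->count--;
    return 0;
}
```

A buffer shared between an interrupt handler and a reader would also need locking or a strict single-producer/single-consumer discipline; this sketch omits both to keep the timing-decoupling idea visible.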
Shareable vs. Dedicated Devices
Most modern I/O devices are shared. SSDs and HDDs accept concurrent read/write requests from multiple processes via command queuing (NCQ for SATA, multiple submission queues for NVMe). Network adapters multiplex connections from many processes simultaneously. GPUs support hardware preemption and context switching: your desktop compositor, browser, and game all share the GPU concurrently through a kernel-level scheduler. Sound cards mix multiple audio streams through the OS audio stack (PulseAudio/PipeWire on Linux, Core Audio on macOS).
Truly dedicated devices are those where the hardware physically cannot interleave operations: a printer processes one print job at a time, a scanner performs one scan at a time, a CD/DVD burner locks the disc during a burn, and a tape drive provides sequential access to a single consumer. For these devices, the OS must serialize access, typically through a spooler (a queue that accepts requests and feeds them to the device one at a time) or locking (granting exclusive access to one process).
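One way to express the locking option in user space on Linux is an exclusive advisory lock with flock(). In this sketch, claim_exclusive() and the lock-file path used with it are hypothetical, and the lock is advisory: it only stops programs that also call flock(). Using LOCK_NB makes a second claimant fail immediately instead of waiting in line.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Try to become the sole user of a dedicated device, represented here by a
 * lock file (e.g. a hypothetical /tmp/printer.lock). Returns an fd that holds
 * the lock, or -1 if someone else already holds it or the open fails.
 * Closing the returned fd releases the lock. */
static int claim_exclusive(const char *path) {
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {  /* busy: fails with EWOULDBLOCK */
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling claim_exclusive() twice on the same path fails the second time until the first descriptor is closed, which is the one-user-at-a-time behavior a dedicated device needs. A spooler trades that rejection for a queue.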
Shareable vs. Dedicated
Most I/O devices are shared β they queue or interleave concurrent requests. Truly dedicated devices require the OS to serialize access through spooling or locking.
Shared devices: SSD, GPU, sound card, NIC
Dedicated devices: printer, scanner, burner, tape drive
Scenario: three processes request I/O simultaneously
All devices on the PCH diagram blink to indicate sharing; they all serve multiple consumers concurrently. The dedicated examples below the diagram (printer, scanner, burner, tape drive) are devices where the OS must impose serialized access because the hardware cannot interleave.
Historical Note: The print spooler is one of the oldest OS abstractions; the name stands for Simultaneous Peripheral Operations On-Line and dates back to the 1960s. It solved the problem of dedicated printers on timesharing systems: multiple users could submit print jobs, and the spooler would feed them to the printer one at a time without requiring each user to wait.
Putting It All Together
These six principles do not operate in isolation. They are layered on top of each other, all acting on the same hardware simultaneously. The visual below shows all six overlaid on the PCH diagram with distinct color coding.
All Principles Combined
The full hardware architecture, with buffers at every hop and sharing properties at each device.
Every device in the system is simultaneously subject to all six principles: it is accessed through a device-independent API, named through a uniform path, handles errors close to the source, appears synchronous to applications, uses buffering at every hop, and is either shared or dedicated. Understanding these principles gives you a mental framework for reasoning about any I/O subsystem, and for understanding the design decisions in the code we will write next.
I/O Programming Techniques
We now have the hardware landscape from LN17 and the software design principles from above. The remaining question is concrete: how do we actually structure our I/O code? We studied polling, interrupts, and DMA as hardware mechanisms last lecture. Now we reframe them as programming models: what does each mechanism mean for the code a programmer writes?
The answer depends on how much hardware is available to help. Three techniques form a spectrum along two axes: who waits for readiness and who transfers the data.
| Technique | Who waits for readiness? | Who transfers data? |
|---|---|---|
| Programmed I/O | CPU (busy-wait) | CPU (load/store per word) |
| Interrupt-Driven I/O | Device signals CPU (interrupt) | CPU (load/store per word) |
| DMA-Driven I/O | DMA engine | DMA engine |
Each step down offloads one more responsibility from the CPU to dedicated hardware. The subsections below flesh out what each row means for the programmer β what you manage, what you delegate, and what can go wrong.
Programmed I/O
Programmed I/O is the simplest and cheapest approach. There is no interrupt controller to configure, no DMA engine to program, no asynchronous handlers to write. The code is a straightforward loop: check the status register, copy a word, repeat. This makes it ideal for embedded systems where every transistor counts: a microcontroller reading a temperature sensor does not need interrupt hardware or a DMA engine. It just needs a loop and a register address.
The cost is equally straightforward: the CPU is fully occupied for the entire transfer. It cannot run other processes while it busy-waits for the device, and it cannot delegate the data movement to anyone else. On a complex system with many devices competing for attention, this monopolizes the processor. A multi-megabyte disk read under programmed I/O would burn millions of CPU cycles on nothing but copying, cycles that could have been spent running user programs.
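A minimal C sketch of the programmed I/O loop. The register layout (a status word whose bit 0 means "data ready", plus a data word) is hypothetical, and the registers are plain memory driven by a simulated device so the example runs anywhere; on real hardware the struct would be mapped onto the device's MMIO address. Both jobs, waiting and copying, stay on the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block. `volatile` forces every access to really
 * read or write the "register" instead of a cached copy in a CPU register. */
struct dev_regs {
    volatile uint32_t status;  /* bit 0 = data ready (hypothetical layout) */
    volatile uint32_t data;    /* current data word (hypothetical layout) */
};

#define STATUS_READY 0x1u

/* Simulated device: becomes ready every third poll and presents the next
 * value of a counter. Exists only so the sketch can run without hardware. */
static uint32_t sim_next = 100;
static unsigned sim_wait = 0;
static void sim_tick(struct dev_regs *r) {
    if (++sim_wait == 3) {
        sim_wait = 0;
        r->data = sim_next++;
        r->status |= STATUS_READY;
    }
}

/* Programmed I/O: the CPU busy-polls the status bit, then copies each word
 * itself. `tick` stands in for the device making progress between polls.
 * Returns how many polls found the device not ready (wasted CPU work). */
static size_t pio_read(struct dev_regs *regs, uint32_t *dst, size_t nwords,
                       void (*tick)(struct dev_regs *)) {
    assert(regs != NULL && dst != NULL && tick != NULL);
    size_t wasted_polls = 0;
    for (size_t i = 0; i < nwords; i++) {
        while (!(regs->status & STATUS_READY)) {  /* busy-wait */
            wasted_polls++;
            tick(regs);
        }
        dst[i] = regs->data;                /* CPU performs the transfer */
        regs->status &= ~STATUS_READY;      /* acknowledge; wait for next word */
    }
    return wasted_polls;
}
```

With the simulated device becoming ready every third poll, reading four words costs twelve wasted status checks; a slow device and a multi-megabyte transfer scale that waste up accordingly.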
💡 Connection to LN17: This is exactly the polling model we studied in the interrupts-vs-polling section, extended to include the data transfer itself. In programmed I/O, the CPU does not just poll for readiness; it also performs every load/store to move the data. The "synchronous appearance" principle from earlier in this lecture is irrelevant here because there is no illusion. The programmer is living the synchronous, blocking reality directly.
Interrupt-Driven I/O
The key distinction from programmed I/O is what gets offloaded: interrupt-driven I/O offloads waiting from the CPU, but not data transfer. Between interrupts, the CPU is free, so the OS can schedule a completely different process. But every time the device has a word ready, it fires an interrupt, the CPU enters the handler, the handler moves the word, and the CPU returns. The programmer handles device resource management (opening, closing, allocating buffers) and writes the interrupt handlers that perform the per-word data movement.
This aligns directly with the synchronous appearance, asynchronous operation principle. The OS can wrap these interrupts in a blocking read() call: the application sees a synchronous function that returns when all the data is ready, while underneath, the CPU was doing other work between interrupts.
The cost is interrupt overhead. Each interrupt triggers a partial context switch: the CPU saves registers, jumps to the handler, executes the transfer, restores registers, and returns. When devices are slow and interrupts are infrequent, this overhead is negligible. But a high-speed device producing data rapidly can cause an interrupt storm, in which the CPU spends more time entering and exiting interrupt handlers than doing useful work. At the extreme, interrupt overhead can exceed the busy-wait cost of programmed I/O.
⚠️ Gotcha: The gap between interrupt-driven and programmed I/O is narrower than it first appears. Interrupt-driven I/O frees the CPU from waiting, but the CPU is still invoked for every word of data movement. For a 4 KB block transfer, that is potentially thousands of interrupts and thousands of handler invocations. This is the critical gap that motivates DMA: offloading not just the waiting, but the transfer itself.
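The arithmetic behind the gotcha is short enough to write down. This sketch assumes an illustrative cost per interrupt (the real figure varies widely with CPU and kernel); the function names are hypothetical.

```c
#include <assert.h>

/* Fraction of one CPU consumed by interrupt entry/exit alone, given an
 * interrupt rate and an assumed per-interrupt cost in nanoseconds. */
static double irq_cpu_fraction(double irqs_per_sec, double ns_per_irq) {
    assert(irqs_per_sec >= 0 && ns_per_irq >= 0);
    return irqs_per_sec * ns_per_irq * 1e-9;
}

/* Interrupts needed to move `bytes` when the device interrupts once per
 * `bytes_per_irq`: one word for interrupt-driven I/O, the whole block for DMA. */
static unsigned long irqs_for_transfer(unsigned long bytes, unsigned long bytes_per_irq) {
    assert(bytes_per_irq > 0);
    return (bytes + bytes_per_irq - 1) / bytes_per_irq;  /* round up */
}
```

With 4-byte words, `irqs_for_transfer(4096, 4)` gives 1024 interrupts for one 4 KB block, versus `irqs_for_transfer(4096, 4096)` = 1 under DMA. And if each interrupt costs an assumed 1 µs, `irq_cpu_fraction(1e6, 1000.0)` is about 1.0: a device raising one million interrupts per second consumes an entire core on entry and exit alone.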
DMA-Driven I/O
DMA-driven I/O offloads both axes: the DMA engine waits for the device and moves the data. The programmer's interface is reduced to "here is my buffer, here is what I want, tell me when it is done." The CPU is free for the entire duration of the transfer, not just between interrupts but from start to finish. Instead of thousands of interrupts for a block transfer, there is one.
This provides the simplest programming model and the highest hardware utilization, but it requires the most hardware. A DMA engine is essentially a special-purpose processor with its own ability to read and write the bus, manage address counters, and coordinate with device controllers. Modern bus-mastering devices (NVMe SSDs, GPUs, high-end NICs) embed their own DMA engines directly on the device, making them even more autonomous: they process descriptor rings, perform scatter/gather across non-contiguous memory regions, and signal completion via MSI/MSI-X, all without CPU involvement.
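Descriptor rings and scatter/gather are easier to see in code. In this sketch, the descriptor format and fake_dma_engine() are hypothetical, and a software loop stands in for the engine; real descriptors hold bus addresses rather than CPU pointers, and real hardware runs concurrently with the CPU. The CPU fills in descriptors pointing at non-contiguous fragments, the "engine" fills each one in turn, and the whole transfer ends with a single completion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scatter/gather descriptor. */
struct dma_desc {
    uint8_t *addr;      /* destination fragment (a bus address on real HW) */
    uint32_t len;       /* fragment length in bytes */
    uint32_t done;      /* set by the "engine" when the fragment is filled */
};

#define RING_SIZE 4

/* Software stand-in for the DMA engine: walks the ring, copies device data
 * into each fragment, marks it done, and raises one completion at the end.
 * The CPU's only jobs are filling descriptors before and handling that one
 * completion after. Returns the number of bytes transferred. */
static size_t fake_dma_engine(struct dma_desc ring[RING_SIZE],
                              const uint8_t *dev_data, size_t dev_len,
                              int *completions) {
    assert(ring != NULL && dev_data != NULL && completions != NULL);
    size_t off = 0;
    for (size_t i = 0; i < RING_SIZE && off < dev_len; i++) {
        size_t n = ring[i].len;
        if (n > dev_len - off)
            n = dev_len - off;
        memcpy(ring[i].addr, dev_data + off, n);   /* no CPU copy loop here */
        ring[i].done = 1;
        off += n;
    }
    (*completions)++;   /* one simulated completion interrupt per transfer */
    return off;
}
```

Splitting an 8-byte transfer across a 3-byte and a 5-byte fragment still produces exactly one completion, which is the property that separates this model from interrupt-driven I/O.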
The cost is the hardware itself. DMA engines add silicon area, power consumption, and design complexity. On embedded systems where cost and power are primary constraints, a DMA controller may not be justified for low-bandwidth devices. There is also a subtler risk: if the DMA engine sits on a slower bus path than the CPU's direct access, it becomes a bottleneck rather than an optimization, and the transfer takes longer through the DMA engine than it would have with the CPU copying directly.
Historical Note: This bottleneck was a real concern with early PC DMA. The Intel 8237 DMA controller used in the original IBM PC (1981) shared the PC's expansion bus (later standardized as ISA) with the CPU, and its transfer rate was limited by the bus clock. For small transfers on a fast CPU, programmed I/O through the processor was actually quicker than delegating to the 8237. Modern bus-mastering DMA over PCIe largely eliminates this problem, since the device has its own high-bandwidth path to memory, but understanding the tradeoff explains why DMA is not always the automatic winner.
Head-to-Head Comparison
| Property | Programmed I/O | Interrupt-Driven I/O | DMA-Driven I/O |
|---|---|---|---|
| Waits for readiness | CPU (busy-wait loop) | Device interrupt | DMA engine |
| Transfers data | CPU (per-word load/store) | CPU (per-word in handler) | DMA engine (autonomous) |
| CPU during transfer | Fully occupied | Free between interrupts, busy per word | Free for entire transfer |
| Interrupts to CPU | None (polls instead) | One per word/byte ready | One per complete transfer |
| Hardware required | Device controller only | Controller + interrupt hardware | Controller + interrupt + DMA engine |
| Programmer burden | Full (poll, transfer, cleanup) | Medium (resource mgmt, handlers) | Minimal (data + high-level request) |
| Cost | Cheapest | Moderate | Most expensive |
| Best for | Embedded, ultra-simple devices | Standard devices, moderate throughput | High-throughput bulk transfers |
| Risk | CPU monopolized by busy-wait | Interrupt storms at high frequency | DMA bottleneck (historical/embedded) |
The Hybrid Reality
These three techniques are mental models on a spectrum, not mutually exclusive choices. A modern operating system uses all three simultaneously, choosing the right technique for each device and workload:
Polling with busy-waiting for ultra-low-latency paths where interrupt overhead exceeds the poll cost: high-performance NVMe drivers can poll completion queues instead of waiting for interrupts on latency-critical I/O
Interrupt-driven requests for standard, moderate-frequency I/O: keyboard input, mouse events, serial communication, and most file operations use interrupt notification
DMA transfers for high-throughput bulk data movement: disk reads, network packet rings, GPU texture uploads, and audio streams all flow through DMA engines
The decision chain from the beginning of this lecture applies here too. When you encounter a new I/O problem, walk the two-axis table: given the hardware available and the performance requirements, who should wait for readiness, and who should move the data? The answer might be "the CPU does both" for a simple sensor, "interrupts free the CPU but it still moves data" for a moderate device, or "DMA handles everything" for a high-bandwidth stream. Often, the answer is a combination, and that is the hybrid reality of real systems.
💡 Key Insight: The three techniques are not a historical progression where each one replaces the last. They are concurrent tools in every modern system. The same kernel that uses DMA for disk I/O uses interrupts for keyboard input and polling for high-frequency NVMe queues. Choosing the right technique (or the right combination) is an engineering decision driven by the device's speed, the transfer size, and the hardware budget.
Device Drivers
We have spent the last two sections building two worlds: the diverse, messy reality of I/O hardware, and the clean, principled architecture that software demands. Drivers are the bridge between them. A driver is the executable translation of a device manufacturer's manual: it knows the register layout, timing constraints, and protocol quirks that the OS should never need to learn.
Think of it this way: the manufacturer ships the device with a thick manual full of register maps, command sequences, and timing diagrams. The driver author reads that manual and writes a program that hides every hardware-specific detail behind the OS's standard interface. Once the driver is loaded, the rest of the kernel, and every application above it, speaks only the standard interface and never touches the manual.
Where Drivers Live
Recall the L-shaped abstraction boundary from the combined principles diagram. The driver lives exactly at that line. Below the boundary, it speaks the hardware's native language: writing to control registers, reading status bits, configuring DMA descriptors, and handling device-specific interrupts. Above the boundary, it speaks the OS API, implementing the standard function pointers that the kernel calls when an application issues read(), write(), or ioctl().
This is why we emphasized that boundary so heavily in the principles section. Every principle we studied (device independence, uniform naming, error handling, synchronous appearance, buffering, shareable vs. dedicated) is ultimately implemented by the driver. The principles describe what the software architecture demands; the driver is the code that delivers it.
Why Drivers Are Per-OS
Despite all the work to abstract and generalize, drivers must be written separately for every OS whose internal architecture differs. The device hardware stays the same; the register map does not change. But the OS-side contract varies: the syscall interface, memory model, interrupt registration mechanism, and scheduling guarantees are all different between Linux, Windows, macOS, and embedded RTOSes.
This is the same problem you encounter in web development when you redesign the connection between a frontend and a backend after the client-side architecture changes significantly. The API endpoints stay the same, but the data binding, state management, and rendering pipeline are all different, so the integration layer must be rewritten.
Key Point: Drivers are the single largest source of kernel bugs in most operating systems. Studies of the Linux kernel have found that driver code contains 3 to 7 times more bugs per line than core kernel code. This is not because driver authors are less skilled; it is because drivers must handle the full complexity of real hardware while conforming to the kernel's internal API, a dual-interface burden that no other kernel subsystem shares.
The Driver's Responsibilities
A driver's job spans the full lifecycle of device interaction:
Initialization: Detecting the device, allocating resources (IRQ lines, DMA buffers, I/O regions), and registering with the kernel
Data transfer coordination: Moving data between user-space buffers and device registers, using programmed I/O, interrupt-driven I/O, or DMA depending on the device
Interrupt handling: Responding to hardware interrupts, reading device status, and waking any processes that were waiting for I/O completion
Error reporting: Translating hardware-specific error codes into the OS's generic error values
Shutdown and cleanup: Releasing resources, deregistering from the kernel, and leaving the device in a safe state
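The initialization and cleanup duties pair up naturally. Linux drivers commonly use a goto-based unwind so that a failure partway through initialization releases exactly what was already acquired, in reverse order. This user-space sketch imitates that pattern with malloc() standing in for I/O regions and DMA buffers; struct mydev, the init/exit functions, and the fail_at_step test hook are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct mydev {
    void *regs;     /* stands in for a claimed/mapped I/O region */
    void *dma_buf;  /* stands in for a DMA buffer */
    int irq_ok;     /* stands in for a registered interrupt handler */
};

static int fail_at_step = 0;  /* hypothetical test hook: step to fail (0 = none) */

/* Acquire resources in order; on failure, unwind only what was acquired. */
static int mydev_init(struct mydev *d) {
    assert(d != NULL);
    d->regs = (fail_at_step == 1) ? NULL : malloc(64);
    if (!d->regs)
        goto err_regs;
    d->dma_buf = (fail_at_step == 2) ? NULL : malloc(4096);
    if (!d->dma_buf)
        goto err_dma;
    d->irq_ok = (fail_at_step != 3);
    if (!d->irq_ok)
        goto err_irq;
    return 0;               /* fully initialized: register with the kernel here */

err_irq:
    free(d->dma_buf);       /* undo step 2 */
    d->dma_buf = NULL;
err_dma:
    free(d->regs);          /* undo step 1 */
    d->regs = NULL;
err_regs:
    return -1;
}

/* Shutdown mirrors init in reverse: quiesce the IRQ first, then free memory. */
static void mydev_exit(struct mydev *d) {
    assert(d != NULL);
    d->irq_ok = 0;
    free(d->dma_buf);
    d->dma_buf = NULL;
    free(d->regs);
    d->regs = NULL;
}
```

Each error label undoes one earlier step and falls through to the next, so adding a new resource means adding one acquire, one label, and one release in mirrored positions.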
Principles in Practice
We introduced the six I/O software principles as conceptual design constraints. Now we shift perspective: what do these principles look like in real driver code? Using Linux as our example, each principle maps to specific APIs, data structures, and programming patterns that every driver author must know.
The pseudocode in this section is simplified C. We strip away error-checking boilerplate and kernel-internal details to focus on the key API calls and their roles. The goal is recognition: when you encounter these patterns in real driver source, you will know which principle they implement.
Device Independence in Practice
The kernel's Virtual File System (VFS) layer dispatches system calls to the correct driver. When an application calls read(fd, buf, n), the VFS finds the open file behind fd, whose operations table was bound to a driver when the file was opened, and calls that driver's .read function pointer. The application never knows which driver answered; it just gets data back.
Every character device driver fills in a struct file_operations with pointers to its own handlers (.open, .read, .write, .release, and so on). The VFS does not care what hardware sits behind it; it only cares that the function pointers are valid. This is device independence reduced to a data structure.
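To see the dispatch mechanism in miniature, this user-space C model imitates the idea of file_operations with a one-entry table of function pointers. It is not the kernel's type (the real `.read` member is declared as `ssize_t (*read)(struct file *, char __user *, size_t, loff_t *)`), and the two "drivers" are hypothetical stand-ins that behave like /dev/zero and /dev/null.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of an operations table: one function pointer per supported call. */
struct fops {
    long (*read)(char *buf, size_t n);
};

static long zero_read(char *buf, size_t n) {   /* behaves like /dev/zero */
    memset(buf, 0, n);
    return (long)n;
}

static long null_read(char *buf, size_t n) {   /* behaves like /dev/null */
    (void)buf;
    (void)n;
    return 0;                                   /* always end-of-file */
}

static const struct fops zero_fops = { .read = zero_read };
static const struct fops null_fops = { .read = null_read };

/* An open file remembers which driver's table it was bound to at open time. */
struct file { const struct fops *f_op; };

/* The "VFS" read path: no device-specific code, just an indirect call. */
static long vfs_read(struct file *f, char *buf, size_t n) {
    assert(f != NULL && f->f_op != NULL && f->f_op->read != NULL);
    return f->f_op->read(buf, n);
}
```

vfs_read() contains no device-specific code at all; which behavior you get depends only on which table the open file was bound to.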
⚠️ Gotcha: Block devices (disks, SSDs) actually use a separate struct block_device_operations and go through the block layer (request queues, I/O schedulers, and sector-aligned transfers) rather than the VFS character-device path. Our pseudocode focuses on character devices because their file_operations pattern is the clearest illustration of device independence. Block devices achieve the same goal through a parallel mechanism.
Uniform Naming in Practice
When a driver loads, it registers itself with the kernel and claims a major number. From that point on, any device file in /dev/ of that type (character or block) with that major number will have its I/O operations routed to that driver. The minor number lets the driver distinguish between multiple instances: for example, /dev/sda (first SATA disk) vs. /dev/sdb (second SATA disk) under the same SCSI disk driver.
register_chrdev(240, "mydriver", &my_fops);
This single call tells the kernel: "I am the driver for major number 240. Here are my function pointers." Now any device file created with major 240 will have its read, write, open, and release calls forwarded to my_fops.
Creating a device file that maps to this driver:
mknod /dev/mydevice c 240 0
This creates a character device file (c) at /dev/mydevice with major 240 and minor 0. When an application opens this file, the kernel routes the operation to our driver. You can see real major/minor numbers on any Linux system:
ls -l /dev/sda
# brw-rw---- 1 root disk 8, 0 ...
#   major number = 8, minor number = 0
This is the programmatic reality behind the /dev/ paths we showed in the uniform naming diagram. Every path maps to a (major, minor) pair, and every major number maps to a driver.
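The routing rule (major selects the driver, minor selects the instance) can be sketched in user space. The sketch uses the real makedev(), major(), and minor() macros from <sys/sysmacros.h> to pack and unpack a dev_t. The driver table, register_driver, open_devnum, and mydriver are hypothetical stand-ins for the kernel's internal registry, loosely following the legacy register_chrdev convention that passing major 0 requests a dynamically chosen number.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/sysmacros.h>   /* makedev(), major(), minor() */

#define MAX_MAJOR 256        /* hypothetical table size for this sketch */

/* Hypothetical driver descriptor; describe() stands in for the fops table
   and receives the minor number so it can pick the device instance. */
struct my_driver {
    const char *name;
    const char *(*describe)(unsigned int inst);
};

/* Hypothetical registry: index = major number, value = owning driver. */
static const struct my_driver *chrdev_table[MAX_MAJOR];

/* Claim a major number: a nonzero request claims that exact number,
   0 asks for any free one. Returns the assigned major, or -1. */
static int register_driver(unsigned int want, const struct my_driver *drv) {
    unsigned int maj = want;
    if (maj == 0) {
        /* this sketch scans downward for a free slot; slot 0 is never used */
        for (maj = MAX_MAJOR - 1; maj > 0 && chrdev_table[maj] != NULL; maj--)
            ;
        if (maj == 0)
            return -1;                   /* table full */
    } else if (maj >= MAX_MAJOR || chrdev_table[maj] != NULL) {
        return -1;                       /* out of range or already taken */
    }
    chrdev_table[maj] = drv;
    return (int)maj;
}

/* Route an open() of a device number: the major picks the driver, the
   minor is handed to that driver to pick the instance. */
static const char *open_devnum(dev_t dev) {
    unsigned int maj = major(dev), inst = minor(dev);
    if (maj >= MAX_MAJOR || chrdev_table[maj] == NULL)
        return NULL;                     /* no driver owns this major */
    return chrdev_table[maj]->describe(inst);
}

/* Example driver managing two instances (minors 0 and 1). */
static const char *mydriver_describe(unsigned int inst) {
    static const char *const names[] = { "mydevice0", "mydevice1" };
    return inst < 2 ? names[inst] : NULL;
}
static const struct my_driver mydriver = { "mydriver", mydriver_describe };
```

Claiming 240 twice fails, which is exactly the collision problem that dynamic allocation avoids.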
Key Point: register_chrdev is the simplified, legacy API. Modern Linux drivers use alloc_chrdev_region to dynamically allocate a major number (avoiding collisions), then cdev_init + cdev_add to register the device. We use the older form here because it captures the concept in a single call, but if you look this up in the kernel docs, you will see the newer three-step API recommended instead.
💡 Key Insight: On a real Linux system, you almost never run mknod by hand. A kernel facility called devtmpfs and a user-space daemon called udev automatically create and remove /dev/ entries when drivers register or unregister devices. Plug in a USB drive and /dev/sdb appears almost instantly: devtmpfs creates the node and udev applies permissions, symlinks, and rules. The mknod command explains what a device node is; devtmpfs and udev explain how nodes appear in practice.
ASAP Error Handling in Practice
When a device encounters an error, the information flows upward through a chain: the hardware sets bits in its status register, the driver reads those bits and interprets them, the driver maps the device-specific error to a Linux errno value, and the system call returns that errno to user space. At each level, the handler attempts whatever remediation is available before passing the error up.
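ssize_t my_read(struct file *f, char __user *buf,
                size_t n, loff_t *off) {
    struct my_dev *dev = f->private_data;       /* hypothetical per-device state, set in .open */
    u32 status = ioread32(dev->status_reg);     /* ask the hardware how it went */
    if (status & ERROR_BIT) {
        iowrite32(RETRY_CMD, dev->cmd_reg);     /* lowest-level fix: retry once */
        status = ioread32(dev->status_reg);
        if (status & ERROR_BIT)
            return -EIO;                        /* retry failed: report a coarse errno */
    }
    n = min(n, sizeof(dev->data));              /* never copy past the driver's buffer */
    memcpy_fromio(dev->data, dev->data_reg, n); /* device memory -> kernel buffer */
    if (copy_to_user(buf, dev->data, n))        /* kernel buffer -> user buffer */
        return -EFAULT;                         /* bad user pointer */
    return n;
}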
The driver checks the device status register. If an error is present, it attempts a retry, the one hardware-level remedy the driver can still command. If the retry fails, it returns -EIO (generic I/O error). At this point, all hardware-specific remediation has been exhausted. In user space, read() returns -1 with errno set to EIO, and the application decides whether to retry the read, try a different file, or report the failure to the user.
This is exactly the moment we animated in the combined diagram's Runtime Behavior tab: the terminal showed errno: EIO (Input/output error). That is the driver's final decision: all lower-level fixes failed.
Key Point: "Coarse" is relative. An NVMe controller might report dozens of specific status codes: data integrity error, unrecoverable read error, namespace not ready, command abort requested. The driver maps these down to a handful of errno values (EIO, ENXIO, EBUSY, ETIMEDOUT) because the application does not need to know which NAND page failed. It only needs to know whether to retry, abort, or try a different file. The errno is not imprecise; it is appropriately scoped for the level of the stack that receives it.
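The retry-then-translate flow can be sketched in plain C. Everything device-specific here is hypothetical: the dev_status codes are invented labels echoing the NVMe conditions listed above (they are not real NVMe status values), and issue_fn stands in for the register reads and writes a real driver would perform. Only the errno constants are the standard ones from <errno.h>.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device-level status codes (labels only, not real NVMe values). */
enum dev_status {
    DEV_OK = 0,
    DEV_DATA_INTEGRITY_ERR,
    DEV_UNRECOVERED_READ_ERR,
    DEV_NAMESPACE_NOT_READY,
    DEV_CMD_ABORTED,
    DEV_TIMEOUT
};

/* Collapse many device-specific codes into the few errno values an
   application can act on (negative, kernel-style). */
static int status_to_errno(enum dev_status s) {
    switch (s) {
    case DEV_OK:                  return 0;
    case DEV_NAMESPACE_NOT_READY: return -EBUSY;      /* worth retrying later */
    case DEV_TIMEOUT:             return -ETIMEDOUT;
    default:                      return -EIO;        /* everything else: generic I/O error */
    }
}

/* Hypothetical hook: issue one command and return the device's status. */
typedef enum dev_status (*issue_fn)(void *ctx);

/* ASAP error handling: retry at the lowest level first; only when every
   attempt fails is the last device status translated into an errno. */
static int do_io_with_retry(issue_fn issue, void *ctx, int max_tries) {
    enum dev_status s = DEV_OK;
    if (max_tries < 1)
        max_tries = 1;                 /* always issue the command at least once */
    for (int i = 0; i < max_tries; i++) {
        s = issue(ctx);
        if (s == DEV_OK)
            return 0;                  /* fixed without bothering the caller */
    }
    return status_to_errno(s);
}

/* Test double: fails `fail_count` times with `fail_with`, then succeeds. */
struct flaky_dev {
    int fail_count;
    enum dev_status fail_with;
    int calls;
};

static enum dev_status flaky_issue(void *ctx) {
    struct flaky_dev *d = ctx;
    d->calls++;
    return d->calls <= d->fail_count ? d->fail_with : DEV_OK;
}
```

A transient failure is absorbed by the retry and the caller sees success; a persistent one surfaces as a single, actionable errno.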
Synchronous Appearance in Practice
The synchronous-looking read() call that applications use is literally implemented by the driver putting the calling process to sleep and having the interrupt handler wake it up. Two functions work together: the .read function that the application's read() syscall dispatches to, and the interrupt handler that the hardware triggers when data is ready.
The .read function issues a command to the device, then calls wait_event_interruptible. This puts the calling process to sleep β the scheduler removes it from the run queue and runs other processes instead. The CPU is free. When the device completes the operation, it fires an interrupt. The interrupt handler sets a flag and calls wake_up_interruptible, which puts the sleeping process back on the run queue. The .read function resumes, copies the data to user space, and returns.
From the application's perspective, read() blocked and then returned with data. From the hardware's perspective, the device operated asynchronously for potentially millions of clock cycles. The driver is the code that bridges these two realities.
💡 Key Insight: This is the "synchronous appearance, asynchronous operation" principle reduced to its implementation. The driver is the exact place where sleep and wake happen. Every blocking I/O call you have ever used (read, write, recv, accept) follows this same sleep/wake pattern somewhere in the kernel.
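The sleep/wake handshake can be imitated in user space with POSIX threads, which makes it easy to run and observe. This is an analogy, not kernel code: a pthread mutex and condition variable stand in for the wait queue, a helper thread plays the hardware, and fake_irq_handler plays the interrupt handler. All names are hypothetical, and the 50 ms delay is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical device state shared by the "read path" and the "interrupt
   handler"; the mutex + condition variable play the wait queue's role. */
struct fake_dev {
    pthread_mutex_t lock;
    pthread_cond_t  wq;          /* stands in for the wait queue */
    bool            data_ready;  /* the condition the sleeper checks */
    int             data;        /* the value the "device" produced */
};

static void fake_dev_init(struct fake_dev *d) {
    pthread_mutex_init(&d->lock, NULL);
    pthread_cond_init(&d->wq, NULL);
    d->data_ready = false;
    d->data = 0;
}

/* Plays the interrupt handler: store the result, set the flag, wake sleepers
   (the role wake_up_interruptible plays in a driver). */
static void fake_irq_handler(struct fake_dev *d, int value) {
    pthread_mutex_lock(&d->lock);
    d->data = value;
    d->data_ready = true;
    pthread_cond_broadcast(&d->wq);
    pthread_mutex_unlock(&d->lock);
}

/* Plays the hardware: works asynchronously for a while, then "interrupts". */
static void *fake_hardware(void *arg) {
    struct fake_dev *d = arg;
    struct timespec delay = { 0, 50 * 1000 * 1000 };   /* 50 ms of "device work" */
    nanosleep(&delay, NULL);
    fake_irq_handler(d, 42);
    return NULL;
}

/* Plays the driver's .read: start the device, then sleep until the flag is
   set (the role of wait_event_interruptible). The caller simply sees a
   blocking call that returns data. */
static int blocking_read(struct fake_dev *d) {
    pthread_t hw;
    d->data_ready = false;
    if (pthread_create(&hw, NULL, fake_hardware, d) != 0)  /* "issue the command" */
        return -1;

    pthread_mutex_lock(&d->lock);
    while (!d->data_ready)                     /* re-check after every wakeup */
        pthread_cond_wait(&d->wq, &d->lock);   /* sleep; the CPU runs other work */
    int value = d->data;
    pthread_mutex_unlock(&d->lock);

    pthread_join(hw, NULL);
    return value;
}
```

The while loop around pthread_cond_wait mirrors why wait_event_interruptible takes a condition instead of sleeping unconditionally: the sleeper re-checks the flag after every wakeup and never sleeps at all if the "interrupt" already arrived.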
Buffering in Practice
The standalone buffering diagram earlier showed single-buffer speed matching: one buffer absorbing speed differences between hops. Double buffering solves the remaining gap: when a single buffer is being drained, no new data can arrive. With two buffers, filling and draining happen simultaneously.
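/* helper names below are placeholders, as in the rest of this pseudocode */
char buf[2][BUF_SIZE];
int cur = 0;                            /* buffer the device is filling ("current" is a kernel macro) */
start_fill_from_device(buf[cur]);       /* prime the pipeline */
while (transferring) {
    wait_fill_done(buf[cur]);           /* buf[cur] now holds fresh data */
    int next = 1 - cur;
    start_fill_from_device(buf[next]);  /* device (e.g. via DMA) fills one buffer... */
    drain_to_user(buf[cur]);            /* ...while the CPU drains the other */
    cur = next;                         /* swap roles */
}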
The pattern is simple: while the consumer drains buf[0], the producer fills buf[1]. Then they swap. Neither side ever waits on a buffer the other is using; as long as a fill and a drain take roughly the same time, both stay busy. This is the principle behind audio playback buffers, video frame buffers, and network packet rings.
Circular (ring) buffers generalize this further: instead of two buffers, you have N slots in a ring. The DMA descriptor rings from LN17 are exactly this pattern: the device writes to the next slot while the driver reads from a previous one, with head and tail pointers chasing each other around the ring.
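A minimal ring buffer shows the head/tail mechanics. This sketch assumes one producer, one consumer, and no concurrency; a real descriptor ring shared with a device also needs memory barriers and device-specific handshaking, which are omitted here. RING_SLOTS and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 8   /* hypothetical size; a power of two keeps wrap-around cheap */

/* The producer (think: device) advances head; the consumer (think: driver)
   advances tail. Both indices only grow; slot = index % RING_SLOTS. */
struct ring {
    int    slots[RING_SLOTS];
    size_t head;   /* next slot the producer will write */
    size_t tail;   /* next slot the consumer will read */
};

static bool ring_full(const struct ring *r)  { return r->head - r->tail == (size_t)RING_SLOTS; }
static bool ring_empty(const struct ring *r) { return r->head == r->tail; }

/* Producer side: refuse rather than overwrite data not yet consumed. */
static bool ring_put(struct ring *r, int v) {
    if (ring_full(r))
        return false;
    r->slots[r->head % RING_SLOTS] = v;
    r->head++;
    return true;
}

/* Consumer side: report "nothing to read" when head and tail meet. */
static bool ring_get(struct ring *r, int *out) {
    if (ring_empty(r))
        return false;
    *out = r->slots[r->tail % RING_SLOTS];
    r->tail++;
    return true;
}
```

Once the consumer frees slot 0, the producer wraps around and reuses it, which is the "chasing each other around the ring" behavior described above.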
At the OS level, the page cache is Linux's primary buffer between user-space read() calls and block devices. When you read a file, the device typically DMAs the data straight into page-cache pages, and read() then copies it into your buffer: device → page cache → user buffer. Subsequent reads of the same data hit the page cache and skip the device entirely.
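The hit/miss behavior can be illustrated with a toy cache. This is not how Linux's page cache is built (the real one is indexed per file, reclaims pages with LRU-style lists, and handles writeback); it is a hypothetical direct-mapped cache over a fake in-memory "disk" that simply counts how often the device is touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SZ     16   /* tiny pages so the example stays readable */
#define CACHE_PAGES  4   /* hypothetical cache capacity */
#define DISK_PAGES  64

/* Toy "device": a byte array plus a counter of device accesses. */
static char disk[DISK_PAGES][PAGE_SZ];
static int  device_reads;

/* Toy cache: direct-mapped, page N lives in slot N % CACHE_PAGES. */
struct cache_entry { bool valid; int page_no; char data[PAGE_SZ]; };
static struct cache_entry cache[CACHE_PAGES];

/* Slow path: actually touch the device. */
static void device_read_page(int page_no, char *dst) {
    assert(page_no >= 0 && page_no < DISK_PAGES);
    device_reads++;
    memcpy(dst, disk[page_no], PAGE_SZ);
}

/* read()-like path: check the cache first, go to the device only on a miss,
   then copy from the cache into the caller's buffer. */
static void cached_read(int page_no, char *user_buf) {
    struct cache_entry *e = &cache[page_no % CACHE_PAGES];
    if (!e->valid || e->page_no != page_no) {      /* miss: fill the slot */
        device_read_page(page_no, e->data);
        e->page_no = page_no;
        e->valid = true;
    }
    memcpy(user_buf, e->data, PAGE_SZ);            /* cache -> "user buffer" */
}
```

Reading the same page twice touches the device once; reading a page that maps to the same slot evicts the old one, so the next read of the original page misses again.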
⚠️ Gotcha: The page cache applies to block devices (disks, SSDs); these are the devices where caching file data makes sense. Character devices (serial ports, keyboards, sensors) typically bypass the page cache entirely and use copy_to_user to move data directly from a driver-managed buffer into the application's memory. When we say "the OS buffers your I/O," the mechanism differs depending on the device class.
Shareable vs. Dedicated in Practice
For shared devices, the driver accepts concurrent requests and relies on hardware queuing to interleave them. An NVMe SSD, for instance, exposes multiple submission queues: processes submit I/O requests independently, and the SSD's controller schedules them internally. The driver does not need to serialize access; the hardware handles it.
For dedicated devices, the driver itself enforces exclusivity. The .open function checks a driver-internal in_use flag and rejects additional openers while the flag is set.
When a second process tries to open a dedicated device that is already claimed, its open() fails with errno set to EBUSY ("Device or resource busy"). The first process holds exclusive access until it closes the file descriptor, at which point the driver's release function (my_release) clears the flag and the device becomes available again.
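The .open and .release pair can be sketched with C11 atomics. The check and the claim must happen as one atomic step; a plain `if (in_use) return -EBUSY; in_use = 1;` lets two simultaneous openers both succeed. A kernel driver would use the kernel's own primitives (for example test_and_set_bit() or a mutex) rather than <stdatomic.h>; struct excl_dev, my_open, and my_release here are hypothetical user-space stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical per-device state: one flag says whether someone holds it. */
struct excl_dev {
    atomic_flag in_use;
};

/* .open: test-and-set checks and claims in one indivisible step, so two
   racing openers cannot both see "free". */
static int my_open(struct excl_dev *d) {
    if (atomic_flag_test_and_set(&d->in_use))
        return -EBUSY;       /* already claimed by another opener */
    return 0;
}

/* .release: runs when the holder closes the device; frees it for others. */
static void my_release(struct excl_dev *d) {
    atomic_flag_clear(&d->in_use);
}
```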
This works for simple cases, but it forces every would-be user to handle rejection and retry logic. For devices where queuing and prioritization matter, like printers, a better solution exists at a higher level: spooling.
User-Space Spooling
The printer spooler is the canonical example. On Linux, the cupsd daemon manages the print queue: multiple users and applications submit print jobs, cupsd queues them with priorities and ordering, and feeds them to the printer one at a time. The printer driver itself only needs to handle a single job; it never sees concurrent requests, because the spooler serializes them before they reach the driver.
This is a cleaner separation of concerns than driver-level locking. The driver stays simple (dedicated device, one job at a time). The spooler handles the complex, application-level logic: job prioritization, user permissions, retry on paper jam, notification of completion. These are concerns that do not belong in the kernel.
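The spooler's core job (accept submissions at any time, then feed the device one job at a time in priority order) fits in a few dozen lines. This is a hypothetical sketch, not how cupsd is implemented: jobs sit in a small array, the highest priority wins, ties go to the earliest submission, and fake_printer stands in for a driver that only ever sees one job.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_JOBS 16   /* hypothetical queue capacity */

/* One print job as this sketch's spooler tracks it. */
struct job { int id; int priority; unsigned long seq; };

/* Spool queue: higher priority first, FIFO among equal priorities. */
struct spooler {
    struct job jobs[MAX_JOBS];
    size_t count;
    unsigned long next_seq;    /* submission order, used to break ties */
};

/* Clients may submit at any time; nothing here touches the device. */
static int spool_submit(struct spooler *s, int id, int priority) {
    if (s->count == MAX_JOBS)
        return -1;             /* queue full */
    s->jobs[s->count++] = (struct job){ id, priority, s->next_seq++ };
    return 0;
}

/* Remove and return the next job for the dedicated device. */
static int spool_next(struct spooler *s, struct job *out) {
    if (s->count == 0)
        return -1;
    size_t best = 0;
    for (size_t i = 1; i < s->count; i++) {
        const struct job *a = &s->jobs[i], *b = &s->jobs[best];
        if (a->priority > b->priority ||
            (a->priority == b->priority && a->seq < b->seq))
            best = i;
    }
    *out = s->jobs[best];
    s->jobs[best] = s->jobs[--s->count];   /* array order does not matter; seq does */
    return 0;
}

/* Stand-in for the printer driver: records the order jobs arrive in. */
static int print_log[MAX_JOBS];
static size_t print_log_len;
static void fake_printer(const struct job *j) {
    if (print_log_len < MAX_JOBS)
        print_log[print_log_len++] = j->id;
}

/* Feed loop: the driver only ever handles one job at a time. */
static size_t spool_drain(struct spooler *s, void (*print_one)(const struct job *)) {
    size_t printed = 0;
    struct job j;
    while (spool_next(s, &j) == 0) {
        print_one(&j);
        printed++;
    }
    return printed;
}
```

All the policy (ordering, tie-breaking, capacity) lives in user space; the "driver" callback never needs a lock or a queue of its own.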
Spooling completes the I/O software stack:
| Layer | Example | Responsibility |
|---|---|---|
| Application | Word processor | Creates the data to print |
| User-space spooler | cupsd | Queues, prioritizes, and serializes jobs |
| Kernel driver | Printer driver | Translates one job into device commands |
| Hardware | Printer controller | Executes the physical print operation |
💡 Key Insight: Spooling is not limited to printers. Any dedicated device benefits from user-space queuing: CD/DVD burning software, fax servers, and batch-processing systems for scientific instruments all use the same pattern. The common thread is a device that cannot interleave, paired with users who should not have to coordinate among themselves.
Summary
The I/O decision chain unifies LN17's hardware topics into a three-stage reasoning tool: what does the hardware naturally produce (discrete or continuous), how does the controller package it (block or stream protocol), and how does the system detect readiness (polling or interrupts)
Six software principles form the foundation of every I/O subsystem: device independence, uniform naming, ASAP error handling, synchronous appearance with asynchronous operation, buffering, and shareable vs. dedicated access
These principles are not implementation details β they are design constraints that shape I/O code from Linux to Windows to embedded RTOS kernels
Three I/O programming techniques form a spectrum along two axes: who waits for readiness, and who transfers data
Programmed I/O: the CPU handles everything (busy-wait + per-word transfer); simplest, cheapest, but monopolizes the processor
Interrupt-driven I/O: interrupts free the CPU from waiting, but the CPU still performs per-word data transfer in each handler; susceptible to interrupt storms at high frequency
DMA-driven I/O: dedicated hardware handles both waiting and data transfer, sending a single interrupt on completion; simplest programmer interface but most expensive in hardware
Modern systems use all three techniques simultaneously, choosing the right one (or combination) based on device speed, transfer size, and hardware budget
A device driver is a kernel-level program that lives at the abstraction boundary: it speaks the hardware's native language below and the OS's standard API above
Drivers must be rewritten for every OS whose internal architecture differs, even though the device hardware stays the same
In Linux, struct file_operations is the mechanism behind device independence β every character device driver fills in the same set of function pointers
Major and minor device numbers implement uniform naming: the major number routes to a driver, the minor number identifies the instance
Drivers implement ASAP error handling by attempting hardware-level remediation (retries) before mapping device-specific errors to coarse errno values
Synchronous appearance is implemented by the driver calling wait_event to sleep the process and the interrupt handler calling wake_up to resume it
Double buffering eliminates single-buffer idle time by letting the producer fill one buffer while the consumer drains the other; ring buffers generalize this to N slots
Dedicated devices enforce exclusivity through a driver-internal in_use flag that returns -EBUSY on concurrent open attempts
Spooling moves serialization logic for dedicated devices out of the kernel and into user-space daemons like cupsd, separating queuing concerns from driver logic
Lecture Notes
Key Definitions:
| Term | Definition |
|---|---|
| Device Independence | Applications read/write any device through a uniform syscall API without knowing the underlying hardware |
| Uniform Naming | Devices are identified by consistent names (filesystem paths in Unix) independent of physical hardware |
| ASAP Error Handling | I/O errors should be handled as close to the source as possible, where the most remediation options exist |
| Synchronous Appearance | The OS wraps asynchronous hardware I/O in blocking system calls so programmers write sequential code |
| Buffering | Temporary storage at each hop absorbs speed differences between producer and consumer |
| Shareable vs. Dedicated | Shareable devices serve multiple processes concurrently; dedicated devices require serialized access via spooling or locking |
| Programmed I/O | CPU handles everything: busy-waits for readiness, transfers data per-word via load/store, manages cleanup |
| Interrupt-Driven I/O | Device signals CPU via interrupt when ready; CPU is freed from waiting but still transfers data per-word in the handler |
| DMA-Driven I/O | DMA engine handles both readiness detection and data transfer autonomously; CPU receives one interrupt per complete transfer |
| Device Driver | A kernel-level program that translates between the OS's generalized I/O interface and a specific device controller's hardware API |
| file_operations | The struct of function pointers (read, write, open, release, ioctl) that every Linux character device driver implements |
| Major/Minor Numbers | Major number identifies the driver; minor number identifies the specific device instance managed by that driver |
| Double Buffering | Two alternating buffers where the producer fills one while the consumer drains the other, eliminating single-buffer idle time |
| Spooling | Simultaneous Peripheral Operations On-Line: a user-space daemon queues jobs and feeds them to a dedicated device one at a time |
I/O Programming Techniques Comparison:
| Property | Programmed | Interrupt-Driven | DMA-Driven |
|---|---|---|---|
| Who waits | CPU (busy-wait) | Device interrupt | DMA engine |
| Who transfers | CPU (per-word) | CPU (per-word in handler) | DMA engine |
| CPU utilization | Fully occupied | Free between interrupts | Free for entire transfer |
| Hardware cost | Cheapest | Moderate | Most expensive |
| Risk | CPU monopolized | Interrupt storms | DMA bottleneck |
Additional Resources
Recommended Reading
OSTEP Chapter 36: I/O Devices. Free textbook chapter covering device interfaces, polling, interrupts, DMA, and the programmed/interrupt-driven/DMA progression