In this lecture we investigate different memory hardware technologies, how they are used, and their advantages and disadvantages, to set up a foundation for understanding how processes fit into memory. This includes volatility, speed, size, cost, and more!
In LN12, we wrestled with deadlocks — what happens when processes fight over resources so badly that everyone freezes. That lecture marks the end of our deep dive into the CPU and how the OS manages computation on it.
Today, we shift from the CPU to the other fundamental piece of every computer: memory. Where the CPU performs computations, memory stores them. Every instruction, every variable, every process — it all lives in memory. But "memory" is not one thing. It is a hierarchy of wildly different technologies, each with its own physics, speed, cost, and tradeoffs.
This lecture is about the hardware of memory. We will look at what memory actually is, what physical principles it exploits, and why modern systems use so many different kinds of it. In future lectures, we will investigate how the OS manages this hardware. But first, you have to understand what you are managing.
Today's Agenda
What Memory Does — The fundamental role of storage in computation
Volatile Memory — SRAM and DRAM: fast but fleeting
Non-Volatile Memory — HDDs, SSDs, ROM, and the physics of persistence
Sleep Modes and Volatility — What stays powered and what doesn't
Semi-Volatile Memory — Technologies bridging the gap
The Memory Hierarchy — Why we need so many kinds of memory
The Universal Memory — The dream of one perfect storage technology
Memory Through Abstraction Layers — How your view changes depending on where you sit
Memory Across Hardware — Comparing real-world devices
What Memory Does
At the most fundamental level, memory provides the means by which we store all our computations and information. The CPU computes, but it needs somewhere to put the results, somewhere to read instructions from, and somewhere to hold the data being worked on. Memory is that "somewhere."
But here is the critical insight: there is no single "perfect" memory. If there were, our jobs would be much simpler. Instead, we have a spectrum of technologies, each making different tradeoffs between speed, capacity, cost, and one property that is more fundamental than all of these: volatility.
This single distinction — whether the hardware needs power to remember — shapes everything about modern computer architecture. Let's begin by examining each category.
Volatile Memory — Data at Risk
Volatile memory is hardware that requires constant power to store information. The moment you cut power, everything stored in it vanishes. This sounds like a terrible property — why would anyone want this? Because volatile memory is fast. Extremely fast.
SRAM — Static Random Access Memory
SRAM stores each bit in a small flip-flop circuit, typically six transistors per bit. The word "static" in SRAM does not mean "unchanging data"; it means the circuit is statically stable while powered. Unlike DRAM, SRAM does not need refresh cycles. The tradeoff is density: SRAM stores fewer bits per unit area than DRAM, so it is much more expensive for large memories.
This is why SRAM is used only where speed is absolutely critical: the CPU's registers (the fastest memory in any system, operating at a single clock cycle) and the cache hierarchy (L1, L2, L3). A large fraction of your CPU die is cache: look back at those die shots from LN9 and notice how much silicon is dedicated to it.
DRAM — Dynamic Random Access Memory
DRAM stores each bit as charge on a tiny capacitor gated by a single transistor, and that charge leaks. The word "dynamic" refers to the fact that the stored charge must be maintained over time. The memory controller periodically refreshes rows so the charge does not decay away. Refresh costs time and bandwidth, but DRAM's density is so much better than SRAM that it remains the standard choice for main memory.
Why use DRAM at all if it has this problem? Because it packs far more bits into the same area at a much lower cost. That makes it suitable for main memory — the RAM sticks or soldered memory packages that hold active programs and data.
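To make "dynamic" concrete, here is a minimal sketch of a single DRAM cell as a leaking capacitor. The leak rate, the read threshold, and the tick-based timing are all made-up illustrative values, not real device parameters; `DramCell` and `survives` are hypothetical names.

```rust
/// Toy model of one DRAM cell: a capacitor whose charge leaks every tick.
/// Both constants are assumed values chosen only for illustration.
const LEAK_PER_TICK: f64 = 0.9; // fraction of charge kept per tick
const READ_THRESHOLD: f64 = 0.5; // above this, the cell reads as 1

struct DramCell {
    charge: f64,
}

impl DramCell {
    fn write(bit: bool) -> Self {
        DramCell { charge: if bit { 1.0 } else { 0.0 } }
    }

    /// One time step passes and some charge leaks away.
    fn tick(&mut self) {
        self.charge *= LEAK_PER_TICK;
    }

    fn read(&self) -> bool {
        self.charge > READ_THRESHOLD
    }

    /// Refresh: read the bit, then rewrite it at full strength.
    fn refresh(&mut self) {
        self.charge = if self.read() { 1.0 } else { 0.0 };
    }
}

/// Store a 1, run `ticks` steps, refreshing every `refresh_every` ticks
/// (0 means never), and report whether the bit still reads as 1.
fn survives(ticks: u32, refresh_every: u32) -> bool {
    let mut cell = DramCell::write(true);
    for t in 1..=ticks {
        cell.tick();
        if refresh_every != 0 && t % refresh_every == 0 {
            cell.refresh();
        }
    }
    cell.read()
}

fn main() {
    println!("no refresh, 20 ticks: bit kept? {}", survives(20, 0));
    println!("refresh every 4 ticks: bit kept? {}", survives(20, 4));
    println!("refresh every 8 ticks: bit kept? {}", survives(20, 8));
}
```

In this toy model a refresh that arrives too late is as bad as no refresh at all: once the charge has dropped below the threshold, the refresh faithfully rewrites the wrong value.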
Key Insight: SRAM vs. DRAM is the first taste of the central tension of memory engineering: speed vs. density vs. cost. You will see this tradeoff at every level of the hierarchy. It is inescapable.
Non-Volatile Memory — Data That Persists
Non-volatile memory retains its data when unpowered. This is generally what we think of when we imagine "storage" on a system. The physics behind persistence varies by technology — and each approach has profound implications for speed, durability, and cost.
ROM — Read-Only Memory
ROM stores data in hardwired circuit paths or permanently blown fuses, so its contents survive without power and resist change. Despite the name "read-only," modern variants like EEPROM (Electrically Erasable Programmable ROM) can be rewritten, but not casually by the system during normal operation. Updating your BIOS firmware is an example of writing to ROM, and it requires a special "flashing" process precisely because the hardware is designed to resist normal writes. This makes it perfect for code that should never accidentally change, like the instructions your CPU executes before the OS even loads.
Flash Memory — The Modern Workhorse
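Flash comes in two main variants, NAND and NOR, named for the way their cells are wired together: in series, like the transistors in a NAND gate, or in parallel, like those in a NOR gate. The NAND/NOR naming is not coincidental: recall from your logic design and electrical engineering courses that NAND and NOR gates are universal gates. Either one alone can construct any other logic gate. This universality made it economically viable to mass-manufacture only one type of gate and use it to build everything, at the cost of needing more gates per circuit.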
Flash memory exploits a floating-gate transistor structure where electrons are trapped in an insulating layer. Charge = 1, no charge = 0 (or vice versa, depending on convention). Writing requires relatively high voltages to force electrons through the insulator. Reading is fast, but writing is slower and physically stresses the insulating layer — which is why flash storage has a limited number of write cycles before cells wear out. The controller chip handles wear leveling (distributing writes evenly), garbage collection, and TRIM operations to maximize the lifespan.
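The wear-leveling idea can be sketched in a few lines. This is a toy model under strong assumptions: every write costs exactly one erase, and the policy is simply "write to the least-worn block." Real controllers also remap logical addresses, collect garbage, and track bad blocks. `WearLeveler` is a hypothetical name.

```rust
/// Toy wear-leveling allocator: always write to the least-worn block.
struct WearLeveler {
    erase_counts: Vec<u32>,
}

impl WearLeveler {
    fn new(blocks: usize) -> Self {
        WearLeveler { erase_counts: vec![0; blocks] }
    }

    /// Pick the block with the fewest erases, charge one erase to it,
    /// and return its index.
    fn write(&mut self) -> usize {
        let (idx, _) = self
            .erase_counts
            .iter()
            .enumerate()
            .min_by_key(|&(_, &count)| count)
            .expect("device has at least one block");
        self.erase_counts[idx] += 1;
        idx
    }

    /// Erase count of the most-worn block.
    fn max_wear(&self) -> u32 {
        self.erase_counts.iter().copied().max().unwrap_or(0)
    }
}

fn main() {
    let mut flash = WearLeveler::new(4);
    for _ in 0..1000 {
        flash.write();
    }
    // The writes are spread across all blocks instead of landing on one.
    println!("per-block erases: {:?}", flash.erase_counts);
    println!("worst block: {}", flash.max_wear());
}
```

Without leveling, a hot file rewritten 1000 times would put all 1000 erases on one block; here the most-worn block sees only a quarter of them.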
Samsung and other manufacturers market SSDs with different endurance tiers based on how many bits are packed per cell:
| Type | Bits/Cell | Endurance | Speed | Cost |
|---|---|---|---|---|
| SLC (Single-Level Cell) | 1 | Highest (~100K cycles) | Fastest | Most expensive |
| MLC (Multi-Level Cell) | 2 | High (~10K cycles) | Fast | Moderate |
| TLC (Triple-Level Cell) | 3 | Moderate (~3K cycles) | Moderate | Affordable |
| QLC (Quad-Level Cell) | 4 | Lowest (~1K cycles) | Slowest flash | Cheapest |
More bits per cell means higher density and lower cost — but also lower endurance and slower writes. This is the speed/density/cost tradeoff again, just within a single technology category.
Magnetic Memory
Magnetic memory stores data by manipulating the polarity of a magnetic medium. The read/write head magnetizes tiny regions of a spinning platter (HDDs), a flexible strip (floppy disks), or a long reel (tape drives) in one of two directions to represent 0 or 1.
This is one of the oldest data storage technologies — magnetic tape was used for computing in the 1950s and is still in use today for enterprise backup and cold archival storage. HDDs (Hard Disk Drives) spin metal platters at 5400–7200 RPM while a mechanical arm seeks to the correct track. The mechanical nature of HDDs is why they are orders of magnitude slower than SSDs for random access — the head physically has to move to the right spot on the platter.
Optical Memory
Optical memory stores data by physically altering a layer of a reflective disc using a laser. CDs, DVDs, and Blu-ray discs all work on this principle. The term "burning" a CD is nearly literal: on a recordable disc, a laser heats microscopic spots in a dye layer so that they reflect light differently, mimicking the molded pits of a factory-pressed disc. A lower-power laser then reads the disc by detecting how light reflects differently from marked vs. unmarked areas.
| Format | Capacity | Laser Wavelength |
|---|---|---|
| CD | 700 MB | 780 nm (infrared) |
| DVD | 4.7–8.5 GB | 650 nm (red) |
| Blu-ray | 25–128 GB | 405 nm (blue-violet) |
Shorter wavelengths focus more tightly, allowing smaller pits and higher density. This is literally why Blu-ray is called "Blu-ray" — it uses a blue laser instead of a red one.
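As a rough check on the wavelength argument, the sketch below estimates how much density the shorter wavelength buys on its own. It assumes the focused spot area scales with the square of the wavelength, and it deliberately ignores lens and encoding improvements, which also changed between formats. `density_gain` is a hypothetical helper.

```rust
/// Areal density gain from wavelength alone, assuming spot area scales
/// with wavelength squared (an idealization).
fn density_gain(old_nm: f64, new_nm: f64) -> f64 {
    (old_nm / new_nm).powi(2)
}

fn main() {
    let cd_to_dvd = density_gain(780.0, 650.0);
    let cd_to_bluray = density_gain(780.0, 405.0);
    println!("CD -> DVD from wavelength alone:     {:.2}x", cd_to_dvd);
    println!("CD -> Blu-ray from wavelength alone: {:.2}x", cd_to_bluray);
    // Capacity ratio from the table, single-layer figures: 25 GB vs 700 MB.
    println!("CD -> Blu-ray capacity in the table: {:.1}x", 25_000.0 / 700.0);
}
```

Under this assumption the wavelength accounts for only part of the capacity jump in the table; the remainder comes from tighter optics and denser encoding.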
The Physics of Persistence
To truly understand why non-volatile memory retains data without power, consider what physical phenomenon each technology manipulates:
| Technology | Physical Basis |
|---|---|
| Magnetic (HDD, Tape) | Magnetic polarity of ferromagnetic material: domains are oriented N/S and stay that way |
| Optical (CD, DVD, Blu-ray) | Physical surface topology: laser-burned pits in a reflective medium are permanent |
| Flash (SSD, USB) | Trapped electrons in a floating-gate insulator: charge is retained without power |
| ROM | Hardwired circuit paths or permanently blown fuses |
Volatile memory, by contrast, relies on active electrical state: SRAM uses powered flip-flop circuits; DRAM uses capacitors that leak charge. The moment current stops flowing, the state is gone.
Sleep Modes and Volatility
Now that we understand what volatile and non-volatile mean physically, let's connect it to something you experience every day.
Thought experiment: Imagine a laptop built with only non-volatile memory for everything — registers, cache, RAM, storage, all non-volatile. What happens when you shut it off and turn it back on? Nothing changes. Your system would simply stop mid-computation, and when powered back on, it would pick up exactly where it left off. No boot sequence, no OS restart — the bits never went anywhere.
But that's not what happens on real systems, is it? Modern systems use volatile memory for registers, cache, and RAM. When you power off, all of that is gone. The OS has to restart from whatever the non-volatile storage holds — your SSD or HDD. So every power cycle means rebooting the OS, reloading programs, and starting fresh.
What about sleep mode? Here is where the distinction becomes tangible. If you have RGB RAM in your desktop, you can actually see this: when you put your system to sleep, the RAM stays lit because it is still powered. Sleep mode keeps your volatile RAM energized so you can wake up exactly where you were at the application layer. The CPU itself is powered down, so its registers and caches lose their contents; the OS saves that state into RAM before sleeping and restores it on wake, and application data (still in RAM) is immediately available.
Different "sleep levels" correspond to which volatile layers remain powered:
| Sleep Level | What Stays Powered | Wake Speed | Power Use |
|---|---|---|---|
| S0 (Active) | Everything | Instant | Full |
| S1 (Power on Suspend) | CPU + RAM | Near-instant | High |
| S3 (Suspend to RAM) | RAM only | Fast (~2–5 sec) | Low |
| S4 (Hibernate) | Nothing (state saved to SSD) | Slow (~10–30 sec) | Zero |
| S5 (Soft Off) | Nothing | Full boot | Zero |
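The table above can be read as a small state model: each sleep level is defined by which volatile layers keep power, and that decides where the session is recovered from. The sketch below encodes only what the table states; `SleepLevel` and its methods are hypothetical names, not an OS API.

```rust
/// Sleep levels from the table, modeled by which volatile layers stay powered.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SleepLevel {
    S0,
    S1,
    S3,
    S4,
    S5,
}

impl SleepLevel {
    fn cpu_powered(self) -> bool {
        matches!(self, SleepLevel::S0 | SleepLevel::S1)
    }

    fn ram_powered(self) -> bool {
        matches!(self, SleepLevel::S0 | SleepLevel::S1 | SleepLevel::S3)
    }

    /// Where the session has to be recovered from on wake.
    fn resume_source(self) -> &'static str {
        if self.ram_powered() {
            "RAM (still powered)"
        } else if self == SleepLevel::S4 {
            "hibernation image on non-volatile storage"
        } else {
            "nothing saved: full boot"
        }
    }
}

fn main() {
    let levels = [
        SleepLevel::S0,
        SleepLevel::S1,
        SleepLevel::S3,
        SleepLevel::S4,
        SleepLevel::S5,
    ];
    for level in levels.iter().copied() {
        println!(
            "{:?}: cpu={} ram={} resume from {}",
            level,
            level.cpu_powered(),
            level.ram_powered(),
            level.resume_source()
        );
    }
}
```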
Fun fact — E-Ink and non-volatile displays: This same tradeoff explains why E-Ink screens (used in Kindle and similar e-readers) are so power-efficient. E-Ink is a non-volatile display technology — once the ink particles are moved into position, they stay there without any power. The screen only needs energy when it changes. Traditional LCD/OLED screens are volatile — they need constant power to maintain the image. If we could have non-volatile everything, we would have the "perfect" low-power system.
Semi-Volatile Memory — Bridging the Gap
Semi-volatile memory keeps its data for a limited time after power is lost, or pairs volatile hardware with a non-volatile backup. This category exists because some volatile technologies can be augmented with non-volatile backup mechanisms. For example, some enterprise DRAM modules include a small battery and flash chip: on power loss, the battery provides enough power to flush the DRAM contents to the flash chip, preserving data. When power returns, the data is restored.
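The battery-backed module described above can be sketched as a tiny state machine. This is a toy model: `BackedModule` and its method names are hypothetical, and zeroing the buffer merely stands in for DRAM losing its contents.

```rust
/// Toy battery-backed DRAM module: volatile `dram` plus a flash backup.
struct BackedModule {
    dram: Vec<u8>,          // contents are lost when power drops
    flash: Option<Vec<u8>>, // backup image that survives power loss
}

impl BackedModule {
    fn new(size: usize) -> Self {
        BackedModule { dram: vec![0; size], flash: None }
    }

    /// Power fails: the battery keeps DRAM alive just long enough to copy
    /// it to flash, after which the DRAM contents are gone.
    fn on_power_loss(&mut self) {
        self.flash = Some(self.dram.clone());
        self.dram.iter_mut().for_each(|b| *b = 0);
    }

    /// Power returns: restore DRAM from the flash image, if one exists.
    fn on_power_restore(&mut self) {
        if let Some(image) = self.flash.take() {
            self.dram = image;
        }
    }
}

fn main() {
    let mut module = BackedModule::new(4);
    module.dram.copy_from_slice(&[1, 2, 3, 4]);

    module.on_power_loss();
    println!("after power loss:    {:?}", module.dram);

    module.on_power_restore();
    println!("after power restore: {:?}", module.dram);
}
```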
The SLC, TLC, and QLC distinctions you see marketed on consumer SSDs also reflect a form of this tradeoff. QLC flash stores 4 bits per cell by distinguishing 16 voltage levels, which means the margins between levels are thin and data degrades faster. SLC uses only 2 voltage levels (1 bit per cell) with wide margins, so the data is far more robust. Wider margins at the cell level translate to better durability, but higher cost per gigabyte.
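The relationship between bits per cell and voltage levels is simply powers of two, and it is worth seeing how quickly the margins shrink. The sketch below assumes the levels are evenly spaced across a fixed voltage window, which is an idealization; `levels` and `margin_fraction` are hypothetical helpers.

```rust
/// Number of distinct voltage levels a cell must hold to store `bits` bits.
fn levels(bits_per_cell: u32) -> u32 {
    1 << bits_per_cell
}

/// Share of the usable voltage window separating two adjacent levels,
/// assuming evenly spaced levels.
fn margin_fraction(bits_per_cell: u32) -> f64 {
    1.0 / (levels(bits_per_cell) - 1) as f64
}

fn main() {
    let types = [("SLC", 1u32), ("MLC", 2), ("TLC", 3), ("QLC", 4)];
    for &(name, bits) in types.iter() {
        println!(
            "{}: {} bit(s)/cell -> {:2} levels, margin = {:.1}% of the window",
            name,
            bits,
            levels(bits),
            margin_fraction(bits) * 100.0
        );
    }
}
```

Each extra bit doubles the number of levels, so the gap between neighboring levels shrinks much faster than the capacity grows.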
The Memory Hierarchy
This is the heart of the lecture. Every technology we have discussed occupies a specific position in a hierarchy ordered roughly by latency, capacity, and cost per bit. The closer to the CPU, the faster and smaller. The farther away, the slower and cheaper per bit.
As you read the table, compare three columns in order: latency first, then capacity, then cost. The useful pattern is not one exact number; it is how violently those columns diverge.
Memory Technology Hierarchy

| Storage Type | Typical Use | Size | Speed | Cost | Cost (1–10) | Volatility |
|---|---|---|---|---|---|---|
| CPU Registers | Immediate computation storage | Bytes | ~1 cycle (~0.3 ns) | Extremely expensive (built into CPU die) | 10 | Volatile |
| CPU Cache (L1/L2/L3) | Fast CPU computation buffer; multicore shared space | KBs – MBs | 1–100 cycles (1–30 ns) | Very expensive (on-die SRAM) | 9 | Volatile |
| ROM | Boot firmware, system microcode | MBs | ~100 ns | Cheap (simple manufacturing) | 4 | Non-Volatile |
| RAM (Main Memory) | Active processes and their data | GBs | 6–20 ns (DDR5) / ~100 ns (DDR3) | Moderate–expensive ($3–5/GB for DDR5) | 7 | Volatile |
| NVMe SSD | High-speed OS and application storage | GBs – low TBs | 10–50 μs | Moderate ($0.05–0.10/GB) | 6 | Non-Volatile |
| SATA SSD | Mid-range long-term storage | GBs – low TBs | 50–100 μs | Affordable ($0.04–0.07/GB) | 5 | Non-Volatile |
| HDD | High-capacity bulk storage | High GBs – High TBs | 5–10 ms | Cheap ($0.015–0.03/GB) | 3 | Non-Volatile |
| Optical (CD/DVD/Blu-ray) | Media distribution, archival | MBs – GBs | 200–300 ms | Very cheap per disc ($0.01–0.05/GB) | 2 | Non-Volatile |
| Tape Storage | Enterprise backup, cold archival | TBs per cartridge | ~1+ second (seek) | Cheapest per GB ($0.004–0.01/GB) | 1 | Non-Volatile |
Therefore, the memory hierarchy is less a single ladder and more a compromise stack: no one layer wins on speed, capacity, persistence, and cost all at once.
Understanding the Speed Column
The speed column deserves special attention because the numbers span nearly ten orders of magnitude. A CPU register access takes ~0.3 nanoseconds. A tape seek takes ~1 second. That is a factor of about 3 billion. Our brains are not wired to intuitively grasp differences this large, which is why it helps to translate these latencies into a human time scale.
A useful mental model: think of the CPU as a video running at some frame rate. The clock speed is that frame rate — it represents the fastest possible transition from one state to the next, the speed of a single CPU cycle. On modern processors, one cycle is roughly 0.3–1 ns.
Registers are the only memory that operates at 1 cycle — they are to the CPU what thoughts are to a brain, instantly accessible. Every other memory tier requires the CPU to wait. And waiting is what we are trying to eliminate.
📚 World Record: The fastest a single CPU cycle has ever completed is approximately 0.1095 ns (109.5 picoseconds), achieved by an overclocked Intel Core i9-14900KF running at 9,130.33 MHz (9.13 GHz). At that speed, a single cycle takes about 110 trillionths of a second.
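The cycle times quoted in this section all come from one relationship: the period is the reciprocal of the frequency. A minimal sketch, where `cycle_ns` is a hypothetical helper:

```rust
/// Clock period in nanoseconds for a frequency given in MHz.
/// period_ns = 1e9 / (freq_mhz * 1e6) = 1000 / freq_mhz
fn cycle_ns(freq_mhz: f64) -> f64 {
    1_000.0 / freq_mhz
}

fn main() {
    for &mhz in [1_000.0, 3_333.0, 9_130.33].iter() {
        println!("{:>9.2} MHz -> {:.3} ns per cycle", mhz, cycle_ns(mhz));
    }
}
```

The first two frequencies bracket the "roughly 0.3–1 ns" range for everyday processors; the last is the overclocking record.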
For contrast, consider the human brain. A typical neuron fires at 10–100 Hz (10–100 times per second), with specialized neurons reaching up to 1000 Hz. That means a neuron's "clock speed" is roughly 1–100 ms per cycle — comparable to an HDD seek time. The brain compensates with massive parallelism: approximately 86 billion neurons connected by ~150 trillion synapses, collectively processing about 10 bits per second consciously. (Recall our parallel programming lecture — the brain is the ultimate embarrassingly parallel system with a hilariously slow clock!)
At tape storage (~1 second access time), computer latency finally enters human-perceivable timescales. Every tier above it happens so fast that without analogies, we cannot appreciate the differences.
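One way to build such an analogy is to stretch time so that a single register access lasts one second, then see how long every other tier takes on that scale. The sketch below uses representative latencies from the hierarchy table; the choice of one value per tier and the helper names are assumptions made for illustration.

```rust
/// Latency of one register access in nanoseconds (from the table).
const REGISTER_NS: f64 = 0.3;

/// Rescale a latency so that one register access lasts one second.
fn human_seconds(latency_ns: f64) -> f64 {
    latency_ns / REGISTER_NS
}

/// Format a duration in the largest unit that keeps it readable.
fn describe(seconds: f64) -> String {
    if seconds < 60.0 {
        format!("{:.0} s", seconds)
    } else if seconds < 3_600.0 {
        format!("{:.1} min", seconds / 60.0)
    } else if seconds < 86_400.0 {
        format!("{:.1} h", seconds / 3_600.0)
    } else if seconds < 31_557_600.0 {
        format!("{:.1} days", seconds / 86_400.0)
    } else {
        format!("{:.1} years", seconds / 31_557_600.0)
    }
}

fn main() {
    // One representative latency per tier, in nanoseconds.
    let tiers = [
        ("Register", 0.3),
        ("Cache (slow end)", 30.0),
        ("RAM (~100 ns)", 100.0),
        ("NVMe SSD (10 us)", 10_000.0),
        ("HDD (5 ms)", 5_000_000.0),
        ("Tape (1 s)", 1_000_000_000.0),
    ];
    for &(name, ns) in tiers.iter() {
        println!("{:<18} {}", name, describe(human_seconds(ns)));
    }
}
```

On this scale a register access is a heartbeat, a trip to the SSD is most of a working day, and a tape seek takes about a century.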
Understanding the Cost Column
Cost per gigabyte tracks speed almost perfectly: the faster the tier, the more each bit costs. Register silicon is effectively priceless (it is part of the CPU die itself). Tape is so cheap that companies operating at cloud scale, reportedly including Amazon and Google, keep enormous volumes of cold data in tape libraries. The cheapest HDD storage runs about $0.015/GB; the fastest NVMe SSDs cost ~$0.10/GB. That is roughly a 7x price premium for roughly a 400x speed improvement, which, in engineering terms, is an incredible deal. This is why SSDs have displaced HDDs for most consumer use cases.
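The HDD versus NVMe comparison is easy to reproduce from the table's figures. The sketch below computes the price premium and brackets the speedup using both ends of each latency range; `ratio` is a hypothetical helper, and the inputs are the table's approximate numbers, not measurements.

```rust
/// How many times larger `larger` is than `smaller`.
fn ratio(larger: f64, smaller: f64) -> f64 {
    larger / smaller
}

fn main() {
    // Figures taken from the hierarchy table in this lecture.
    let hdd_per_gb = 0.015;
    let nvme_per_gb = 0.10;
    let hdd_latency_us = (5_000.0, 10_000.0); // 5–10 ms
    let nvme_latency_us = (10.0, 50.0); // 10–50 us

    println!("price premium:            {:.1}x", ratio(nvme_per_gb, hdd_per_gb));
    println!(
        "speedup, least favorable: {:.0}x",
        ratio(hdd_latency_us.0, nvme_latency_us.1)
    );
    println!(
        "speedup, most favorable:  {:.0}x",
        ratio(hdd_latency_us.1, nvme_latency_us.0)
    );
}
```

Depending on which ends of the ranges you compare, the speedup lands anywhere between 100x and 1000x, so a figure around 400x is a reasonable middle estimate.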
📚 Year-over-Year: This table shifts every year. Just last year, RAM was slower and cheaper. Today, DDR5 prices have surged 200%+ in some configurations, driven by AI companies amassing massive memory pools for large language model training. Staying current on memory economics is part of being a systems engineer.
The Universal Memory
With this hierarchy laid out, it is worth reflecting on how mathematicians and computer scientists view memory. They operate with an idealized model, the Universal Memory: storage that is instant, infinite, non-volatile, and free.
This might sound like a useless abstraction — a fantasy that ignores all the real-world constraints we just spent an entire lecture studying. But it represents something important about how memory architecture works compared to other fields of computer science.
Memory is a means to an end. If we could afford to have only registers, we would. It is simply that, through practical considerations of space, cost, heat, and other real-world limitations, this field is shaped by physical constraints, not logical ones. There is no computational reason we need a memory hierarchy — no theorem of computation demands it. The halting problem is a logical impossibility; the Universal Memory is an engineering challenge. We march closer every year.
Consider the trajectory: DDR3 RAM in ~2012 ran at ~100 ns access time. DDR5 in 2025 runs at ~6–14 ns. That is roughly a 7–16x improvement in barely a decade. Persistent storage went from spinning HDDs to NVMe SSDs, closing the gap to RAM by orders of magnitude. New technologies like HBM3 (High Bandwidth Memory) stack DRAM dies in 3D to deliver much higher bandwidth at DRAM-class latency.
The practical consequence is real: the architectures we study today are temporary. When memory gets faster and cheaper, our workarounds become unnecessary. More cache? Needed less if RAM gets closer to cache speed. In-place algorithms to minimize memory footprint? Less critical when memory is abundant. The massive AI models of today are "sloppy" with memory precisely because modern hardware lets them be. Systems engineering is always relative to the hardware of its era.
Memory Through Abstraction Layers
Everything we have discussed so far lives in the real, physical world of voltages, capacitors, and magnetic domains. But as programmers, we rarely think about any of that. The beauty — and the curse — of modern computing is that it is built from layers of abstraction, each hiding the complexity of the layer below.
Figure: how memory looks from each layer, top to bottom. Each layer builds an API wall that hides the complexity below.
High-Level (SQL, Web): query and request layer. Example: SELECT * FROM users WHERE id = 42; (Memory? What memory?)
API wall: ORM / Driver API
Application Developer: stack and heap abstraction. Example: let x = 5; // stack and let v = vec![1,2,3]; // heap
API wall: System Call API
OS / Systems Programmer: 64-bit instruction blocks, PAS, ISA. Example: MOV, ADD, CMP, JMP, PUSH, POP, CALL, RET
API wall: ISA / HAL
Electrical Engineer: gates, transistors, signals. Example: 1011001011010011 (voltage levels → bits)
This layering is powerful but has a critical implication for optimization. Consider the API wall between each layer:
A SQL developer cannot optimize memory layout — they do not even see memory. They see tables and queries.
An application developer in Rust sees the stack and the heap. They can choose Box vs. stack allocation, Vec vs. array, but they do not see cache lines or TLB entries.
An OS developer sees the Process Address Space, page tables, and 64-bit instruction blocks. They can optimize for cache locality and page alignment, but they do not control individual transistors.
An electrical engineer sees voltage levels, gate delays, and capacitor charges. They can optimize the silicon itself.
Your optimization ceiling is your abstraction layer's floor. You can only optimize for what your layer lets you see. When you provide an API to another developer — whether it is a library crate in Rust, a server API for web development, or an interface/trait — you are building a wall made of a promise: that you alone will optimize and support everything hidden behind it. Your users do not need to carry the mental load of what is beyond your abstraction.
The OS is probably the single largest example of this: it takes the raw chaos of hardware and presents a clean, safe interface to every application above it.
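To see what the application developer's layer does expose, the sketch below measures three ways of holding 1024 bytes in Rust. It relies on two layout guarantees documented in the standard library: a `Vec` is a (pointer, capacity, length) triplet, and a `Box` of a sized type is a single pointer. `layout_sizes` is a hypothetical helper.

```rust
use std::mem::size_of;

/// Sizes in bytes of three ways to hold 1024 bytes:
/// (inline array, Vec handle, Box handle).
fn layout_sizes() -> (usize, usize, usize) {
    (
        size_of::<[u8; 1024]>(),      // the data itself, stored inline
        size_of::<Vec<u8>>(),         // pointer + capacity + length
        size_of::<Box<[u8; 1024]>>(), // a single pointer to the heap
    )
}

fn main() {
    let (array, vec_handle, box_handle) = layout_sizes();
    println!("machine word:    {} bytes", size_of::<usize>());
    println!("[u8; 1024]:      {} bytes (lives where the variable lives)", array);
    println!("Vec<u8> handle:  {} bytes (elements live on the heap)", vec_handle);
    println!("Box<[u8; 1024]>: {} bytes (contents live on the heap)", box_handle);
}
```

This is the whole of the view at this layer: the programmer chooses between inline storage and a heap handle, but nothing here says which cache line or physical page the bytes end up in.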
The Programmer's View of Memory
Understanding what memory looks like is crucial to your optimizations as software engineers. You can only optimize for what you fully understand, and the abstraction layer we live in — the OS layer — gives us a very specific view.
At the lowest level, we know it is all 1s and 0s flowing through gates. But that is the electrical engineer's view, not ours. The highest view is the application developer's: they only know of the stack and the heap. There are even layers beyond that — SQL developers who cannot control memory at all.
Our view is that of machine code via the ISA. We do not manipulate transistors directly; we reason in addresses, objects, instructions, pages, stacks, and heaps. So treat the vertical tape below as a visual metaphor for ordered address space, not as a literal claim that every instruction on a 64-bit CPU is 64 bits wide. In real ISAs, instruction widths vary.
We group machine code and program data into one process image, then subdivide its address space into familiar regions:
Process Address Space: The 64-Bit Tape
Each row is a 64-bit (8-byte) machine instruction or data word, spanning bit 0 to bit 63. This is what memory looks like from the OS programmer's perspective: a vertical roll of toilet paper, 64 bits wide, running from low addresses (0x0000...) at the top to high addresses (0xFFFF...) at the bottom.

| Region | Address | Contents |
|---|---|---|
| Code (Text): compiled machine instructions | 0x0000 | MOV R1, #0x42 |
| | 0x0008 | CALL fn_main |
| | 0x0010 | ADD R2, R1, R3 |
| | 0x0018 | CMP R2, #0xFF |
| | 0x0020 | JNE 0x001C |
| | 0x0028 | PUSH R4 |
| | 0x0030 | LOAD R5, [R6] |
| | 0x0038 | RET |
| Data / Static: globals, constants, library links | 0x0040 | 0x48656C6C6F |
| | 0x0048 | 0x00000064 |
| | 0x0050 | 0xDEADBEEF |
| | 0x0058 | <lib_printf> |
| Heap: dynamic allocation (grows ↓, toward higher addresses) | 0x0060 | Vec{ptr,len,cap} |
| | 0x0068 | [0x01, 0x02, ...] |
| | 0x0070 | Box<Node>{...} |
| | 0x0078 | String{ptr,len} |
| | 0x0080 | (free) |
| Free space between heap and stack | | |
| Stack: function frames (grows ↑, toward lower addresses) | 0x00A0 | (free) |
| | 0x00A8 | frame: fn_bar |
| | 0x00B0 | local x = 7 |
| | 0x00B8 | ret_addr |
| | 0x00C0 | frame: fn_main |
| | 0x00C8 | local y = 42 |
| | 0x00D0 | argc, argv |
Therefore, the main point of this diagram is spatial organization: a process sees one ordered address space even though the hardware underneath may be much messier.
Notice the structure: code and static data sit at fixed low addresses, the heap and stack are separate growth areas, and the exact layout is managed by the loader, compiler, ABI, and operating system. The diagram is a first-pass model, but it is a useful one because it highlights that memory management is mostly about controlling regions and translations, not staring at raw voltages.
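The region view can be expressed directly in code. The sketch below classifies an address using the toy bounds from the diagram above, treating each row as 8 bytes. The bounds are illustrative only, since real layouts are chosen by the loader, ABI, and OS; `Region` and `region_of` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Region {
    Code,
    Data,
    Heap,
    Free,
    Stack,
}

/// Classify an address using the toy bounds from the diagram.
/// Addresses outside the drawn tape map to None.
fn region_of(addr: u64) -> Option<Region> {
    match addr {
        0x0000..=0x003F => Some(Region::Code),
        0x0040..=0x005F => Some(Region::Data),
        0x0060..=0x0087 => Some(Region::Heap),
        0x0088..=0x009F => Some(Region::Free),
        0x00A0..=0x00D7 => Some(Region::Stack),
        _ => None,
    }
}

fn main() {
    for &addr in [0x0010u64, 0x0050, 0x0068, 0x0090, 0x00C8, 0x1000].iter() {
        println!("{:#06x} -> {:?}", addr, region_of(addr));
    }
}
```

The `None` case matters as much as the others: an address that falls in no region is exactly the kind of access the OS has to catch.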
Memory Across Hardware
There is one final visualization that ties everything together. We said memory is a hierarchy of different hardware, each with different speeds and capacities. Each of those hardware tiers can be thought of as its own "roll of toilet paper" — a finite tape. Registers are a tiny roll. Cache is small. RAM is medium. SSDs are large. HDDs are enormous.
A running process does not live in just one place. Its actively referenced pages may be in RAM, copies of recently used data may also appear in caches, its executable image lives on persistent storage, and its outputs may be written elsewhere. The exact movement policies differ by layer, but one program can leave footprints across several tiers at once.
Figure: Memory Across Hardware. Each tape represents a different storage medium, scaled by relative capacity. A process colors parts of every tape: its data, code, and state are scattered across the whole memory hierarchy at once, never confined to one piece of hardware.
Therefore, when we say an OS "manages memory," we usually mean it manages the process's active working state across address spaces and main memory while coordinating with slower storage beneath it.
This is the complete picture: each component of your system that contains memory has one of these finite tapes, and the transfer of data, subdivisions, and processes between these tapes is what we as OS developers are here to manage. In the next several lectures, we will study exactly how the OS orchestrates this — virtual memory, paging, page replacement algorithms, and more.
Summary
Memory is not one thing — it is a hierarchy of technologies trading off speed, capacity, cost, and volatility
Volatile memory (SRAM, DRAM) is fast but requires constant power
Non-volatile memory (ROM, Flash, Magnetic, Optical) persists without power but is slower
Semi-volatile memory bridges the gap with limited persistence after power loss
The memory hierarchy spans 10 orders of magnitude in latency — from ~0.3 ns (registers) to ~1 second (tape)
Latency and cost per bit have an almost perfect inverse relationship: the faster the tier, the more each bit costs
The Universal Memory (instant, infinite, non-volatile, free) is the engineering endgame, not a logical impossibility
Your optimization ceiling is your abstraction layer's floor — you can only optimize what your layer exposes
From our OS perspective, memory is a 64-bit-wide vertical tape partitioned into code, data, heap, and stack
Processes live across all memory tiers simultaneously — managing their footprint across hardware is the OS's job
📝 Lecture Notes
Key Takeaways:
Understand volatility before anything else — it determines the entire architecture of modern systems
SRAM (6 transistors/bit) powers registers and cache; DRAM (1 transistor + 1 capacitor/bit) powers main memory — the speed/density/cost tradeoff is fundamental
Non-volatile storage exploits different physics: magnetic polarity, laser-burned pits, trapped electrons, hardwired circuits
The memory hierarchy is an economic and physical inevitability, not a design choice
Sleep modes map directly to which volatile layers remain powered
Your abstraction layer determines what optimizations are even visible to you
Memory engineering advances yearly — the architectures we study are temporary solutions to the Universal Memory problem