In this lecture we spend some time investigating how processes interact with limited resources! Namely, how they interact with them incorrectly to form a new universe of problems referred to as Deadlocking!
In LN11, we armed ourselves with an arsenal of scheduling algorithms — from FCFS to CFS — to decide when processes run. Scheduling determines the order of execution. But today we face a darker question: what happens when processes fight over shared resources?
Imagine a 4-way intersection with no traffic lights. Four cars arrive simultaneously, each occupying one lane. Each car needs the perpendicular lane to clear before they can proceed. But every lane is blocked by another car waiting for the same thing. Nobody moves. This is a deadlock — and it can happen in your operating system, your database, and your concurrent Rust code.
Traffic lights are the "scheduler" that prevents this by imposing an ordering on who goes first. Today we formalize this problem and develop tools to detect, avoid, and prevent it.
Today's Agenda
Sequential vs Concurrent Headaches — New failure modes from concurrency
Resources — What they are, how they're shared, and how they cause problems
Coffman's Four Conditions — The formal foundation of deadlocking
Resource Allocation Graphs — Using graph theory to detect deadlocks
The Dining Philosophers — The canonical deadlock illustration
Deadlock Handling Techniques — Detection, avoidance, prevention, and the ostrich
Rust Deadlock Examples — From broken to fixed
Safety-Critical Stories — When deadlocks kill
Extensions — Distributed deadlocks, formal verification, and beyond
Throughout this lecture, we color-code concepts by category:
Color Legend:ResourceConditionDangerSafeTechnique
Sequential vs Concurrent Headaches
As sequential programmers, you're already familiar with some infuriating bugs:
Sequential Issue
What Goes Wrong
Infinite loops
Control flow never terminates
Stack overflows
Recursive calls exhaust the stack
Blocking I/O
Program halts waiting for a slow device
Circular function calls
A calls B calls A — infinite recursion
Incorrect FSMs
State machine reaches an unrecoverable state
These all share one thing: a single thread gets stuck. But in a concurrent or parallel context, we gain access to entirely new classes of horror:
Concurrent Issue
What Goes Wrong
Deadlock
Multiple processes frozen, each waiting for the other
Livelock
Processes actively running but making zero progress
Starvation
Some processes progress, but one is perpetually ignored
Resource locking
Contention for shared resources degrades performance
Lock-order inversion
Acquiring locks in inconsistent orders creates cycles
Communication locking
Message-passing channels become blocked in both directions
Distributed deadlocks
Deadlocks spanning multiple machines with no global view
Unlike sequential bugs where nothing moves, concurrent systems can exhibit lock-based inefficiencies that manifest only at runtime — starvation and exhaustion let some work progress, but not all. These bugs are notoriously hard to reproduce because they depend on specific timing and scheduling decisions.
The Hallway Problem: Livelock
Ethernet's CSMA/CD protocol was specifically designed to solve livelock: when two devices transmit simultaneously and collide, each waits a random backoff time before retrying. The randomness breaks the symmetry. Without it, both devices would retry at the same instant forever — a livelock.
Deadlock vs Starvation vs Livelock
Failure Mode
Processes Moving?
All Blocked?
Cause
Example
Deadlock
No
Yes (in cycle)
Circular resource dependency
Two threads each hold one mutex, wait for the other
Livelock
Yes (busy)
No
Reactive behavior without coordination
Two people dodging the same direction in a hallway
Starvation
Partially
No
Unfair scheduling policy
Low-priority process never gets CPU time
📚 Historical Note: Edsger Dijkstra first formalized deadlocks in 1965 alongside his invention of semaphores. In 1971, he introduced the Dining Philosophers problem as the canonical illustration. The term "deadly embrace" was used in early IBM documentation before "deadlock" became standard. Dijkstra's contributions to both the problem (deadlock formalism) and the solution (semaphores) make this lecture a direct continuation of his work.
What Are Resources?
Resources span hardware and software:
Category
Examples
Human Analogy
Hardware
CD drive, HDD, CPU core, GPU, printer
Restroom stall, car seat, parking spot
Software
Database record, variable, file, directory, network port
License plate, phone number, SSN
Preemptable vs Non-Preemptable
Because resource allocation grants exclusive access, and processes can be preemptive or cooperative, resources inherit the same categories:
Type
Description
Examples
Preemptable
Can be reassigned mid-operation without harm
Memory (pages can be swapped), CPU time
Non-preemptable
Cannot be interrupted mid-operation — will cause failure
Burning a CD, printing a page, database transaction
In human terms: doing dishes is preemptable (you can stop and start freely). Using the restroom is non-preemptable. Please don't think too hard about a preemptable porta-potty.
Resource Lifecycle
Every resource follows the same lifecycle:
Request / Acquire — The process asks for and receives the resource
In Use — The process holds and operates on the resource
Release — Voluntary or involuntary relinquishment
This cycle repeats. The problem arises in step 1: waiting. If a process can never acquire what it needs, computation stalls.
Resource Instances
Some resources have multiple identical instances — a system with 4 printers, 8 CPU cores, or 16 database connection slots. Deadlock analysis must account for this: a resource type with k instances requires k+1 simultaneous requests to guarantee that at least one process is blocked.
Rust Connection: Mutex<T> as a Resource
Rust's Mutex<T> is the textbook non-preemptable resource. Combined with Arc for shared ownership across threads, it maps directly to the resource lifecycle:
use std::sync::{Arc, Mutex};
use std::thread;
letresource = Arc::new(Mutex::new(0)); // Create resourceletr = Arc::clone(&resource);
thread::spawn(move || {
letmut data = r.lock().unwrap(); // 1. Request & Acquire
*data += 1; // 2. In Use// MutexGuard drops here // 3. Release (RAII)
});
The MutexGuard returned by .lock() implements Rust's RAII pattern — the lock is automatically released when the guard goes out of scope. This makes "forgetting to release" structurally impossible, unlike C/C++ where manual unlock() calls can be missed.
Semaphores: From Dijkstra to Rust
Dijkstra invented both the deadlock formalism AND semaphores as a coordination solution. A binary semaphore (0 or 1) is similar to a mutex for teaching purposes, though mutexes usually add ownership rules that plain semaphores do not. A counting semaphore (initialized to k) naturally handles multi-instance resources — it allows up to k concurrent holders.
The classic semaphore operations are P (proberen — "to test", decrements) and V (verhogen — "to increment"). These names come from Dutch, Dijkstra's native language. They're error-prone: nothing prevents you from calling V without P, or P twice without V.
Rust's Mutex<T> with RAII MutexGuard is the modern evolution: the type system enforces correct acquire/release patterns at compile time. Language design has evolved to make deadlock-prone patterns harder to write.
Lock Granularity Spectrum
How much should a single lock protect? This creates a fundamental engineering tradeoff:
Granularity
Strategy
Deadlock Risk
Parallelism
Example
Coarse
One lock for everything
None (no cycles possible)
None (fully serial)
Python's GIL
Fine
One lock per resource
High (many locks = many cycle opportunities)
High
Linux kernel per-inode locks
Lock-free
No locks at all
None (no locks to deadlock on)
Maximum
std::sync::atomic in Rust
The scheduling problem gave us the quantum tradeoff (too large = convoy, too small = overhead). The deadlock problem gives us the granularity tradeoff (too coarse = no parallelism, too fine = deadlock risk).
Python's GIL — The Coarse Extreme in Practice
Python's Global Interpreter Lock is a real-world case study of the coarsest possible lock granularity. CPython uses a single mutex that protects all interpreter state. Any thread executing Python bytecode must hold the GIL, meaning only one thread runs Python code at a time — even on a 128-core machine.
This is deadlock prevention by construction: with one lock, circular wait is structurally impossible (you can't have a cycle with one node). But the cost is that Python threads can never achieve true parallelism on CPU-bound work — only concurrency via interleaving. This is exactly the distinction from LN7!
CPython's developers chose the simplest possible solution to protect the interpreter's reference-counting garbage collector from race conditions. It prevented deadlocks on internal structures but became the single most notorious limitation of the language for parallel workloads. This is why:
multiprocessing exists (separate processes with separate GILs)
NumPy releases the GIL during C-level computation
PEP 703 ("nogil") is an ongoing effort to remove the GIL entirely
Students who have written Python since day one now have the formal vocabulary to understand exactly whythreading doesn't speed up CPU-bound code and why Rust doesn't have this problem — ownership + borrow checker replaces the GIL's role at compile time, with zero runtime cost.
📚 Language Theory Connection: The GIL is a language runtime design decision, not a language specification decision. Other Python implementations (Jython on the JVM, PyPy's STM experiments) prove that the GIL is not inherent to Python's semantics — it's an implementation tradeoff that traded parallelism for simplicity in CPython's memory management.
Coffman's Four Conditions
So is this problem like the halting problem? Are we doomed to never find a general solution? Must we always rest uneasy, knowing our code might deadlock?
Actually, no! Unlike the halting problem, deadlocks are solvable in general for a known resource model. Coffman's research (1971) led to an amazing discovery: deadlock occurs if and only if four conditions are met simultaneously. Break even one, and the system makes progress.
The key insight: Remove ANY ONE condition and deadlock is impossible. This is what makes deadlocks fundamentally different from the halting problem — we have a complete, decidable characterization of when they occur. All we need is linear algebra or graph theory!
"Not-Quite-Deadlock" Examples
To reinforce that all 4 are necessary, consider what happens with only 3:
Missing Condition
What Happens
Why No Deadlock
No Mutual Exclusion
Resources are shared freely
No contention — everyone reads simultaneously
No Hold and Wait
Processes request all resources atomically
Either get everything or nothing — no partial holding
No No Preemption
The system can forcibly reclaim the contended resource
Cycles can be broken because the waiting chain no longer has to remain intact
No Circular Wait
Resources are ordered; always acquired in sequence
Linear chains can't form cycles
Before you toggle anything, ask yourself which condition is usually easiest to break in a real design: ordering, atomic acquisition, or reclaimability.
Toggle the conditions below to see this in action:
Coffman's Four Conditions
Toggle conditions to see when deadlock is possible. All four must hold simultaneously.
DEADLOCK
All 4 conditions are met — deadlock is guaranteed. Toggle any condition off to break it.
MEANDHWANDNPANDCW=DEADLOCK
Therefore, the practical design game is often: pick the cheapest Coffman condition to break and engineer around that one.
Resource Allocation Graphs
We can formalize deadlock detection using graph theory — specifically, directed graphs. If you've taken discrete mathematics, the cycle-detection algorithms here are closely related to the spanning tree algorithms you already know.
⚠️ Multi-instance caveat: With multiple resource instances (e.g., 3 printers), a cycle is necessary but not sufficient for deadlock. A knot — a strongly connected component with no outgoing edges — is required. For single-instance resources, cycle = deadlock.
Step-by-Step: From Safe to Deadlock
Watch how a deadlock forms step by step. As you click, narrate each edge as either holds or requests; that makes the cycle much easier to see.
Resource Allocation Graph
Step through resource requests and assignments. Dashed arrows = request, solid arrows = assignment. Red = cycle detected (deadlock).
Step 1 / 4
Two threads and two mutexes exist. No requests yet.
let mutex_a = Arc::new(Mutex::new(0));
let mutex_b = Arc::new(Mutex::new(0));
Process
Resource
Request
Assignment
In cycle
Notice the third scenario: the same processes, same resources, same work — but a different acquisition order prevents the cycle entirely. This is exactly the insight behind resource ordering as a prevention technique.
Therefore, the graph is useful because it shows the exact structural change: ordering removes the possibility of a circular wait.
Build Your Own Deadlock
Now try it yourself. First create a deadlock on purpose, then fix it by breaking one Coffman condition:
Build Your Own Deadlock
Add processes and resources, draw edges (request/assignment), and watch for cycles. Directed edge from Process to Resource = request. From Resource to Process = assignment.
Add some processes and resources to get started!
Process
Resource
|Process → Resource = RequestResource → Process = Assignment
Therefore, the sandbox is not just a detector; it is a design lab for testing which prevention move actually breaks the cycle.
The Dining Philosophers Problem
Dijkstra's 1971 formulation is the canonical deadlock illustration:
Five philosophers sit at a round table. Between each pair is a single fork (5 forks total). To eat, a philosopher needs both adjacent forks. They alternate between thinking and eating.
This maps perfectly to a resource allocation graph: philosophers are processes, forks are resources. If all five simultaneously pick up their left fork — circular wait — deadlock!
The solutions preview the four handling techniques:
Prevention (ordering): Number the forks; always pick up the lower-numbered fork first. This breaks circular wait.
Prevention (atomic): Only pick up both forks simultaneously, or neither. This breaks hold-and-wait.
Detection: A monitor detects the cycle and forces one philosopher to put down their fork.
Before running the demo, compare two styles of solution: changing timing and changing guarantees. Which one actually removes the possibility of deadlock?
Dining Philosophers Problem
5 philosophers, 5 forks. Each needs 2 adjacent forks to eat. Choose a strategy and step through the scenario.
Step 1 / 4
All philosophers are thinking. All forks are on the table.
P0thinking
P1thinking
P2thinking
P3thinking
P4thinking
Thinking
Hungry
Eating
Waiting
Fork (free)
Fork (held)
Therefore, the strongest solutions are usually the ones that change the resource-allocation rules, not just the ones that hope the timing works out.
Deadlock Handling Techniques
Four major approaches, organized by a medical analogy:
Detection and Recovery — "The Emergency Room"
Detection: Run cycle detection on the RAG periodically or on every block event. This is standard DFS-based graph traversal — O(V+E) per check.
Recovery options (in order of increasing violence):
Preempt a resource — Roll back the process, checkpoint/restart. Least destructive but requires checkpointing support.
Kill a victim process — Select the "cheapest" process to terminate (lowest priority? newest? least work done?).
Kill ALL deadlocked processes — The nuclear option. Guaranteed to work.
Cost of detection: Running cycle detection on every resource request is expensive. Amortized approaches run detection only when a process has been blocked for suspiciously long — trading detection latency for efficiency.
Timeout-Based Detection
A pragmatic middle ground between formal graph analysis and ignoring the problem: if you cannot acquire a resource quickly enough, back off, release what you can, and retry. This is how distributed systems (gRPC, HTTP clients, database connections) often handle deadlock risk in practice.
In Rust, Mutex::try_lock() is the non-blocking building block for that style:
use std::sync::Mutex;
letlock = Mutex::new(42);
match lock.try_lock() {
Ok(guard) => println!("Got the lock: {}", *guard),
Err(_) => println!("Couldn't acquire — backing off!"),
}
Real-world database detection: PostgreSQL and MySQL InnoDB use wait-for graphs (a simplified RAG containing only processes, not resources) and abort a victim transaction when a cycle is detected. You can actually trigger this and see the ERROR: deadlock detected message yourself.
Deadlock Avoidance — "The Doctor's Diet"
Two key definitions:
State
Meaning
Safe state
There exists some ordering of processes that allows ALL to complete
Unsafe state
No guaranteed ordering exists — deadlock is possible (but not certain)
The Banker's Algorithm
The most famous avoidance algorithm, invented by Dijkstra (1965). Named because a banker must ensure they never loan out so much money that they can't satisfy all customers. The OS is the banker; resources are money; processes are customers with credit limits.
Let's see this with actual money first. You're the bank manager. Each customer has deposited some cash and has a credit limit — the maximum they could ever request. The bank vault holds whatever cash hasn't been loaned out. This analogy is only a warm-up, but it makes the safe/unsafe distinction intuitive. Try adjusting the deposits, credit limits, and vault cash to see when the bank is safe vs. at risk:
🏦Dijkstra's Bank — The Original Analogy
You are the bank manager. Each customer has deposited some money and has a credit limit (the max they'll ever need). Your job: never loan out so much cash that you can't guarantee every customer gets their full credit limit eventually.
Bank Vault (Cash on Hand)
Total deposited by customers: $4,500 · Total credit limits: $12,000
$
Alice
Deposited:
$
Credit Limit:
$
Still needs:$3,000
$0$4,000
Bob
Deposited:
$
Credit Limit:
$
Still needs:$3,000
$0$5,000
Charlie
Deposited:
$
Credit Limit:
$
Still needs:$1,500
$0$3,000
SAFE — Loans can be honored!
The bank can serve all customers by processing them in this order. Each customer finishes their business and returns their deposit, replenishing the vault for the next:
Charlie→Alice→Bob
Already deposited
Still needs (loan remaining)
Notice the key insight: the bank doesn't need enough cash to cover everyone at once — it just needs enough to let at least one customer finish, get their deposit back, then use that to serve the next customer, and so on. A safe state is one where such an ordering exists. An unsafe state means no such ordering is guaranteed.
Therefore, the analogy is really about existence of a safe sequence, not about money itself.
Generalizing beyond money. In real operating systems, there isn't just one resource — there are multiple: memory pages, disk sectors, network sockets, printer queues, and more. The algorithm generalizes by tracking a vector of resources instead of a single dollar amount. Each process has an allocation vector (what it holds), a max vector (what it could ever need), and available becomes a vector of all free resources. The same "can at least one process finish?" check runs across all resource types simultaneously — a process can only proceed if its need is met in every dimension.
The algorithm checks, at each request: "If I grant this, can I still guarantee a safe sequence where every process eventually gets what it needs?" If not, the request is denied.
Try the full multi-resource version yourself — but remember one crucial distinction before you start: unsafe does not mean deadlocked right now. It means the system has lost its guarantee of being able to finish everyone safely.
Banker's Algorithm — Safe State Calculator
Edit allocation, max need, and available resources. The algorithm determines if a safe execution sequence exists.
Available Resources
A
B
C
Process
Allocation
Max Need
Need (Max - Alloc)
A
B
C
A
B
C
A
B
C
P0
7
4
3
P1
1
2
2
P2
6
0
0
P3
0
1
1
P4
4
3
1
SAFE STATE
A safe execution sequence exists. The banker can satisfy all customers.
Safe sequence:P1→P3→P4→P0→P2
Algorithm trace:
1.P1need=[1,2,2]≤work=[3,3,2]✓→ work=[5,3,2]
2.P3need=[0,1,1]≤work=[5,3,2]✓→ work=[7,4,3]
3.P4need=[4,3,1]≤work=[7,4,3]✓→ work=[7,4,5]
4.P0need=[7,4,3]≤work=[7,4,5]✓→ work=[7,5,5]
5.P2need=[6,0,0]≤work=[7,5,5]✓→ work=[10,5,7]
Therefore, Banker's Algorithm is a refusal rule: it denies requests that would move the system from "safe but tight" into "unsafe and no longer guaranteed."
Deadlock Prevention — "An Ounce of Prevention"
Condition to Attack
Prevention Strategy
Tradeoff
Mutual Exclusion
Use spooling/virtualization (e.g., print spooler)
Not always possible — some resources are inherently exclusive
Hold and Wait
Require all resources upfront (atomic acquisition)
Low utilization; process may not know all needs in advance
No Preemption
Allow resource preemption (checkpoint and rollback)
Expensive; not always safe for non-preemptable resources
Circular Wait
Impose a total ordering on resources
Restricts flexibility; must be enforced globally
Resource Ordering in Practice
Linux kernel's lockdep tool automatically detects lock ordering violations at runtime. It builds a lock dependency graph and warns if a cycle is possible, even before a deadlock actually occurs. This is prevention via tooling — the system learns the correct ordering from observation and flags deviations.
Rust's Compile-Time Prevention
Rust's ownership and borrowing rules prevent entire classes of data races at compile time. MutexGuard with RAII ensures locks are always released. While Rust can't prevent all deadlocks statically (that would require solving the halting problem for lock orderings), it eliminates the most common causes — forgotten unlocks, use-after-free on locked resources, and data races.
Lock-Free and Wait-Free Data Structures
The ultimate form of prevention: eliminate locks entirely using atomic compare-and-swap (CAS) operations. Rust's std::sync::atomic module provides these primitives. CAS is a CPU instruction (connecting back to LN9's hardware lecture), and lock-free data structures attack mutual exclusion at the algorithmic level.
⚠️ The ABA Problem: Lock-free algorithms have their own subtle pitfall. A value changes from A to B and back to A. A CAS operation sees "still A" and incorrectly assumes nothing changed. Solutions include tagged pointers and hazard pointers — a whole subfield of concurrent data structures.
The Ostrich Algorithm — "Head in the Sand"
This sounds absurd — but this is what Unix and Windows actually do for most user-space resources. The reasoning is pure cost-benefit analysis:
If the expected cost of rare deadlocks is less than the ongoing cost of preventing them, ignore them.
Most interactive systems accept some deadlock risk because the expected incidence is low enough that full prevention is not worth the cost. When deadlocks do occur, restarting an application or rebooting the machine may be cheaper than paying continuous prevention overhead everywhere.
📚 Engineering Risk Management: This is the same principle used throughout engineering. Bridges are designed for 99.9% of expected loads, not 100%. Software reliability targets "five 9s" (99.999% uptime) — not 100%, because the cost of that last 0.001% is astronomical. The ostrich algorithm is mathematically justified risk management.
So all this formal theory, all these algorithms — and the practical answer for most interactive systems is "don't worry about it, and use a preemptive scheduler." The formal tools become critical in safety-critical and real-time systems, where the cost of a deadlock is not a reboot but a catastrophe.
Rust Deadlock Examples
Let's see deadlock in Rust and fix it with the techniques we've learned.
The Deadlock
Two threads acquiring two mutexes in opposite order:
This program will hang forever — both threads are stuck waiting for the other's lock. All four Coffman conditions are met.
Fix 1: Resource Ordering (Prevention)
Always acquire locks in the same order. This breaks circular wait:
// Both threads: always acquire A before Blett1 = thread::spawn(move || {
let_lock_a = a1.lock().unwrap(); // A firstlet_lock_b = b1.lock().unwrap(); // B second
});
lett2 = thread::spawn(move || {
let_lock_a = a2.lock().unwrap(); // A first (same order!)let_lock_b = b2.lock().unwrap(); // B second
});
Fix 2: try_lock with Backoff (Detection/Recovery)
Use non-blocking lock attempts and retry on failure:
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
lett1 = thread::spawn(move || {
loop {
letlock_a = a1.lock().unwrap();
match b1.try_lock() {
Ok(lock_b) => {
// Got both! Do work.break;
}
Err(_) => {
drop(lock_a); // Release A and retry
thread::sleep(Duration::from_millis(1));
}
}
}
});
This breaks hold and wait — if we can't get everything, we release what we have.
Use channels instead of shared memory — attacking mutual exclusion by eliminating the shared resource entirely:
use std::sync::mpsc;
use std::thread;
let (tx, rx) = mpsc::channel();
// Worker thread owns the data exclusivelyletworker = thread::spawn(move || {
letmut value = 0;
formsgin rx {
match msg {
"increment" => value += 1,
"get" => println!("Value: {}", value),
_ => break,
}
}
});
// Other threads send messages instead of locking
tx.send("increment").unwrap();
tx.send("get").unwrap();
tx.send("quit").unwrap();
worker.join().unwrap();
No mutexes, no locks, no deadlock. The data is owned by one thread, and others communicate via messages. This is the actor model — the same philosophy behind Erlang, Go's goroutines, and Rust's Tokio channels.
Scheduling and Deadlock
Tying back to LN11: scheduling affects timing, which can influence whether a buggy locking pattern deadlocks on a particular run. But timing luck is not a guarantee. The "Avoided via Ordering" scenario in the RAG stepper works because the locking rule changed, not because the scheduler happened to be clever.
Preemptive scheduling is still valuable for responsiveness and fairness, but reclaiming CPU time is not the same thing as reclaiming a mutex, file lock, or printer. Deadlock prevention has to target the contended resource itself.
Safety-Critical Stories
These techniques aren't academic exercises. When concurrent systems fail in safety-critical contexts, people get hurt.
The Therac-25 Incident (1985–1987)
The Therac-25 was a radiation therapy machine controlled by software. Race conditions between the operator interface and beam-control hardware — a resource contention problem without proper synchronization — caused the machine to deliver lethal radiation doses. Three patients died and three were seriously injured.
The root cause was exactly what we've been studying: shared resources (beam configuration state) accessed by multiple concurrent processes (operator interface and beam control) without proper mutual exclusion or synchronization. The developers had removed hardware safety interlocks that existed in previous models, trusting the software alone to manage the resources correctly. The software had a race condition that could set the beam to full power while the system believed it was in low-power mode.
If Coffman's conditions are "how deadlocks happen," Therac-25 is "why we must care." Concurrent programming bugs don't just hang your program — in safety-critical systems, they can kill.
The Mars Pathfinder (1997)
NASA's Mars Pathfinder rover kept randomly resetting on the Martian surface. The cause: priority inversion — a deadlock-adjacent problem.
A high-priority task (the bus manager) was blocked waiting for a mutex held by a low-priority task (the meteorological data collector). A medium-priority task (communications) preempted the low-priority task, preventing it from releasing the mutex. The high-priority task starved, the watchdog timer fired, and the system rebooted.
The fix: priority inheritance protocol — temporarily boost the low-priority holder to the high-priority waiter's level so it can finish and release the resource. This was patched remotely from Earth via a C parameter change transmitted to Mars.
📌 Connection: Priority inversion combines elements of both deadlock (resource contention) and starvation (a high-priority process can't proceed). It's a reminder that the problems in this lecture interact with the scheduling problems from LN11 in subtle ways.
Extensions
Distributed Deadlocks
In distributed systems, no single node has the complete resource allocation graph. Detection requires either:
A centralized coordinator that collects lock information from all nodes (single point of failure)
Distributed algorithms like Chandy-Misra-Haas (1983) that propagate "probe" messages through the wait-for graph
This is why microservice architectures overwhelmingly prefer timeouts over formal deadlock detection — it's impractical to maintain a global RAG across hundreds of services.
Convoy Effect Revisited
The convoy effect from LN11 is actually a form of resource-based starvation when viewed through this lecture's framework. A long process "deadlocks" the ready queue by holding the CPU resource via hold and wait. The formal vocabulary we built today lets you see FCFS convoy as a special case of resource contention where the resource is CPU time itself.
Two-Phase Locking (2PL)
A database technique where transactions acquire all locks in a "growing" phase, then release all in a "shrinking" phase. This attacks hold and wait — by the time you start releasing, you've already acquired everything you need. 2PL is the foundation of ACID transaction isolation in most relational databases. If you take a database course, you'll see this again.
Formal Verification
Tools like TLA+ (Leslie Lamport — also the creator of LaTeX!) and SPIN can exhaustively verify concurrent programs for deadlock before deployment. They enumerate all possible interleavings of concurrent operations and check for cycles in every reachable state. NASA uses these tools for spacecraft software — where the cost of a deadlock is not a reboot but mission failure.
The graph-theoretic approach in this lecture scales directly to industrial verification. TLA+ models are essentially formal RAGs with state transitions — the same concepts, deployed at industrial scale.
Spinlocks vs Blocking Locks
When a resource is contested, do you spin (busy-wait in a loop checking the lock) or block (context switch to another process)?
Spinlocks are efficient for very short critical sections (no context switch overhead) but waste CPU cycles and can cause livelock-like behavior on single-core systems
Blocking locks are efficient for long critical sections (CPU does useful work while waiting) but pay the context switch cost
This connects back to LN11's scheduling concerns: spinlocks are essentially cooperative ("I'll keep checking"), while blocking locks delegate to the scheduler ("wake me when it's free"). The choice depends on the expected wait time relative to the context switch cost — yet another engineering tradeoff in the granularity spectrum.
Summary
Technique
Medical Analogy
What It Does
When to Use
Detection & Recovery
Emergency Room
Find and break deadlocks after they occur
Databases, systems with checkpointing
Avoidance
Doctor's Diet
Deny requests that could lead to unsafe states
Known resource bounds (Banker's)
Prevention
Ounce of Prevention
Architect away Coffman conditions
Safety-critical systems, kernel design
Ostrich
Ignore It
Assume deadlocks are too rare to worry about
Unix, Windows, most interactive systems
📝 Lecture Notes
Key Definitions:
Term
Definition
Resource
Something a process may need to acquire, wait for, or share under limited availability
Preemptable
Resource that can be reassigned mid-operation without harm