In this lecture we continue our investigation of scheduling by learning many of the techniques used by Batch, Interactive, and even Real time systems! These include straightforward techniques like using a simple queue (FCFS) all the way to using multi-queue round robin techniques that rely on an internal lottery!
In LN10, we framed the scheduling problem around three concerns:
Computation is a service — we must meet user demands
Hardware resources are limited — we must share optimally
Efficiency is paramount — any underutilization is unacceptable
We identified two of the three major players:
The Processes — Compute-bound (long CPU bursts) vs I/O-bound (long I/O waits)
System Expectations — Fairness, policy enforcement, and balance, prioritized differently by Batch, Interactive, and Real-Time systems
Today we introduce the third player: the hardware and the algorithms it enables. Each algorithm reflects different tradeoffs, and the hardware you have determines which ones work well.
Throughout this lecture, we'll color-code terminology by process state and key algorithmic features:
Color Legend:Key FeatureReadyRunningBlocked
Batch System Algorithms
Batch systems optimize for throughput (jobs/hour) and turnaround time (submission to completion) at the cost of response time and fairness. These algorithms work best when left alone — no interactive users watching.
First Come First Serve (FCFS)
The most straightforward scheduling algorithm: a single FIFO Queue.
In its standard form, FCFS is the simple non-preemptive baseline: a job runs until it blocks or finishes.
Before looking at the tape, predict the failure mode: what happens to lots of tiny jobs when one huge one arrives first?
FCFS — Arrival OrderTotal: 19u
P1(3u)
P2(8u)
P3(2u)
P4(5u)
P5
0u
t=0u: P1
0
3
11
13
Simple, fast, and straightforward. But it has one devastating weakness.
The Convoy Effect
Imagine you're at Starbucks. You have a simple coffee order — 30 seconds. But the person in front of you is ordering 15 custom drinks for their office. You wait. And wait. And everyone behind you waits too.
Convoy Effect — BIG job starves small onesTotal: 27u
BIG(20u)
p2
p5
0u
t=0u: BIG
0
21
25
Large, CPU-hungry processes starve smaller processes by forcing them to wait. All those tiny jobs back up behind the large one, inflating their turnaround times massively even though they could complete in seconds.
📚 Head-of-Line Blocking: The convoy effect isn't unique to CPU scheduling. The exact same problem appears in networking (TCP head-of-line blocking, which HTTP/2 was designed to solve), grocery checkout lines, and traffic engineering. Recognizing it as a general pattern — head-of-line blocking — helps you spot it everywhere in systems design.
If only we knew ahead of time who has a huge order, we could let people with tiny orders go first...
Shortest Job First (SJF)
That intuition leads directly to SJF: sort by runtime, then execute in order.
Let's compare the same jobs in three orderings. Each tape shows execution order — the total CPU work is identical (19u) in all three, but the waiting and turnaround times differ dramatically:
Random FCFS OrderTotal: 19u
P2(8u)
P4(5u)
P1(3u)
P5
P3(2u)
0
8
13
17
SJF — Shortest First (Best Turnaround)Total: 19u
P5
P3(2u)
P1(3u)
P4(5u)
P2(8u)
1
3
6
11
Reverse — Longest First (Worst Turnaround)Total: 19u
P2(8u)
P4(5u)
P1(3u)
P3(2u)
P5
0
8
13
16
Computing Average Turnaround Time
Ordering
Turnaround Times
Sum
Average
Random FCFS
8 + 13 + 16 + 17 + 19
73
14.6u
SJF (Shortest First)
1 + 3 + 6 + 11 + 19
40
8.0u ✓
Reverse (Longest First)
8 + 13 + 16 + 18 + 19
74
14.8u
SJF achieves 8.0u average turnaround — nearly half of the random ordering! The total runtime is always 19u, but the short jobs complete much earlier, dramatically improving the average.
The Proof (Exchange Argument)
SJF's optimality can be shown with a simple exchange argument: if any two adjacent jobs are out of order (longer before shorter), swapping them improves the average turnaround without affecting total time. Repeating this swap for every out-of-order pair produces the shortest-first ordering. This is provably optimal.
"Optimal But Impossible"
Here's the catch: SJF requires knowing runtimes in advance. This is only feasible in very controlled environments — batch systems or embedded systems whose programs are designed from the ground up to report their runtime to the OS.
📚 A Recurring OS Theme: SJF requires knowing the future, just like Belady's optimal page replacement algorithm (which we'll see in memory management). Both are theoretically optimal but practically impossible, motivating the heuristic approaches (SRTN, LRU) that follow. This pattern recurs throughout OS design: the optimal solution requires omniscience, so we approximate.
Shortest Remaining Time Next (SRTN)
What if we can't get exact runtimes ahead of time, but we can get them when a process shows up? Maybe the compiler could estimate runtime for each binary. Then we could maintain a dynamically updating Priority Queue sorted by remaining time.
Consider this situation: we're running a long process when a tiny one suddenly arrives:
FCFS — email waits 30u!Total: 31u
P1(5u)
P2(10u)
P3(15u)
0
5
15
SRTN — email cuts in line!Total: 31u
P1(5u)
P2(10u)
P3'(7u)
P3''(8u)
0
5
15
23
With FCFS, the email refresh waits behind everything. With SRTN, when the 1u email process arrives mid-P3, it preempts P3, splitting it into P3' and P3''. The email completes almost instantly while P3 barely notices the interruption.
Exponential Averaging for Runtime Estimation
In practice, runtime estimation uses exponential averaging — a weighted moving average that tracks changing burst patterns:
Where $\tau_{n+1}$ is the predicted next burst, $t_n$ is the actual last burst, $\tau_n$ is the previous prediction, and $\alpha$ controls how much weight we give to recent observations (typically $\alpha = 0.5$).
Worked example with $\alpha = 0.5$ and initial prediction $\tau_0 = 10$:
Burst #
Actual ($t_n$)
Previous Prediction ($\tau_n$)
New Prediction ($\tau_{n+1}$)
Computation
1
10
10
10.0
0.5 * 10 + 0.5 * 10
2
8
10.0
9.0
0.5 * 8 + 0.5 * 10
3
6
9.0
7.5
0.5 * 6 + 0.5 * 9
4
4
7.5
5.75
0.5 * 4 + 0.5 * 7.5
The prediction converges toward the downward trend — the system learns that this process's bursts are getting shorter. This is essentially a simple predictive ML model decades before the term existed!
Interactive System Algorithms
Interactive systems optimize for response time and fairness at the cost of raw throughput. Our devices need to feel fast — not necessarily complete jobs as quickly as possible.
Round Robin
This guarantees that every ready process gets the CPU within a bounded time — excellent for response time. It is simple and fair in the narrow sense that each ready job receives the same slice per cycle.
Before you move the slider, watch two outputs at once: response time and context-switch overhead.
Round Robin — Time Quantum Explorer
Drag the slider to see how the time quantum affects context switch overhead. Context switch cost = 1u.
Quantum: 10u15u
Round Robin (q=10u)Total: 45u(work: 40u, switches: 5u, overhead: 11.1%)
P1(10u)
P2(8u)
P3(10u)
P4(5u)
P1
P3(5u)
0
11
20
31
40
Total Time
45u
Switches
5
Switch Time
5u
Overhead
11.1%
Therefore, Round Robin is only as good as its quantum: too large feels like FCFS, too small burns time on switching.
The Quantum Tradeoff
Assume a context switch costs 1u:
Time Quantum
Overhead
What It Feels Like
100u (large)
~1%
Basically FCFS — convoy effect returns
10u
~9%
Good balance — responsive and efficient
1u
50%
Half the time is spent switching!
0.5u
67%
System spends most of its time doing nothing useful
The irony: a system that is perfectly responsive to all work is one that doesn't compute, because it spends all its time switching! And a system that maximizes computation completely ignores you. The sweet spot is in between, and finding it for specific workloads is an active area of engineering.
📌 Key Insight: FCFS actually minimizes context switches (zero if all jobs run to completion). FCFS occurs when your time quantum is larger than the runtime of all jobs!
The I/O-Bound Paradox
Counterintuitively, Round Robin actually favors compute-bound processes. I/O-bound processes naturally yield before their quantum expires (they become blocked for I/O), so they only use a fraction of their slice. Compute-bound processes consume the entire quantum every time. Over time, compute-bound processes accumulate more total CPU time.
Virtual Round Robin fixes this by giving I/O-bound processes a "bonus" quantum when they return from blocked to ready, compensating for the time they didn't use.
Priority Scheduling
This is excellent for maintaining system invariants. Sometimes we can't compute why something is more important — we just assign it an importance value directly.
Compare FCFS arrival order with priority-sorted order — notice the sorting is by priority, not runtime:
FCFS — Arrival OrderTotal: 21u
P1 pri=3(6u)
P2 pri=1(4u)
P3 pri=5(8u)
P4 pri=2(3u)
0
6
10
18
Priority — Sorted by ImportanceTotal: 21u
P2 pri=1(4u)
P4 pri=2(3u)
P1 pri=3(6u)
P3 pri=5(8u)
0
4
7
13
Notice that SJF and SRTN are special cases of priority scheduling where the priority is the inverse of the runtime!
Static priorities are simple but rigid. Dynamic priorities allow the OS to:
Reduce the priority of processes that have consumed a lot of CPU time (preventing starvation)
Raise the priority of processes that have been waiting too long (aging)
Boost I/O-bound processes that return from blocked (better response time)
Programmers generally shouldn't set their own priority — they'll always be either too nice or too greedy. The OS should handle it.
💻 The nice Command: In Unix, users interact with priority through nice: nice -n 19 ./slow_task gives a process the lowest priority. The name "nice" literally means "be nice to other processes by taking less priority." Try it in your terminal!
Starvation vs Deadlock
Two distinct scheduling failures worth distinguishing clearly:
Failure
What It Is
Cause
Starvation
Process could run but never gets selected
Bad scheduling policy (low priority forever)
Deadlock
Processes can't proceed at all
Circular resource dependency
Starvation is a policy problem (the scheduler is unfair). Deadlock is a resource contention problem (processes hold resources each other needs). We'll dive deep into deadlock in LN12.
Multi-Level Feedback Queue (MLFQ)
This is arguably the most important practical scheduling algorithm — the basis for BSD Unix, Solaris, and early Windows NT schedulers.
The MLFQ Rules
New jobs enter the highest-priorityqueue
If a job uses its entire quantum, it demotes to the next lower queue
If a job yields before the quantum expires (I/O), it stays in its current queue
Periodically, boost ALL processes back to the top queue (prevents starvation)
Before studying the diagram, mentally trace one CPU-bound job and one I/O-bound job through the queues.
Multi-Level Feedback Queue (MLFQ)
New processes enter at the top. Processes that use their full quantum demote downward. I/O-bound processes stay high. Periodic boost pushes everyone back up.
New Process
Q0 — Round Robin (q=2u)Highest Priority
P1(new)
P4(I/O return)
...
Demote (used full quantum)
Q1 — Round Robin (q=4u)Medium Priority
P2(used full q=2u)
...
Demote (used full quantum)
Q2 — FCFSLowest Priority
P3(used full q=4u)
P5(compute-bound)
...
Periodic Boost
Enter / Boost up
Demote down
I/O-bound processes stay high (yield before quantum)Compute-bound processes sink low (use full quantum)
Therefore, MLFQ works by learning from behavior: jobs that keep using whole quanta drift downward, while jobs that block early tend to stay more responsive.
The Beauty of MLFQ
MLFQ automatically separates compute-bound and I/O-bound processes without needing to know anything in advance:
I/O-bound processes yield quickly (they go blocked for I/O) and stay in high-priority queues — great response time
Compute-bound processes use full quanta and sink to lower queues — they get throughput but don't block interactive work
The algorithm learns process type through behavior, not configuration
This is why MLFQ was so revolutionary: it combines the responsiveness of Round Robin for interactive tasks with the throughput of FCFS for batch tasks, all in one scheduler.
The queues can also map to system concerns:
A new-process queue that always pops first (fast startup)
Blocked processes returning to different queues based on why they blocked (device request vs RAM call vs cache miss)
Queues for different security levels or scheduling tiers (premium vs free)
Shortest Process Next (SPN)
We wish we had SJF, but we do not know runtimes. SPN approximates shortest-first behavior by tracking burst history and estimating future CPU bursts using the same exponential-averaging idea from SRTN:
Example: Estimating runtimes for 3 processes
Process
Burst History
Estimated Next Burst
P1
[10, 8, 6, 4]
5.75u (trending down)
P2
[2, 3, 2, 3]
2.5u (stable)
P3
[20, 18, 22]
20.5u (stable, long)
With these estimates, SPN sorts the ready queue by estimated burst:
SPN — Sorted by Estimated RuntimeTotal: 29u
P2 (~2.5u)(3u)
P1 (~5.8u)(4u)
P3 (~20.5u)(22u)
0
3
7
SPN is best understood as a heuristic approximation. It is practical because it relies on observed burst behavior instead of omniscience, but it inherits all the usual caveats of prediction error.
Guaranteed / Promise Scheduling
A promise might be: "every user gets equal CPU time," "if you pay $X, you get Y% of the CPU," "user privilege level maps to queue priority," or "admin processes always run before intern processes."
The key insight is that guaranteed scheduling composes other algorithms to maintain promises. For example:
Promise: "Kernel operations always run first. Users share remaining CPU fairly by security level."
This can be implemented as a modified MLFQ:
Queue
Policy
Internal Algorithm
Q0 (highest)
Kernel ops only
FCFS — always runs first
Q1
Admin user processes
Round Robin q=4u
Q2 (lowest)
Normal user processes
Round Robin q=8u
This is like parameterizing a constructor: the policy defines the contract ("be fair by role"), and the mechanism (MLFQ with specific queue rules) is chosen to fulfill it. In Linux, this is implemented via the process hierarchy — root's process tree gets scheduling priority since it's the hierarchy root, and cgroups allow even finer-grained resource allocation at the user and container level.
Lottery Scheduling
Most scheduling algorithms make it difficult to analyze what percentage of CPU time a process actually receives. Lottery scheduling makes it trivially analyzable: your ticket share IS your CPU share.
Try it yourself — assign tickets and watch the schedule build. Run it a few times with the same ticket counts; single trials are noisy, but the long-run proportions are the real lesson.
Lottery Scheduler — Interactive
Assign tickets to each process, then hit Draw to see who wins CPU time. More tickets = better odds.
P1
50.0% of tickets
P2
30.0% of tickets
P3
15.0% of tickets
P4
5.0% of tickets
Therefore, lottery scheduling is best read statistically: one run may look unfair, but repeated draws converge toward proportional share.
Ticket transfer: Cooperating processes can share tickets. A parent might send tickets to a valuable child process, or vice versa. This allows cooperative benefits to emerge in a preemptive system without starvation risks.
Ticket currencies: Each user/group gets its own currency with an exchange rate to global tickets. Department A has 100 local tickets, Department B has 50. The conversion handles proportional fairness automatically without global coordination.
Compensation tickets: If a process yields early (used only 20% of its quantum), it receives a 5x ticket boost for the next lottery. This naturally helps I/O-bound processes — they yield early, get boosted, and respond faster.
Real-Time System Algorithms
Real-time systems don't optimize for speed or fairness — they optimize for meeting deadlines. Missing one can mean system failure, injury, or death.
Foundations
Two critical distinctions:
Type
Deadlines
Consequences of Missing
Soft real-time
Can be missed occasionally
Degraded quality (video stutter, audio glitch)
Hard real-time
Must never be missed
System failure, potential injury/death
Processes in real-time systems carry additional metadata:
Property
Description
Periodic
Occurs at regular intervals (e.g., sensor reading every 10ms)
Given a set of periodic processes, can we even schedule them all? If the total CPU demand exceeds 100%, no algorithm can help — we need to redesign the system.
Try it yourself — edit the values to explore what's schedulable. Before touching the widget, do a quick mental utilization sum and predict whether the answer will be above or below 1.
Schedulability Calculator
Edit the process parameters to explore schedulability. The sum of (Cost / Period) must be ≤ 1.
NameCost (C)Period (P)C / P
0.500
0.150
0.200
Total Utilization85.0%
0%RMS: 78%100%
Schedulable
15.0% wiggle room. Above RMS bound (78%) — only EDF is guaranteed to find a valid schedule.
With those defaults: 50/100 + 30/200 + 100/500 = 0.50 + 0.15 + 0.20 = 0.85. We have 15% wiggle room, so the task set passes this utilization screen.
Rate-Monotonic Scheduling (RMS)
Requirements:
All information known ahead of time (static)
Each periodic process completes within its period
No inter-process dependencies
Constant cost per period
Aperiodic processes have no deadlines
Context-switch overhead is idealized as zero
How it works: Divide cost by period for each process, then assign priority proportional to occurrence frequency. A process occurring 33 times/second gets higher priority than one occurring 20 times/second.
Before reading the RMS chart, track which task has the shortest period. That one should keep winning priority contests.
Real-Time Schedule Chart
Top: process requirements (colored blocks = computation needed, stars = deadlines). Bottom: schedule attempts. Scroll horizontally for longer timelines.
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
95
100
105
110
115
Process Requirements
Ac=10 p=30
★
★
★
Bc=10 p=40
★
★
Cc=5 p=50
★
★
Schedule Attempts
✓RMS
A
A
A
A
A
A
A
A
A
A
B
B
B
B
B
B
B
B
B
B
C
C
C
C
C
A
A
A
A
A
A
A
A
A
A
B
B
B
B
B
B
B
B
B
B
C
C
C
C
C
A
A
A
A
A
A
A
A
A
A
B
B
B
B
B
B
B
B
B
B
A
A
A
A
A
A
A
A
A
A
C
C
C
C
C
A
B
C
★Deadline
✓Valid
✗Invalid
In RMS, A has the highest priority (period=30, rate=33/sec), B is next (period=40, rate=25/sec), and C is lowest (period=50, rate=20/sec). A lower-priority process never preempts a higher-priority one — when A's period arrives, it takes precedence over B and C.
The 69.3% Utilization Ceiling
RMS has a provable upper bound on utilization:
$$U_{RMS} = n \cdot (2^{1/n} - 1)$$
As the number of processes $n$ grows, this approaches $\ln(2) \approx 0.693$. Even if the schedulability test says total utilization is 0.85, RMS might not find a valid schedule! Only up to ~69.3% utilization is guaranteed.
📌 Harmonic Period Optimization: If all periods are multiples of each other (e.g., 10ms, 20ms, 40ms), RMS can handle up to 100% utilization. Embedded systems engineers often choose periods to be harmonic specifically for this reason.
Earliest Deadline First (EDF)
EDF's main requirement is that the scheduler knows each job's deadline. It does not require strictly periodic arrivals, though the cleanest theory is usually presented for idealized task models.
Before reading the EDF chart, ignore period names for a moment and watch only the nearest deadline. EDF follows that moving target.
Real-Time Schedule Chart
Top: process requirements (colored blocks = computation needed, stars = deadlines). Bottom: schedule attempts. Scroll horizontally for longer timelines.
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
95
100
105
110
115
Process Requirements
Ac=10 p=30
★
★
★
Bc=10 p=40
★
★
Cc=5 p=50
★
★
Schedule Attempts
✓EDF
A
A
A
A
A
A
A
A
A
A
B
B
B
B
B
B
B
B
B
B
C
C
C
C
C
A
A
A
A
A
A
A
A
A
A
B
B
B
B
B
B
B
B
B
B
C
C
C
C
C
A
A
A
A
A
A
A
A
A
A
B
B
B
B
B
B
B
B
B
B
A
A
A
A
A
A
A
A
A
A
C
C
C
C
C
A
B
C
★Deadline
✓Valid
✗Invalid
In EDF, the process with the nearest upcoming deadline always runs. When multiple processes have periods starting at the same time, EDF picks the one whose deadline is soonest. Context switches happen dynamically as new deadlines emerge.
Therefore, RMS and EDF may look similar on friendly examples like this one, but they justify their choices differently: RMS uses fixed priority by period, while EDF recomputes priority from the current deadlines.
The Domino Effect Under Overload
EDF has a dangerous failure mode. When utilization exceeds 100% (overload), EDF can cause a domino effect where ALL deadlines are missed — not just some. The scheduler keeps chasing the nearest deadline, abandoning partially completed work each time a nearer deadline appears.
RMS degrades more gracefully: low-priority tasks miss deadlines first, while high-priority ones are still met.
⚠️ This is why hard real-time systems often prefer RMS despite its lower utilization bound. Predictable failure is safer than catastrophic failure. In aviation and medicine, knowing which tasks will fail is more valuable than maximizing utilization.
RMS vs EDF Comparison
Property
RMS
EDF
Scheduling
Static (fixed priority)
Dynamic (deadline-based)
Max utilization
~69.3% (ln 2)
100%
Overload behavior
Graceful (low-priority fails first)
Catastrophic (domino effect)
Context switches
Fewer
More
Implementation
Simpler
More complex
Best for
Hard real-time, safety-critical
Soft real-time, high utilization
Real-World Schedulers: Linux CFS and Windows
The algorithms above are building blocks. Real-world schedulers combine multiple techniques. Let's look at the two most widely deployed interactive schedulers.
Linux: The Completely Fair Scheduler (CFS)
CFS has been the default Linux scheduler since kernel 2.6.23 (2007), designed by Ingo Molnar. It implements fair-share scheduling using an elegant data structure.
Core Idea: Virtual Runtime
Every process has a virtual runtime (vruntime) — the total CPU time it has received, weighted by its priority. CFS tries to select the process with the lowest vruntime — the one that has received the least CPU time relative to its fair share.
This naturally implements proportional fairness: processes that haven't gotten their share are always prioritized.
The Red-Black Tree
All runnable processes are stored in a red-black tree (a self-balancing BST) keyed by vruntime:
Selecting the next process = picking the leftmost node = O(log n)
Inserting/removing a process = O(log n)
If you've taken a data structures course, this is a direct, critical, real-world application of balanced BSTs.
Loading tree...
Nice Values and Weights
The nice command sets a value from -20 (highest priority) to +19 (lowest priority). Each nice level maps to a weight; lower nice = higher weight = vruntime accumulates slower = gets scheduled more. A nice -20 process gets roughly 82,000x more CPU than a nice +19 process. Well, maybe not actually 82,000x, but it's a lot.
CFS vs the O(1) Scheduler
Before CFS, Linux used the O(1) scheduler (kernels 2.6.0 through 2.6.22). It used bitmaps and per-CPU run queues for O(1) scheduling decisions — no matter how many processes existed.
But O(1) was unfair: interactive processes sometimes got poor response times. CFS traded O(1) for O(log n) but gained true proportional fairness. The lesson: algorithmic complexity isn't the only metric that matters.
cgroups and Container Scheduling
Linux's cgroups (control groups) extend CFS to group processes hierarchically. Docker containers and Kubernetes pods use cgroups to get isolated CPU shares. Each cgroup receives a proportion of CPU time, and CFS schedules within each group.
This is fair-share scheduling deployed at cloud scale — the same theoretical concept from earlier in this lecture, running billions of containers worldwide.
Windows: The Dispatcher
Windows uses a preemptive, priority-based scheduler with 32 levels.
Priority Levels
Range
Class
Use
0
Zero page thread
System idle
1-15
Variable priority
Normal user processes
16-31
Real-time priority
Critical system processes
Dynamic Priority Boosting
Windows dynamically boosts priority for:
Processes whose window is in the foreground (responsiveness to the active app)
Processes that just completed I/O (favoring I/O-bound processes)
Processes that have been starved (anti-starvation aging)
Priority decays back to the base level after each quantum is consumed.
User-Mode Scheduling (UMS)
Windows 7+ allows applications to manage their own thread scheduling in user space — similar to green threads (Tokio from LN6) but at the OS level. Used by SQL Server and other high-performance applications that need tighter control over scheduling than the kernel provides.
CFS vs Windows Comparison
Aspect
Linux CFS
Windows Dispatcher
Core algorithm
Fair-share (vruntime)
Priority-based (32 levels)
Data structure
Red-black tree, O(log n)
Bitmap + linked lists, O(1)
Fairness model
Proportional (weighted by nice)
Priority classes + dynamic boosting
I/O handling
Naturally low vruntime
Explicit I/O completion boost
Container support
cgroups
Job objects
Foreground boost
No special treatment
Active window gets priority boost
Multiprocessor Scheduling
Both Linux and Windows must handle multi-core systems. Three key concepts:
Affinity scheduling: Keep a process on the same core to preserve warm L1/L2 cache. Migrating means cold cache misses and wasted time rebuilding.
Gang scheduling: Schedule all threads of a related process together so they can communicate via shared memory without blocking. Critical for parallel applications.
Work stealing: Idle cores steal tasks from busy cores' run queues. Used by Linux CFS across per-CPU queues, and by Rust's Rayon library (from LN7!).
Beyond Basic Scheduling
Policy vs Mechanism
Up until now, the scheduler works alone. But what if the process provides input to the scheduling decision?
The scheduler maintains an overall policy (e.g., "prioritize short bursts"), parameterized by information each process provides (e.g., estimated runtime). Different processes produce different mechanisms — like upgrading a class from a no-argument constructor to one with parameters that produce varied instances.
Thread Scheduling
We've been scheduling processes, but processes are made of threads. Why not schedule threads instead? Thread scheduling operates at two levels:
User-level: The runtime (like Tokio) schedules green threads cooperatively
Kernel-level: The OS schedules kernel threads preemptively
Both LN6's green threads and the Windows UMS concept are thread-level scheduling in action.
Three-Level Scheduling
Scheduling isn't just a CPU problem. "Zooming out" reveals three levels:
Level
Scheduler
Role
Admission
Long-term
Queue processes from storage to RAM. Admits a good mix of I/O and compute-bound. Can evict processes back to storage if RAM is full.
Memory
Medium-term
Manage the degree of multiprogramming (how many processes in RAM). Balance liveliness, memory footprint, and priority.
CPU
Short-term
What we've been discussing: which ready process gets the CPU next.
The admission scheduler feeds the memory scheduler, which feeds the CPU scheduler. Each level has its own concerns and constraints.
Summary
Algorithm
System
Key Feature
Type
Strengths
Weaknesses
FCFS
Batch
FIFO Queue
Cooperative
Simple, minimal overhead
Convoy effect
SJF
Batch
Priority Queue (runtime)
Cooperative
Optimal turnaround
Requires future knowledge
SRTN
Batch
Dynamic PQ (remaining)
Preemptive
Good turnaround, dynamic
Requires runtime estimates
Round Robin
Interactive
Time Quantum
Preemptive
Fair, responsive
Context switch overhead
Priority
Interactive
Priority Value
Both
Maintains invariants
Starvation risk
MLFQ
Interactive
Multi-Queue + Demotion
Preemptive
Auto-separates process types
Complex to tune
SPN
Interactive
CPU History / Burst Estimation
Usually non-preemptive
Practical shortest-first approximation
Prediction error
Guaranteed / Fair-Share
Interactive
Promises + User Roles
Both
Policy-driven, composable
Requires monitoring
Lottery
Interactive
Tickets
Preemptive
Analyzable CPU share
Probabilistic
RMS
Real-Time
Period Frequency
Static
Optimal fixed-priority
69.3% ceiling
EDF
Real-Time
Earliest Deadline
Dynamic
100% utilization
Domino effect
CFS
Real-World
Virtual Runtime (RBT)
Preemptive
True proportional fairness
O(log n) not O(1)
📝 Lecture Notes
Key Definitions:
Term
Definition
FCFS
First Come First Serve — FIFO queue, simple but convoy-prone
SJF
Shortest Job First — optimal turnaround, requires future knowledge
SRTN
Shortest Remaining Time Next — dynamic preemptive SJF
Round Robin
Fixed time quantum, fair but overhead tradeoff
MLFQ
Multi-Level Feedback Queue — auto-separates process types through behavior