CMSI 3510: Operating Systems - Cheat Sheet
CPU Architecture
Key Components
- Control Unit (CU): Directs operations of the processor, fetches instructions from memory, decodes them, and coordinates execution
- Arithmetic Logic Unit (ALU): Performs arithmetic and logical operations
- Registers: Small, fast storage locations within the CPU
- Program Counter (PC): Holds address of next instruction
- Instruction Register (IR): Holds current instruction being executed
- Stack Pointer (SP): Points to top of the stack
- General Purpose Registers: Store operands and intermediate results
- Cache Memory: Fast memory between CPU and main memory
- L1 Cache: Smallest, fastest (~32-64 KB per core, ~1-2ns access time)
- L2 Cache: Medium size (~256 KB - 1 MB per core, ~4-10ns)
- L3 Cache: Largest, shared across cores (~8-32 MB, ~10-20ns)
Instruction Cycle (Fetch-Decode-Execute)
1. FETCH: PC → MAR → Memory → MDR → IR
PC = PC + 1
2. DECODE: IR β Control Unit (decode instruction)
3. EXECUTE: Perform operation (ALU, memory access, etc.)
4. STORE: Write results back to registers/memory
MAR: Memory Address Register | MDR: Memory Data Register
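A toy fetch-decode-execute loop in Python (the two-instruction ISA, register names, and program below are made up for illustration, not any real architecture):

```python
# Toy fetch-decode-execute loop over a hypothetical accumulator ISA.
# memory holds (opcode, operand) tuples; regs is the register file.
memory = [("LOAD", 7), ("ADD", 3), ("HALT", 0)]
regs = {"ACC": 0, "PC": 0}

while True:
    ir = memory[regs["PC"]]   # FETCH: PC -> memory -> IR
    regs["PC"] += 1           # PC = PC + 1
    opcode, operand = ir      # DECODE
    if opcode == "LOAD":      # EXECUTE + STORE
        regs["ACC"] = operand
    elif opcode == "ADD":
        regs["ACC"] += operand
    elif opcode == "HALT":
        break

print(regs["ACC"])            # 10
```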
CPU Modes
- User Mode: Limited privileges, cannot execute privileged instructions, used for user applications
- Kernel Mode (Supervisor Mode): Full privileges, can execute all instructions, access all memory, control hardware
- Mode Switching: Triggered by system calls, interrupts, or exceptions
Interrupts
- Hardware Interrupts: Generated by hardware devices (I/O completion, timer, etc.)
- Software Interrupts (Traps): Generated by software (system calls, exceptions)
- Interrupt Handling:
- Save current state (PC, registers)
- Load interrupt handler address from Interrupt Vector Table (IVT)
- Execute interrupt handler in kernel mode
- Restore saved state and return to user mode
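A minimal sketch of vector-table dispatch in Python (the handler names, vector numbers, and saved-state shape are hypothetical):

```python
# Toy interrupt dispatch: the IVT maps an interrupt number to a handler.
def timer_handler(): print("timer tick handled")
def io_handler():    print("I/O completion handled")

IVT = {0: timer_handler, 1: io_handler}   # interrupt vector table

def interrupt(num, saved_pc):
    saved_state = {"PC": saved_pc}        # 1. save current state
    handler = IVT[num]                    # 2. look up handler in IVT
    handler()                             # 3. run handler (kernel mode)
    return saved_state["PC"]              # 4. restore state, resume

pc = interrupt(0, saved_pc=0x4000)        # execution resumes at saved PC
```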
Pipelining
CPU technique to increase instruction throughput by overlapping execution stages
Classic 5-Stage Pipeline:
1. IF - Instruction Fetch
2. ID - Instruction Decode
3. EX - Execute
4. MEM - Memory Access
5. WB - Write Back

| Time | IF | ID | EX | MEM | WB |
|------|----|----|----|-----|----|
| T1   | I1 |    |    |     |    |
| T2   | I2 | I1 |    |     |    |
| T3   | I3 | I2 | I1 |     |    |
| T4   | I4 | I3 | I2 | I1  |    |
| T5   | I5 | I4 | I3 | I2  | I1 |
Hazards: Data hazards, control hazards (branches), structural hazards
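Ignoring hazards and stalls, a k-stage pipeline completes n instructions in k + (n - 1) cycles. A quick check:

```python
# Ideal pipeline timing (no stalls): k stages, n instructions.
def pipeline_cycles(n, k=5):
    return k + (n - 1)   # first instruction fills the pipe, then 1/cycle

print(pipeline_cycles(5))     # 9 cycles vs 25 unpipelined (5 * 5)
print(pipeline_cycles(1000))  # 1004 -> throughput approaches 1 inst/cycle
```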
Multi-Core & Multiprocessing
- Symmetric Multiprocessing (SMP): Multiple identical processors share memory and I/O
- Multi-Core: Multiple CPU cores on a single chip
- Hyper-Threading: Single physical core appears as multiple logical cores
- Cache Coherence: Ensures consistency when multiple cores cache same memory location (MESI protocol: Modified, Exclusive, Shared, Invalid)
Memory (RAM)
Memory Hierarchy
Fastest/Smallest (top) → Slowest/Largest (bottom)

| Level     | Access Time | Typical Size     |
|-----------|-------------|------------------|
| Registers | <1 ns       | bytes to KB      |
| L1 Cache  | 1-2 ns      | ~32-64 KB        |
| L2 Cache  | 4-10 ns     | ~256 KB - 1 MB   |
| L3 Cache  | 10-20 ns    | ~8-32 MB         |
| Main RAM  | 50-100 ns   | GBs              |
| SSD       | 25-100 μs   | 100s of GB - TBs |
| HDD       | 5-10 ms     | TBs              |
Memory Types
- DRAM (Dynamic RAM): Main memory, requires periodic refresh, slower but cheaper
- SRAM (Static RAM): Used for cache, faster, doesn't need refresh, more expensive
- DDR SDRAM: Double Data Rate, transfers on both clock edges (DDR3, DDR4, DDR5)
Virtual Memory
- Purpose: Abstraction that provides each process with illusion of large, contiguous address space
- Benefits: Process isolation, efficient memory use, allows programs larger than physical RAM
- Translation: Virtual addresses → Physical addresses via page tables
- Translation Lookaside Buffer (TLB): Cache for page table entries, speeds up address translation
Paging
Memory management scheme that divides memory into fixed-size blocks
- Page: Fixed-size block in virtual memory (typically 4 KB)
- Frame: Fixed-size block in physical memory (same size as page)
- Page Table: Maps virtual pages to physical frames
- Page Fault: Occurs when accessing page not in physical memory
- Trap to OS kernel
- Save process state
- Find page on disk
- Load page into free frame
- Update page table
- Resume process
Virtual Address Format:
┌──────────────┬──────────────┐
│ Page Number  │    Offset    │
└──────────────┴──────────────┘

Page Table Entry (PTE):
┌─────┬─────┬─────┬─────┬──────────────┐
│Valid│Dirty│ R/W │ U/K │ Frame Number │
└─────┴─────┴─────┴─────┴──────────────┘
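A minimal sketch of the split-and-translate step in Python, assuming 4 KB pages and a hypothetical one-entry page table:

```python
# Split a virtual address with 4 KB pages: offset = low 12 bits.
PAGE_SIZE = 4096                   # 2**12
page_table = {0x00005: 0x00123}    # hypothetical VPN -> frame mapping

def translate(vaddr):
    vpn = vaddr // PAGE_SIZE       # page number (high bits)
    offset = vaddr % PAGE_SIZE     # offset within the page (low 12 bits)
    if vpn not in page_table:
        raise LookupError("page fault: trap to kernel")  # page not resident
    return page_table[vpn] * PAGE_SIZE + offset

print(hex(translate(0x5ABC)))      # 0x123abc
```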
Page Replacement Algorithms
- FIFO (First-In-First-Out): Replace oldest page, can suffer from Belady's anomaly
- LRU (Least Recently Used): Replace page not used for longest time, good performance but expensive
- LFU (Least Frequently Used): Replace page with lowest reference count
- Clock (Second Chance): Approximates LRU, uses reference bit, circular list
- Optimal: Replace page that won't be used for longest time (theoretical, requires future knowledge)
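A small simulation comparing FIFO and LRU fault counts on the classic reference string used to illustrate Belady's anomaly:

```python
from collections import OrderedDict

def count_faults(refs, frames, policy="FIFO"):
    """Count page faults for a reference string with a fixed frame count."""
    resident = OrderedDict()                 # front = next eviction victim
    faults = 0
    for page in refs:
        if page in resident:
            if policy == "LRU":
                resident.move_to_end(page)   # refresh recency on a hit
            continue
        faults += 1
        if len(resident) == frames:
            resident.popitem(last=False)     # evict oldest / least recent
        resident[page] = True
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(count_faults(refs, 3, "FIFO"))   # 9
print(count_faults(refs, 3, "LRU"))    # 10
```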
Segmentation
- Segments: Variable-size logical units (code, data, stack, heap)
- Segment Table: Maps segment numbers to base address and limit
- Segmentation + Paging: Combine benefits of both (e.g., 32-bit x86; x86-64 retains only vestigial segmentation)
- Fragmentation: External fragmentation possible with pure segmentation
Memory Allocation
- Contiguous Allocation (fit strategies compared in the sketch below):
- First Fit: Allocate first hole that's big enough
- Best Fit: Allocate smallest hole that fits
- Worst Fit: Allocate largest hole
- Non-Contiguous Allocation: Paging and segmentation allow non-contiguous allocation
- Fragmentation:
- External: Free memory scattered, can't satisfy requests
- Internal: Allocated memory larger than needed, waste within allocation unit
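A sketch of the three hole-selection strategies, assuming the free list is represented simply as a list of hole sizes:

```python
def allocate(holes, size, strategy="first"):
    """Pick a hole index for a request; holes is a list of free-block sizes."""
    candidates = [(h, i) for i, h in enumerate(holes) if h >= size]
    if not candidates:
        return None                                    # request can't be met
    if strategy == "first":
        return min(candidates, key=lambda c: c[1])[1]  # lowest address/index
    if strategy == "best":
        return min(candidates)[1]                      # smallest adequate hole
    if strategy == "worst":
        return max(candidates)[1]                      # largest hole

holes = [100, 500, 200, 300, 600]
for s in ("first", "best", "worst"):
    print(s, "->", holes[allocate(holes, 212, s)])     # 500, 300, 600
```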
Storage Systems
Hard Disk Drives (HDD)
- Components:
- Platters: Rotating magnetic disks
- Spindle: Rotates platters (5400-15000 RPM)
- Read/Write Heads: Move across platter surface on actuator arm
- Tracks: Concentric circles on platter
- Sectors: Arc segments of tracks (typically 512B or 4KB)
- Cylinders: Set of tracks at same position on all platters
- Access Time = Seek Time + Rotational Latency + Transfer Time (worked example below)
- Seek Time: Time to move head to correct track (3-12 ms)
- Rotational Latency: Time for sector to rotate under head (avg = 0.5 × rotation time)
- Transfer Time: Time to read/write data
- Typical Performance: 80-160 MB/s throughput, 5-10 ms latency
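A back-of-envelope check of the access-time formula, using illustrative values from the ranges above (7200 RPM, 8 ms average seek, 150 MB/s transfer):

```python
# Back-of-envelope HDD access time for a hypothetical 7200 RPM drive.
rpm = 7200
seek_ms = 8.0                              # average seek, within 3-12 ms
rotation_ms = 60_000 / rpm                 # one full rotation: ~8.33 ms
latency_ms = 0.5 * rotation_ms             # average: half a rotation
transfer_ms = 4096 / (150 * 1e6) * 1000    # 4 KB at 150 MB/s: ~0.03 ms

print(seek_ms + latency_ms + transfer_ms)  # ~12.2 ms per random 4 KB read
```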
Disk Scheduling Algorithms
- FCFS (First-Come-First-Served): Process requests in order, no optimization
- SSTF (Shortest Seek Time First): Service request closest to current head position, can starve distant requests
- SCAN (Elevator): Head moves in one direction, services requests, then reverses
- C-SCAN (Circular SCAN): Like SCAN but only services in one direction, returns to start without servicing
- LOOK/C-LOOK: Like SCAN/C-SCAN but only goes as far as last request in each direction
Example: Current head at 53, queue: [98, 183, 37, 122, 14, 124, 65, 67]
- FCFS: 53→98→183→37→122→14→124→65→67 (total: 640)
- SSTF: 53→65→67→37→14→98→122→124→183 (total: 236)
- SCAN (toward 0): 53→37→14→0→65→67→98→122→124→183 (total: 236)
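A quick way to verify the SSTF total above (FCFS and SCAN can be checked the same way):

```python
def sstf(head, queue):
    """Total head movement under Shortest Seek Time First."""
    pending, total = list(queue), 0
    while pending:
        nxt = min(pending, key=lambda t: abs(t - head))  # closest track
        total += abs(nxt - head)
        head = nxt
        pending.remove(nxt)
    return total

print(sstf(53, [98, 183, 37, 122, 14, 124, 65, 67]))     # 236
```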
Solid State Drives (SSD)
- Technology: Flash memory (NAND), no moving parts
- Structure:
- Pages: Smallest read/write unit (4-16 KB)
- Blocks: Smallest erase unit (128-256 pages, ~512 KB - 4 MB)
- Planes/Dies: Organized for parallel access
- Performance: ~200-550 MB/s (SATA) or ~3500-7000 MB/s (NVMe), 25-100 μs latency
- Characteristics:
- Must erase before write (block-level erase)
- Limited write cycles (~1000-100000 per cell)
- Write amplification issue
- Garbage collection needed
- Flash Translation Layer (FTL): Maps logical blocks to physical pages, handles wear leveling, garbage collection
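A heavily simplified FTL sketch showing why flash writes go out of place and leave stale pages for garbage collection (the mapping table and free-page pool here are hypothetical):

```python
# Toy FTL: logical block -> physical page, with out-of-place writes.
ftl = {}          # logical block number -> physical page number
next_free = 0     # hypothetical pointer into a pool of erased pages
stale = set()     # pages holding old data, awaiting garbage collection

def write(lbn, _data):
    global next_free
    if lbn in ftl:
        stale.add(ftl[lbn])   # old page can't be overwritten in place
    ftl[lbn] = next_free      # remap to a fresh, already-erased page
    next_free += 1

write(7, b"v1"); write(7, b"v2")
print(ftl[7], stale)          # 1 {0}: page 0 is stale, GC will reclaim it
```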
RAID (Redundant Array of Independent Disks)
- RAID 0 (Striping): Data split across disks, no redundancy, high performance
- RAID 1 (Mirroring): Data duplicated on multiple disks, 100% overhead, high read performance
- RAID 5 (Striping + Parity): Data + parity striped across ≥3 disks, can survive 1 disk failure (XOR rebuild sketched below), (n-1)/n capacity
- RAID 6 (Double Parity): Like RAID 5 but 2 parity blocks, survives 2 disk failures, (n-2)/n capacity
- RAID 10 (1+0): Mirrored pairs striped, combines benefits of RAID 0 and 1, 50% capacity
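RAID 5 parity is a bytewise XOR across the data blocks in a stripe, so any one lost block can be rebuilt from the survivors; for capacity, four 4 TB disks yield (4-1) × 4 = 12 TB usable. A minimal demonstration:

```python
# RAID 5 parity is bytewise XOR: a single lost disk is rebuilt
# by XOR-ing the surviving data blocks with the parity block.
d1, d2, d3 = b"\x0f\xf0", b"\x33\xcc", b"\x55\xaa"
parity = bytes(a ^ b ^ c for a, b, c in zip(d1, d2, d3))

lost = d2                              # pretend disk 2 failed
rebuilt = bytes(a ^ b ^ c for a, b, c in zip(d1, d3, parity))
print(rebuilt == lost)                 # True
```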
File System Basics
- File: Named collection of related information stored on secondary storage
- Directory: Organization structure containing files and subdirectories
- Metadata: File name, size, timestamps, permissions, owner, location
- Inode (Unix/Linux): Data structure storing file metadata and pointers to data blocks
- File Allocation Methods:
- Contiguous: Consecutive blocks, fast but fragmentation
- Linked: Each block points to next, no fragmentation, slow random access
- Indexed: Index block contains pointers to all data blocks
- Multi-level Indexed: Combines direct, indirect, double indirect pointers (ext2/3/4)
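A worked example of how far multi-level indexing reaches, assuming ext2-style parameters (12 direct pointers, 4 KB blocks, 4-byte block pointers):

```python
# Max file size for an ext2-style inode: 12 direct pointers, one single,
# one double, one triple indirect pointer.
BLOCK = 4096
PTRS = BLOCK // 4        # 1024 pointers fit in one indirect block

direct = 12
single = PTRS            # 1,024 blocks
double = PTRS ** 2       # 1,048,576 blocks
triple = PTRS ** 3       # ~1.07 billion blocks

max_bytes = (direct + single + double + triple) * BLOCK
print(max_bytes / 2**40)  # ~4.004 -> roughly 4 TiB max file size
```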
Common File Systems
- ext4: Linux default, journaling, extent-based allocation, up to 1 EB volume
- NTFS: Windows, journaling, ACLs, compression, encryption
- APFS: macOS/iOS, optimized for SSDs, snapshots, encryption
- FAT32: Simple, widely compatible, 4 GB file limit
- Btrfs/ZFS: Advanced features: snapshots, checksums, RAID, compression
I/O Systems
I/O Hardware Components
- I/O Devices: Keyboard, mouse, display, disk, network card, etc.
- Device Controller: Hardware interface between device and computer
- Contains registers (status, command, data)
- May have local buffer
- Communicates via I/O ports or memory-mapped I/O
- Device Driver: OS software module that interfaces with device controller
- I/O Buses:
- System Bus: CPU to memory
- PCIe (Peripheral Component Interconnect Express): High-speed expansion bus
- USB (Universal Serial Bus): External peripheral connection
- SATA/NVMe: Storage connections
I/O Methods
- Programmed I/O (Polling):
- CPU continuously checks device status
- Simple but CPU-intensive, wasteful
- Busy waiting
- Interrupt-Driven I/O:
- Device signals CPU when ready via interrupt
- CPU can do other work while waiting
- Still involves CPU for data transfer
- Better CPU utilization than polling
- Direct Memory Access (DMA):
- Dedicated DMA controller handles data transfer
- CPU only involved at start and completion
- Best for large transfers
- Single interrupt after entire transfer complete
DMA Transfer Process:
1. CPU programs DMA controller (source, destination, size)
2. DMA controller takes control of bus
3. DMA transfers data directly between device & memory
4. DMA sends interrupt when complete
5. CPU handles interrupt, transfer finished
I/O Software Layers
┌─────────────────────────────┐
│ User-Level I/O Software     │ (printf, scanf, etc.)
├─────────────────────────────┤
│ Device-Independent OS       │ (buffering, naming, error)
├─────────────────────────────┤
│ Device Drivers              │ (device-specific code)
├─────────────────────────────┤
│ Interrupt Handlers          │ (process interrupts)
├─────────────────────────────┤
│ Hardware                    │ (controllers, devices)
└─────────────────────────────┘
- User Level: Libraries and system calls (read, write, open, close)
- Device-Independent: Uniform naming, buffering, error handling, allocation
- Device Driver: Device-specific commands, initialization, status checks
- Interrupt Handler: Save context, handle interrupt, restore context
Buffering
- Purpose: Cope with speed mismatch, allow asynchronous operation, support copy semantics
- Single Buffering: OS buffer, process waits for I/O completion
- Double Buffering: Two buffers alternate (fill one while processing other)
- Circular Buffering: Ring of buffers for producer-consumer scenarios (sketched below)
- Buffer Cache: Keep frequently accessed blocks in memory (page cache in Linux)
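A single-threaded sketch of the circular-buffering idea above (a real kernel ring blocks and wakes threads with synchronization primitives; the names here are illustrative):

```python
from collections import deque

CAPACITY = 4
ring = deque()                 # ring of buffers shared by producer/consumer

def produce(item):
    if len(ring) == CAPACITY:
        raise BufferError("all buffers full: producer must wait")
    ring.append(item)          # fill the next free buffer

def consume():
    if not ring:
        raise BufferError("all buffers empty: consumer must wait")
    return ring.popleft()      # drain the oldest filled buffer

for block in ("b0", "b1", "b2"):
    produce(block)
print(consume(), consume())    # b0 b1
```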
I/O Performance
- Reduce number of context switches
- Reduce data copying: Zero-copy techniques, DMA
- Reduce interrupt frequency: Interrupt coalescing, polling under high load
- Use DMA for large transfers
- Balance CPU and I/O operations
- Asynchronous I/O: Allow overlapping computation and I/O
- I/O Scheduling: Reorder requests for efficiency (disk scheduling algorithms)
Memory-Mapped I/O vs Port-Mapped I/O
- Memory-Mapped I/O (MMIO):
- Device registers mapped into memory address space
- Use regular load/store instructions
- Single address space for memory and I/O
- Common in modern systems
- Port-Mapped I/O (PMIO):
- Separate address space for I/O devices
- Special I/O instructions (IN/OUT on x86)
- Better isolation between memory and I/O
- Legacy approach on x86
Block vs Character Devices
- Block Devices:
- Store information in fixed-size blocks (typically 512B - 4KB)
- Random access possible
- Can be buffered/cached
- Examples: Hard disks, SSDs, USB drives
- Character Devices:
- Stream of characters, no block structure
- Sequential access, not seekable
- Examples: Keyboards, mice, serial ports, terminals
- Network Devices: Often handled separately with socket interface
Quick Reference Summary
- CPU: Control Unit + ALU + Registers + Cache | Modes: User/Kernel | Interrupts & System Calls
- Memory: Hierarchy: Registers → Cache (L1/L2/L3) → RAM → Disk | Virtual Memory + Paging | TLB
- Storage: HDD (mechanical, 5-10 ms) vs SSD (flash, 25-100 μs) | RAID for redundancy | File Systems
- I/O: Polling → Interrupts → DMA | Device Drivers | Buffering | Block vs Character Devices