CMSI 3510: Operating Systems - Cheat Sheet
CPU Architecture
Key Components
- Control Unit (CU): Directs operations of the processor, fetches instructions from memory, decodes them, and coordinates execution
- Arithmetic Logic Unit (ALU): Performs arithmetic and logical operations
- Registers: Small, fast storage locations within the CPU
- Program Counter (PC): Holds address of next instruction
- Instruction Register (IR): Holds current instruction being executed
- Stack Pointer (SP): Points to top of the stack
- General Purpose Registers: Store operands and intermediate results
- Cache Memory: Fast memory between CPU and main memory
- L1 Cache: Smallest, fastest (~32-64 KB per core, ~1-2ns access time)
- L2 Cache: Medium size (~256 KB - 1 MB per core, ~4-10ns)
- L3 Cache: Largest, shared across cores (~8-32 MB, ~10-20ns)
Instruction Cycle (Fetch-Decode-Execute)
1. FETCH: PC → MAR → Memory → MDR → IR
PC = PC + 1
2. DECODE: IR β Control Unit (decode instruction)
3. EXECUTE: Perform operation (ALU, memory access, etc.)
4. STORE: Write results back to registers/memory
MAR: Memory Address Register | MDR: Memory Data Register
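A toy fetch-decode-execute loop in Python (the two-instruction ISA, register names, and program below are made up for illustration, not any real architecture):

```python
# Toy fetch-decode-execute loop over a hypothetical accumulator ISA.
# memory holds (opcode, operand) tuples; regs is the register file.
memory = [("LOAD", 7), ("ADD", 3), ("HALT", 0)]
regs = {"ACC": 0, "PC": 0}

while True:
    ir = memory[regs["PC"]]   # FETCH: PC -> memory -> IR
    regs["PC"] += 1           # PC = PC + 1
    opcode, operand = ir      # DECODE
    if opcode == "LOAD":      # EXECUTE + STORE
        regs["ACC"] = operand
    elif opcode == "ADD":
        regs["ACC"] += operand
    elif opcode == "HALT":
        break

print(regs["ACC"])            # 10
```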
CPU Modes
- User Mode: Limited privileges, cannot execute privileged instructions, used for user applications
- Kernel Mode (Supervisor Mode): Full privileges, can execute all instructions, access all memory, control hardware
- Mode Switching: Triggered by system calls, interrupts, or exceptions
Interrupts
- Hardware Interrupts: Generated by hardware devices (I/O completion, timer, etc.)
- Software Interrupts (Traps): Generated by software (system calls, exceptions)
- Interrupt Handling:
- Save current state (PC, registers)
- Load interrupt handler address from Interrupt Vector Table (IVT)
- Execute interrupt handler in kernel mode
- Restore saved state and return to user mode
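A minimal sketch of vector-table dispatch in Python (the handler names, vector numbers, and saved-state shape are hypothetical):

```python
# Toy interrupt dispatch: the IVT maps an interrupt number to a handler.
def timer_handler(): print("timer tick handled")
def io_handler():    print("I/O completion handled")

IVT = {0: timer_handler, 1: io_handler}   # interrupt vector table

def interrupt(num, saved_pc):
    saved_state = {"PC": saved_pc}        # 1. save current state
    handler = IVT[num]                    # 2. look up handler in IVT
    handler()                             # 3. run handler (kernel mode)
    return saved_state["PC"]              # 4. restore state, resume

pc = interrupt(0, saved_pc=0x4000)        # execution resumes at saved PC
```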
Pipelining
CPU technique to increase instruction throughput by overlapping execution stages
Classic 5-Stage Pipeline:
1. IF - Instruction Fetch
2. ID - Instruction Decode
3. EX - Execute
4. MEM - Memory Access
5. WB - Write Back

| Time | IF | ID | EX | MEM | WB |
|------|----|----|----|-----|----|
| T1   | I1 |    |    |     |    |
| T2   | I2 | I1 |    |     |    |
| T3   | I3 | I2 | I1 |     |    |
| T4   | I4 | I3 | I2 | I1  |    |
| T5   | I5 | I4 | I3 | I2  | I1 |
Hazards: Data hazards, control hazards (branches), structural hazards
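Ignoring hazards and stalls, a k-stage pipeline completes n instructions in k + (n - 1) cycles. A quick check:

```python
# Ideal pipeline timing (no stalls): k stages, n instructions.
def pipeline_cycles(n, k=5):
    return k + (n - 1)   # first instruction fills the pipe, then 1/cycle

print(pipeline_cycles(5))     # 9 cycles vs 25 unpipelined (5 * 5)
print(pipeline_cycles(1000))  # 1004 -> throughput approaches 1 inst/cycle
```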
Multi-Core & Multiprocessing
- Symmetric Multiprocessing (SMP): Multiple identical processors share memory and I/O
- Multi-Core: Multiple CPU cores on a single chip
- Hyper-Threading: Single physical core appears as multiple logical cores
- Cache Coherence: Ensures consistency when multiple cores cache same memory location (MESI protocol: Modified, Exclusive, Shared, Invalid)
Memory (RAM)
Memory Hierarchy
Fastest/Smallest (top) → Slowest/Largest (bottom)

| Level     | Access Time | Typical Size     |
|-----------|-------------|------------------|
| Registers | <1 ns       | bytes to KB      |
| L1 Cache  | 1-2 ns      | ~32-64 KB        |
| L2 Cache  | 4-10 ns     | ~256 KB - 1 MB   |
| L3 Cache  | 10-20 ns    | ~8-32 MB         |
| Main RAM  | 50-100 ns   | GBs              |
| SSD       | 25-100 μs   | 100s of GB - TBs |
| HDD       | 5-10 ms     | TBs              |
Memory Types
- DRAM (Dynamic RAM): Main memory, requires periodic refresh, slower but cheaper
- SRAM (Static RAM): Used for cache, faster, doesn't need refresh, more expensive
- DDR SDRAM: Double Data Rate, transfers on both clock edges (DDR3, DDR4, DDR5)
Virtual Memory
- Purpose: Abstraction that provides each process with illusion of large, contiguous address space
- Benefits: Process isolation, efficient memory use, allows programs larger than physical RAM
- Translation: Virtual addresses → Physical addresses via page tables
- Translation Lookaside Buffer (TLB): Cache for page table entries, speeds up address translation
Paging
Memory management scheme that divides memory into fixed-size blocks
- Page: Fixed-size block in virtual memory (typically 4 KB)
- Frame: Fixed-size block in physical memory (same size as page)
- Page Table: Maps virtual pages to physical frames
- Page Fault: Occurs when accessing page not in physical memory
- Trap to OS kernel
- Save process state
- Find page on disk
- Load page into free frame
- Update page table
- Resume process
Virtual Address Format:
┌──────────────┬──────────────┐
│ Page Number  │    Offset    │
└──────────────┴──────────────┘

Page Table Entry (PTE):
┌─────┬─────┬─────┬─────┬──────────────┐
│Valid│Dirty│ R/W │ U/K │ Frame Number │
└─────┴─────┴─────┴─────┴──────────────┘
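A minimal sketch of the split-and-translate step in Python, assuming 4 KB pages and a hypothetical one-entry page table:

```python
# Split a virtual address with 4 KB pages: offset = low 12 bits.
PAGE_SIZE = 4096                   # 2**12
page_table = {0x00005: 0x00123}    # hypothetical VPN -> frame mapping

def translate(vaddr):
    vpn = vaddr // PAGE_SIZE       # page number (high bits)
    offset = vaddr % PAGE_SIZE     # offset within the page (low 12 bits)
    if vpn not in page_table:
        raise LookupError("page fault: trap to kernel")  # page not resident
    return page_table[vpn] * PAGE_SIZE + offset

print(hex(translate(0x5ABC)))      # 0x123abc
```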
Page Replacement Algorithms
- FIFO (First-In-First-Out): Replace oldest page, can suffer from Belady's anomaly
- LRU (Least Recently Used): Replace page not used for longest time, good performance but expensive
- LFU (Least Frequently Used): Replace page with lowest reference count
- Clock (Second Chance): Approximates LRU, uses reference bit, circular list
- Optimal: Replace page that won't be used for longest time (theoretical, requires future knowledge)
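A small simulation comparing FIFO and LRU fault counts on the classic reference string used to illustrate Belady's anomaly:

```python
from collections import OrderedDict

def count_faults(refs, frames, policy="FIFO"):
    """Count page faults for a reference string with a fixed frame count."""
    resident = OrderedDict()                 # front = next eviction victim
    faults = 0
    for page in refs:
        if page in resident:
            if policy == "LRU":
                resident.move_to_end(page)   # refresh recency on a hit
            continue
        faults += 1
        if len(resident) == frames:
            resident.popitem(last=False)     # evict oldest / least recent
        resident[page] = True
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(count_faults(refs, 3, "FIFO"))   # 9
print(count_faults(refs, 3, "LRU"))    # 10
```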
Segmentation
- Segments: Variable-size logical units (code, data, stack, heap)
- Segment Table: Maps segment numbers to base address and limit
- Segmentation + Paging: Combine benefits of both (e.g., 32-bit x86; x86-64 retains only vestigial segmentation)
- Fragmentation: External fragmentation possible with pure segmentation
Memory Allocation
- Contiguous Allocation (fit strategies compared in the sketch below):
- First Fit: Allocate first hole that's big enough
- Best Fit: Allocate smallest hole that fits
- Worst Fit: Allocate largest hole
- Non-Contiguous Allocation: Paging and segmentation allow non-contiguous allocation
- Fragmentation:
- External: Free memory scattered, can't satisfy requests
- Internal: Allocated memory larger than needed, waste within allocation unit
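A sketch of the three hole-selection strategies, assuming the free list is represented simply as a list of hole sizes:

```python
def allocate(holes, size, strategy="first"):
    """Pick a hole index for a request; holes is a list of free-block sizes."""
    candidates = [(h, i) for i, h in enumerate(holes) if h >= size]
    if not candidates:
        return None                                    # request can't be met
    if strategy == "first":
        return min(candidates, key=lambda c: c[1])[1]  # lowest address/index
    if strategy == "best":
        return min(candidates)[1]                      # smallest adequate hole
    if strategy == "worst":
        return max(candidates)[1]                      # largest hole

holes = [100, 500, 200, 300, 600]
for s in ("first", "best", "worst"):
    print(s, "->", holes[allocate(holes, 212, s)])     # 500, 300, 600
```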
Storage Systems
Hard Disk Drives (HDD)
- Components:
- Platters: Rotating magnetic disks
- Spindle: Rotates platters (5400-15000 RPM)
- Read/Write Heads: Move across platter surface on actuator arm
- Tracks: Concentric circles on platter
- Sectors: Arc segments of tracks (typically 512B or 4KB)
- Cylinders: Set of tracks at same position on all platters
- Access Time = Seek Time + Rotational Latency + Transfer Time (worked example below)
- Seek Time: Time to move head to correct track (3-12 ms)
- Rotational Latency: Time for sector to rotate under head (avg = 0.5 × rotation time)
- Transfer Time: Time to read/write data
- Typical Performance: 80-160 MB/s throughput, 5-10 ms latency
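A back-of-envelope check of the access-time formula, using illustrative values from the ranges above (7200 RPM, 8 ms average seek, 150 MB/s transfer):

```python
# Back-of-envelope HDD access time for a hypothetical 7200 RPM drive.
rpm = 7200
seek_ms = 8.0                              # average seek, within 3-12 ms
rotation_ms = 60_000 / rpm                 # one full rotation: ~8.33 ms
latency_ms = 0.5 * rotation_ms             # average: half a rotation
transfer_ms = 4096 / (150 * 1e6) * 1000    # 4 KB at 150 MB/s: ~0.03 ms

print(seek_ms + latency_ms + transfer_ms)  # ~12.2 ms per random 4 KB read
```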
Disk Scheduling Algorithms
- FCFS (First-Come-First-Served): Process requests in order, no optimization
- SSTF (Shortest Seek Time First): Service request closest to current head position, can starve distant requests
- SCAN (Elevator): Head moves in one direction, services requests, then reverses
- C-SCAN (Circular SCAN): Like SCAN but only services in one direction, returns to start without servicing
- LOOK/C-LOOK: Like SCAN/C-SCAN but only goes as far as last request in each direction
Example: Current head at 53, queue: [98, 183, 37, 122, 14, 124, 65, 67]
- FCFS: 53→98→183→37→122→14→124→65→67 (total: 640)
- SSTF: 53→65→67→37→14→98→122→124→183 (total: 236)
- SCAN (toward 0): 53→37→14→0→65→67→98→122→124→183 (total: 236)
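A quick way to verify the SSTF total above (FCFS and SCAN can be checked the same way):

```python
def sstf(head, queue):
    """Total head movement under Shortest Seek Time First."""
    pending, total = list(queue), 0
    while pending:
        nxt = min(pending, key=lambda t: abs(t - head))  # closest track
        total += abs(nxt - head)
        head = nxt
        pending.remove(nxt)
    return total

print(sstf(53, [98, 183, 37, 122, 14, 124, 65, 67]))     # 236
```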
Solid State Drives (SSD)
- Technology: Flash memory (NAND), no moving parts
- Structure:
- Pages: Smallest read/write unit (4-16 KB)
- Blocks: Smallest erase unit (128-256 pages, ~512 KB - 4 MB)
- Planes/Dies: Organized for parallel access
- Performance: ~200-550 MB/s (SATA) or ~3500-7000 MB/s (NVMe), 25-100 μs latency
- Characteristics:
- Must erase before write (block-level erase)
- Limited write cycles (~1000-100000 per cell)
- Write amplification issue
- Garbage collection needed
- Flash Translation Layer (FTL): Maps logical blocks to physical pages, handles wear leveling, garbage collection
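A heavily simplified FTL sketch showing why flash writes go out of place and leave stale pages for garbage collection (the mapping table and free-page pool here are hypothetical):

```python
# Toy FTL: logical block -> physical page, with out-of-place writes.
ftl = {}          # logical block number -> physical page number
next_free = 0     # hypothetical pointer into a pool of erased pages
stale = set()     # pages holding old data, awaiting garbage collection

def write(lbn, _data):
    global next_free
    if lbn in ftl:
        stale.add(ftl[lbn])   # old page can't be overwritten in place
    ftl[lbn] = next_free      # remap to a fresh, already-erased page
    next_free += 1

write(7, b"v1"); write(7, b"v2")
print(ftl[7], stale)          # 1 {0}: page 0 is stale, GC will reclaim it
```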
RAID (Redundant Array of Independent Disks)
- RAID 0 (Striping): Data split across disks, no redundancy, high performance
- RAID 1 (Mirroring): Data duplicated on multiple disks, 100% overhead, high read performance
- RAID 5 (Striping + Parity): Data + parity striped across ≥3 disks, can survive 1 disk failure (XOR rebuild sketched below), (n-1)/n capacity
- RAID 6 (Double Parity): Like RAID 5 but 2 parity blocks, survives 2 disk failures, (n-2)/n capacity
- RAID 10 (1+0): Mirrored pairs striped, combines benefits of RAID 0 and 1, 50% capacity
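RAID 5 parity is a bytewise XOR across the data blocks in a stripe, so any one lost block can be rebuilt from the survivors; for capacity, four 4 TB disks yield (4-1) × 4 = 12 TB usable. A minimal demonstration:

```python
# RAID 5 parity is bytewise XOR: a single lost disk is rebuilt
# by XOR-ing the surviving data blocks with the parity block.
d1, d2, d3 = b"\x0f\xf0", b"\x33\xcc", b"\x55\xaa"
parity = bytes(a ^ b ^ c for a, b, c in zip(d1, d2, d3))

lost = d2                              # pretend disk 2 failed
rebuilt = bytes(a ^ b ^ c for a, b, c in zip(d1, d3, parity))
print(rebuilt == lost)                 # True
```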
File System Basics
- File: Named collection of related information stored on secondary storage
- Directory: Organization structure containing files and subdirectories
- Metadata: File name, size, timestamps, permissions, owner, location
- Inode (Unix/Linux): Data structure storing file metadata and pointers to data blocks
- File Allocation Methods:
- Contiguous: Consecutive blocks, fast but fragmentation
- Linked: Each block points to next, no fragmentation, slow random access
- Indexed: Index block contains pointers to all data blocks
- Multi-level Indexed: Combines direct, indirect, double indirect pointers (ext2/3/4)
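A worked example of how far multi-level indexing reaches, assuming ext2-style parameters (12 direct pointers, 4 KB blocks, 4-byte block pointers):

```python
# Max file size for an ext2-style inode: 12 direct pointers, one single,
# one double, one triple indirect pointer.
BLOCK = 4096
PTRS = BLOCK // 4        # 1024 pointers fit in one indirect block

direct = 12
single = PTRS            # 1,024 blocks
double = PTRS ** 2       # 1,048,576 blocks
triple = PTRS ** 3       # ~1.07 billion blocks

max_bytes = (direct + single + double + triple) * BLOCK
print(max_bytes / 2**40)  # ~4.004 -> roughly 4 TiB max file size
```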
Common File Systems
- ext4: Linux default, journaling, extent-based allocation, up to 1 EB volume
- NTFS: Windows, journaling, ACLs, compression, encryption
- APFS: macOS/iOS, optimized for SSDs, snapshots, encryption
- FAT32: Simple, widely compatible, 4 GB file limit
- Btrfs/ZFS: Advanced features: snapshots, checksums, RAID, compression
I/O Systems
I/O Hardware Components
- I/O Devices: Keyboard, mouse, display, disk, network card, etc.
- Device Controller: Hardware interface between device and computer
- Contains registers (status, command, data)
- May have local buffer
- Communicates via I/O ports or memory-mapped I/O
- Device Driver: OS software module that interfaces with device controller
- I/O Buses:
- System Bus: CPU to memory
- PCIe (Peripheral Component Interconnect Express): High-speed expansion bus
- USB (Universal Serial Bus): External peripheral connection
- SATA/NVMe: Storage connections
I/O Methods
- Programmed I/O (Polling):
- CPU continuously checks device status
- Simple but CPU-intensive, wasteful
- Busy waiting
- Interrupt-Driven I/O:
- Device signals CPU when ready via interrupt
- CPU can do other work while waiting
- Still involves CPU for data transfer
- Better CPU utilization than polling
- Direct Memory Access (DMA):
- Dedicated DMA controller handles data transfer
- CPU only involved at start and completion
- Best for large transfers
- Single interrupt after entire transfer complete
DMA Transfer Process:
1. CPU programs DMA controller (source, destination, size)
2. DMA controller takes control of bus
3. DMA transfers data directly between device & memory
4. DMA sends interrupt when complete
5. CPU handles interrupt, transfer finished
I/O Software Layers
┌─────────────────────────────┐
│ User-Level I/O Software     │ (printf, scanf, etc.)
├─────────────────────────────┤
│ Device-Independent OS       │ (buffering, naming, error)
├─────────────────────────────┤
│ Device Drivers              │ (device-specific code)
├─────────────────────────────┤
│ Interrupt Handlers          │ (process interrupts)
├─────────────────────────────┤
│ Hardware                    │ (controllers, devices)
└─────────────────────────────┘
- User Level: Libraries and system calls (read, write, open, close)
- Device-Independent: Uniform naming, buffering, error handling, allocation
- Device Driver: Device-specific commands, initialization, status checks
- Interrupt Handler: Save context, handle interrupt, restore context
Buffering
- Purpose: Cope with speed mismatch, allow asynchronous operation, support copy semantics
- Single Buffering: OS buffer, process waits for I/O completion
- Double Buffering: Two buffers alternate (fill one while processing other)
- Circular Buffering: Ring of buffers for producer-consumer scenarios (sketched below)
- Buffer Cache: Keep frequently accessed blocks in memory (page cache in Linux)
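A single-threaded sketch of the circular-buffering idea above (a real kernel ring blocks and wakes threads with synchronization primitives; the names here are illustrative):

```python
from collections import deque

CAPACITY = 4
ring = deque()                 # ring of buffers shared by producer/consumer

def produce(item):
    if len(ring) == CAPACITY:
        raise BufferError("all buffers full: producer must wait")
    ring.append(item)          # fill the next free buffer

def consume():
    if not ring:
        raise BufferError("all buffers empty: consumer must wait")
    return ring.popleft()      # drain the oldest filled buffer

for block in ("b0", "b1", "b2"):
    produce(block)
print(consume(), consume())    # b0 b1
```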
I/O Performance
- Reduce number of context switches
- Reduce data copying: Zero-copy techniques, DMA
- Reduce interrupt frequency: Interrupt coalescing, polling under high load
- Use DMA for large transfers
- Balance CPU and I/O operations
- Asynchronous I/O: Allow overlapping computation and I/O
- I/O Scheduling: Reorder requests for efficiency (disk scheduling algorithms)
Memory-Mapped I/O vs Port-Mapped I/O
- Memory-Mapped I/O (MMIO):
- Device registers mapped into memory address space
- Use regular load/store instructions
- Single address space for memory and I/O
- Common in modern systems
- Port-Mapped I/O (PMIO):
- Separate address space for I/O devices
- Special I/O instructions (IN/OUT on x86)
- Better isolation between memory and I/O
- Legacy approach on x86
Block vs Character Devices
- Block Devices:
- Store information in fixed-size blocks (typically 512B - 4KB)
- Random access possible
- Can be buffered/cached
- Examples: Hard disks, SSDs, USB drives
- Character Devices:
- Stream of characters, no block structure
- Sequential access, not seekable
- Examples: Keyboards, mice, serial ports, terminals
- Network Devices: Often handled separately with socket interface
Quick Reference Summary
- CPU: Control Unit + ALU + Registers + Cache | Modes: User/Kernel | Interrupts & System Calls
- Memory: Hierarchy: Registers → Cache (L1/L2/L3) → RAM → Disk | Virtual Memory + Paging | TLB
- Storage: HDD (mechanical, 5-10 ms) vs SSD (flash, 25-100 μs) | RAID for redundancy | File Systems
- I/O: Polling → Interrupts → DMA | Device Drivers | Buffering | Block vs Character Devices