Julian Gonzalez

Overview

Welcome to your first hands-on activity! You'll experience data parallelism (batching) and task parallelism (pipelining) by racing to fold origami figures as quickly as possible.

Data Parallelism: Everyone follows all steps individually (like multiple threads executing the same code on different data)

Task Parallelism: Each person handles one step and passes to the next (like an assembly line or pipeline)

Which approach is faster? You'll find out!

Learning Objectives

Understand data parallelism (batching) vs task parallelism (pipelining)
Experience load balancing and coordination challenges in parallel systems
Identify trade-offs between different parallelization strategies
Apply partitioning concepts to concurrent programming

Materials Needed

Printer paper (1-2 reams)
Scissors (enough for half the class)
Origami instructions (Sanbou-box and Tulip)
Timer
Trash cans for failed attempts

Origami Instructions

You'll need these two origami designs:

Sanbou-box Instructions - A traditional Japanese box
Tulip Instructions - A simple flower design

Pro Tip: Print these instructions or have students access them on mobile devices. Each student should be able to reference them during the activity.

Activity Structure (4 Stages)

This activity consists of 4 competitive rounds (20 minutes each):

Stage 1: Data Parallel vs Task Parallel (Sanbou-box)
Stage 2: Same teams swap roles (Sanbou-box)
Stage 3: Data Parallel vs Task Parallel (Tulip)
Stage 4: Same teams swap roles (Tulip)

Optional Stage 5: Free-form - combine techniques however you want!

Setup Instructions (For Instructor)

Divide the class in half - Team 1 and Team 2 (split by classroom location)
Assign roles for Stage 1:
- One team: Data Parallel
- Other team: Task Parallel
Choose the first origami design (Sanbou-box or Tulip)
Explain the rules for each team type (see below)
Start the 20-minute timer and let students compete!

After each stage, teams swap roles and/or switch to a new design.

Data Parallel Team Rules

Your Goal: Each person folds complete origami figures individually (like threads executing identical code on different data)

You MUST:

Work individually from start to finish on each figure
Cut your own paper (make it square)
Follow all steps in order - no skipping!
Turn in completed figures to the instructor for counting

You CANNOT:

Have teammates physically help with your figure
Skip steps or cut corners (literally or metaphorically!)
Submit incomplete or rushed figures (they may be disqualified)

You CAN:

Talk, ask for help, and encourage teammates
Move around and have fun!
Restart if you make a mistake

Think About It: This is like running the same program on multiple CPUs, each processing different data. What happens if some "threads" (people) are faster than others?
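The data parallel setup can be sketched in Rust, assuming each "person" is a thread that runs the whole folding routine on its own share of the paper. The names `fold_figure` and `fold_data_parallel`, and the four step labels, are hypothetical placeholders for illustration and are not part of the activity.

```rust
use std::thread;

/// Hypothetical stand-in for folding one figure: a worker performs every step itself.
fn fold_figure(sheet: u32) -> String {
    let steps = ["cut", "crease", "fold", "shape"]; // assumed step names
    format!("figure #{sheet} ({})", steps.join(" -> "))
}

/// Data parallelism: split the sheets into chunks and let each worker thread
/// run the same routine on its own chunk.
fn fold_data_parallel(sheets: &[u32], workers: usize) -> Vec<String> {
    let workers = workers.max(1);
    // Ceiling division so every sheet lands in some chunk; at least 1 to keep `chunks` valid.
    let chunk_size = ((sheets.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Spawn one scoped thread per chunk; all threads execute identical code.
        let handles: Vec<_> = sheets
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|&n| fold_figure(n)).collect::<Vec<String>>())
            })
            .collect();

        // Join in spawn order, so results stay in the original sheet order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling `fold_data_parallel(&[1, 2, 3, 4, 5], 2)` would give two workers chunks of three and two sheets. The whole batch is finished only when the slowest worker has been joined, which is the effect the question above asks about.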

Task Parallel Team Rules

Your Goal: Create an assembly line where each person handles ONE specific step (like a pipeline)

You MUST:

Be assigned to specific step(s) in the instructions
ONLY do your assigned step(s)
Pass your work to the next person in the pipeline
Cover all steps (assign multiple people to slow steps if needed)

You CANNOT:

Help with someone else's step
Skip any instructions in the sequence
Have a "quality control" person fix mistakes with scissors
Submit incomplete or rushed figures (they may be disqualified)

You CAN:

Have multiple people doing the same step (load balancing!)
Have one person handle multiple steps if there aren't enough people
Designate someone to cut paper (they're the "data source")
Talk, coordinate, and have fun!

Think About It: This is like a pipeline in a CPU or assembly line in a factory. What happens if one step is much slower than the others? How do you balance the workload?
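A comparable Rust sketch of the assembly line, assuming three stations (cut, fold, shape) that each run on their own thread and hand work along through channels. The station names and the `fold_pipeline` function are hypothetical and chosen only for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Task parallelism: three hypothetical stations connected by channels.
/// Each station does only its own step and passes the work along.
fn fold_pipeline(sheets: u32) -> Vec<String> {
    let (to_folder, from_cutter) = mpsc::channel::<String>();
    let (to_shaper, from_folder) = mpsc::channel::<String>();
    let (to_done, from_shaper) = mpsc::channel::<String>();

    // Station 1: the "data source" that cuts paper into squares.
    let cutter = thread::spawn(move || {
        for n in 1..=sheets {
            to_folder.send(format!("sheet #{n}: cut")).expect("folder hung up");
        }
        // `to_folder` is dropped here, which tells the next station no more work is coming.
    });

    // Station 2: folds whatever the cutter hands over.
    let folder = thread::spawn(move || {
        for item in from_cutter {
            to_shaper.send(format!("{item} -> fold")).expect("shaper hung up");
        }
    });

    // Station 3: final shaping, then hands in the finished figure.
    let shaper = thread::spawn(move || {
        for item in from_folder {
            to_done.send(format!("{item} -> shape")).expect("collector hung up");
        }
    });

    // The "instructor" collects figures until the last station closes its channel.
    let finished: Vec<String> = from_shaper.iter().collect();

    cutter.join().expect("cutter panicked");
    folder.join().expect("folder panicked");
    shaper.join().expect("shaper panicked");
    finished
}
```

In this sketch a slow station leaves the stations after it waiting on an empty channel while work piles up behind it. Assigning several threads to that one station would be the code equivalent of putting extra people on a slow step.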

Discussion Questions

After completing all stages, discuss as a class:

Q1: What was the most difficult part of the Data Parallel approach?

Q2: What was the most difficult part of the Task Parallel approach?

Q3: What would the "perfect" origami instructions look like for Data Parallelism to be most efficient? Hint: Think about number of steps, complexity, and individual skill variation.

Q4: What would the "perfect" origami instructions look like for Task Parallelism to be most efficient? Hint: Think about step complexity, dependencies, and load balancing.

Q5: How do the difficulties you encountered map to real challenges when programming with threads? Consider: coordination, load balancing, bottlenecks, and communication overhead.

Q6: If you could design a hybrid approach (Stage 5), what would it look like? How would you combine data and task parallelism?

Key Takeaways

Data Parallelism (Batching):

✅ Simple to coordinate (everyone does the same thing)
✅ Scales with more "threads" (people)
❌ Limited by individual skill variation
❌ No specialization benefits

Task Parallelism (Pipelining):

✅ Can optimize each step individually
✅ Enables specialization (people get good at their step)
❌ Limited by the slowest step (bottleneck)
❌ Requires coordination and handoffs
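The bottleneck limit can be made concrete with a rough model, sketched here in Rust under the assumption that each station works at a steady pace and handoffs cost nothing. The `Station` type, both function names, and any timings you feed in are hypothetical and are not measurements from the activity.

```rust
/// One pipeline station: seconds one person needs per figure at this step,
/// and how many people are assigned to it. (Hypothetical model.)
struct Station {
    secs_per_figure: f64,
    people: u32,
}

/// Figures per minute this station can sustain on its own.
fn station_rate(s: &Station) -> f64 {
    f64::from(s.people) * 60.0 / s.secs_per_figure
}

/// Steady-state pipeline throughput: the slowest station sets the pace.
/// Returns infinity for an empty pipeline, since nothing limits it.
fn pipeline_rate(stations: &[Station]) -> f64 {
    stations.iter().map(station_rate).fold(f64::INFINITY, f64::min)
}
```

With made-up step times of 10, 30, and 10 seconds and one person per station, the model gives 2 figures per minute. Putting three people on the 30-second step raises it to 6, at which point every station runs at the same rate.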

In Real Programming:

Most systems use both techniques
Rust's concurrency model helps you safely implement both
Understanding trade-offs helps you choose the right approach

Grading

Participation-based - No winner or loser! Just engage, try your best, and participate in the discussion.

Alternate Assignment (For Absences)

Couldn't make it to class? You can make up the points by completing this alternate assignment!

Watch These Videos

Watch these videos about parallel computing approaches. Keep the data parallel vs task parallel distinction in mind throughout:

AMD Simplified: Serial vs. Parallel Computing - A clear introduction to why parallelism matters and how it speeds up computation
CPU Pipeline - Computerphile - Matt Godbolt explains pipelining (task parallelism) in CPU architecture

While watching, think about:

Which approach is like everyone doing the same task on different data? (Data Parallel)
Which approach is like an assembly line where each person does one step? (Task Parallel)

Answer These Questions

Submit a PDF with your answers to Brightspace:

Q1: In your own words, define data parallelism and task parallelism. Provide a real-world (non-computing) example of each. Hint: Think about factories, restaurants, or other production systems.

Q2: For each of the following scenarios, identify whether data parallelism or task parallelism would be more appropriate. Justify your answer:
- a) Processing 10,000 images through the same filter
- b) A web server handling requests where each request requires authentication, database lookup, and response formatting
- c) Training a neural network on a large dataset
- d) A video editing pipeline: decode → apply effects → encode → write to disk

Q3: In the activity, task parallel teams sometimes had a "bottleneck" where one slow step held up the entire pipeline. Explain why this happens and propose two strategies a team could use to address it. Connect this to the concept of load balancing in parallel systems.

Q4: Amdahl's Law states that the speedup from parallelism is limited by the sequential portion of a program. If a program is 80% parallelizable and 20% must run sequentially, what is the maximum theoretical speedup with infinite processors? Show your reasoning. Formula: Speedup = 1 / ((1 - P) + P/N), where P is the parallel fraction and N is the number of processors.
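To check your arithmetic for a finite number of processors, the formula can be written as a small Rust function. This is a minimal sketch: the name `amdahl_speedup` is hypothetical, and it assumes `p` lies between 0 and 1 and `n` is at least 1.

```rust
/// Amdahl's Law: speedup for parallel fraction `p` on `n` processors.
/// Assumes 0.0 <= p <= 1.0 and n >= 1; no input validation is done here.
fn amdahl_speedup(p: f64, n: u32) -> f64 {
    // Sequential part (1 - p) is unaffected; parallel part p is divided across n processors.
    1.0 / ((1.0 - p) + p / f64::from(n))
}
```

For a different program that is 50% parallelizable, `amdahl_speedup(0.5, 4)` evaluates 1 / (0.5 + 0.125), which is 1.6. The infinite-processor case in the question is the limit of the same expression as N grows without bound.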

Q5: The Computerphile video discusses CPU pipelining. Explain what a pipeline stall is and why it reduces the benefits of pipelining. How does this relate to the challenges a task parallel team might face in the origami activity?

Q6: Consider the trade-offs between data parallelism and task parallelism. Fill in each cell of the table below with "Low", "Medium", or "High" and briefly explain your reasoning for each row.

Factor	Data Parallelism	Task Parallelism
Coordination overhead	?	?
Sensitivity to individual speed variation	?	?
Ability to specialize	?	?
Scalability with more workers	?	?

Q7: Many real systems use hybrid approaches that combine data and task parallelism. Describe a system (real or hypothetical) that uses both. Explain which parts use data parallelism, which use task parallelism, and why this combination is beneficial. Example domains: video streaming, web applications, scientific simulations, game engines.

Q8: Create an analogy to explain the difference between data parallelism and task parallelism to someone with no technical background. Your analogy should clearly illustrate:
- How work is divided differently in each approach
- The main advantage of each approach
- The main limitation of each approach

Submission: Upload your PDF to the relevant Brightspace assignment.

