Sunday, August 23, 2015

NSF Workshop on Parallel and Distributed Computing Education

This week I participated in an NSF Workshop on Broadening Parallel and Distributed Computing Undergraduate Education organized by the Center for Parallel and Distributed Computing Curriculum Development and Educational Resources (CDER).  Check out the EduHPC workshop at Supercomputing if you are interested in getting involved in this area.

The Problem: This workshop focused on the lack of parallel and distributed computing (PDC) coverage in required Computer Science (CS) courses, even though virtually all machines, including cell phones, offer multiple threads of execution and many of the applications we use every day are distributed (Facebook, Minecraft, Google Docs, etc.).  CS students form the core of the people who will be maintaining and programming parallel and distributed systems, and the majority of them are not taking parallel and distributed electives.

Opportunities: I see two main opportunities: (1) introducing parallel and distributed exemplars into the first two years of a CS degree could help motivate and retain a more diverse population and (2) there may be some research funding opportunities.

Motivating all CS concepts with interesting problems at the entry levels is crucial to retaining students who lack computing experience and therefore may not have internalized a motivation of their own (CS and PDC are cool!  We need to communicate that effectively).  In my opinion, doing the motivation piece well is what will have the largest positive impact on diversity.

In terms of the research funding opportunities, Randy Bryant, a CMU professor who has spent the last year at the White House Office of Science and Technology Policy, presented the issues addressed by the recent HPC Executive Order.  Success for the National Strategic Computing Initiative (NSCI) includes “streamlin[ing] HPC application development” and “mak[ing] HPC readily usable and accessible”.  Another push is to work toward a convergence of numerically intensive computations (think HPC for large simulations) and data intensive computations (think analyzing connectivity in the internet).  Funding agencies will be considering these goals when strategizing about research programs.

A specific problem: To introduce parallel and distributed computing (PDC) concepts into the first two years of CS, a crucial prerequisite is to convince instructors teaching CS1, CS2, data structures, and algorithms that such an introduction is feasible.  Enrollments in intro CS courses are exploding all over the country, and instructors are not going to be keen on being expected to do even more work.  So we need to develop the conceptual tie-ins to the existing material, motivating examples and associated modules that provide paths through the suggested PDC concepts, and exercises with programming tools that let students experiment with the concepts.

One common theme was the idea of having projects in the second year where students would want to use some parallel processing because working with large datasets serially simply takes too long.  Violet Syrotiuk at Arizona State University has used an ice sheet thickness dataset in her data structures course and had the students explore various interesting questions.  The students ran into big data problems, such as it taking 20+ minutes just to read the dataset from disk.  In general, many attendees suggested that using the analysis of large datasets to motivate the need for parallel and distributed computing was a good idea.  In essence: what are computations that we cannot just do on our phones or laptops?  Some other examples that caught my fancy were distributed cellular automata on cell phones (CELLular automata), physical hashing (put letters up on the board and have students sort themselves into buckets based on the first letter of their name), flash mob detection from tweets, music creation with the whole class contributing, and describing conceptually how Apple’s Siri and Amazon’s Echo work.
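To make the big-data motivation concrete, here is a minimal sketch of the kind of exercise this could turn into: computing a simple statistic over a large dataset serially and then with worker processes.  The file name and column layout are hypothetical placeholders, not the actual ice sheet dataset used at ASU.

```python
# Hypothetical sketch: average a "thickness" column from a large CSV,
# first serially, then by splitting the rows across worker processes.
from multiprocessing import Pool

def chunk_stats(rows):
    """Return (sum, count) of column 2 for one chunk of CSV rows."""
    values = [float(r.split(',')[2]) for r in rows]
    return sum(values), len(values)

if __name__ == '__main__':
    with open('ice_sheet_thickness.csv') as f:   # placeholder file name
        rows = f.readlines()[1:]                  # skip the header row

    # Serial version: one pass over all of the rows.
    total, count = chunk_stats(rows)
    print('serial mean:', total / count)

    # Parallel version: four chunks, one worker process per chunk,
    # then combine the partial sums and counts.
    chunks = [rows[i::4] for i in range(4)]
    with Pool(processes=4) as pool:
        partials = pool.map(chunk_stats, chunks)
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    print('parallel mean:', total / count)
```

Even a toy like this surfaces the lesson the ASU students ran into: reading the data from disk can dominate, so parallelizing the computation alone may not be enough.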

An idea we explored a bit was that an algorithms course could focus on parallel sort, parallel matrix operations, and parallel graph algorithms.  The instructor could discuss the complexity of various parallel algorithms, show a simple parallelization, and then illustrate that incorrect granularity is an issue (see the sketch below).  A senior-level PDC course could then be referenced as the place where the granularity and load balancing problems are actually solved.
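As a rough illustration (my sketch, not a workshop module), the granularity issue can be demonstrated with a chunked parallel sort: sort chunks in worker processes, merge the results, and watch what happens as the chunks get too small.

```python
# Sketch of a granularity experiment: sort a large list by sorting chunks
# in worker processes and merging.  Too few chunks underuses the workers;
# too many tiny chunks means process and merge overhead swamps any speedup.
from multiprocessing import Pool
from heapq import merge
import random
import time

def parallel_sort(data, num_chunks, num_workers=4):
    chunks = [data[i::num_chunks] for i in range(num_chunks)]
    with Pool(processes=num_workers) as pool:
        sorted_chunks = pool.map(sorted, chunks)   # each chunk sorted in a worker
    return list(merge(*sorted_chunks))             # k-way merge of the sorted chunks

if __name__ == '__main__':
    data = [random.random() for _ in range(2_000_000)]
    for num_chunks in (4, 64, 4096):               # coarse to very fine granularity
        start = time.perf_counter()
        parallel_sort(data, num_chunks)
        print(num_chunks, 'chunks:', round(time.perf_counter() - start, 2), 's')
```

Students can time the plain serial sorted(data) for comparison, and then be told that choosing the granularity well and balancing the load across workers is exactly what the senior-level PDC course digs into.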

The organizers of the workshop, Chip Weems et al., did a fantastic job putting together a diverse group of faculty and scientists.  This workshop was well worth spending my first two days as a University of Arizona faculty member in DC versus Tucson.  Notice in the picture below with Randy Bryant that I am wearing my UA shirt for the first time!

Check out some of these neat resources that can illustrate general CS and/or parallel and distributed concepts or just motivate people to appreciate the power of computational thinking:



