"Theory and Applications of Parallelism with Futures"
Wednesday, May 10, 2017, at 2:00 P.M.
Room 368 (CIT 3rd Floor)
Futures are an attractive way to structure parallel computations. When a thread creates an expression with a keyword future, a new thread is spawned to compute that expression in parallel with the thread that created it. When a thread later applies a touch operation to that future, it gets the result of the expression if the result has been computed, and otherwise blocks until the result becomes ready. In this thesis, we explore different aspects of parallel programs with futures, including their cache locality, time bounds, scheduling, and applications.
Researchers have shown that while futures can, of course, enhance parallelism in a structured way, they can have a deleterious effect on cache locality. We will show, however, that if futures are used in a simple, disciplined way, then their negative impact on cache locality can be much alleviated, a significant improvement over the previous result. This structured use of futures is characteristic of many (but not all) parallel applications.
Futures lend themselves well to sophisticated dynamic scheduling algorithms, such as work stealing and its variants, that ensure high processor utilization. Implementing work stealing algorithms for future-parallel programs on hierarchical platforms, such as NUMA systems and distributed clusters, have recently drawn lots of attention. However, little has been explored on their theoretical bounds on those systems. We presents both lower and upper bounds of work stealing for fork-join programs, a well-studied subclass of parallel-future programs, on hierarchical systems. Our main result is that the expected execution time of a fork-join program with a work stealing algorithm is optimal within a factor of $\log nr$, where $n$ is the number of processors in a cluster (or a socket in a NUMA system) and $r$ is the latency for a processor to access a remote cluster.
As originally conceived, a future encapsulates a short-lived functional computation, so the order in which futures are executed cannot be observed. Recently, however, futures have been proposed as a way to encapsulate method calls to long-lived shared data structures. We proposes a new program model, called the linearizable-futures model, that supports both normal futures without side-effects and linearizable futures that that exist for their side-effects. Using this model, we propose the lazy work stealing scheduler to facilitate combining and elimination optimizations for linearizable futures. We prove good time bounds on program executions using lazy work stealing.
The processing-in-memory (PIM) model has been studied for decades and has reemerged recently as a solution to alleviating the growing speed discrepancy between CPU's computation and memory access, commonly known as the memory wall. In the PIM model, some lightweight computing units are directly attached to the main memory, providing much faster memory access than CPUs. We study applications of linearizable futures in the PIM model: operation requests to concurrent data structures are sent as linearizable futures to those computing units to execute. We show that our data structures can outperform state-of-the-art concurrent data structures in the literature.
Host: Professor Maurice Herlihy