Introduction

Even seemingly simple multithreaded programs may have performance problems whose causes are far from obvious. A program employing a number of independent threads may show no speedup when moved from a uniprocessor to a multiprocessor. One's carefully designed synchronization strategy may inexplicably result in miserable performance. Programs that run quite well most of the time may behave poorly at key moments.

Determining the causes of such problems using traditional tools can be challenging. Most tools (such as gprof and thread analyzer) depend on the availability of source code. However, while programmer-supplied code might appear faultless, library code, for which no source code is available, might make one's threads perform unanticipated (and expensive) actions. User and kernel schedulers might make conflicting decisions about which threads should be running.

Traditional tools present overall indications of performance, such as the total number of calls to each procedure, the time spent in each function, and which procedures called which others and how often. This information is not correlated with the passage of time. For time-critical programs, however, it is crucial to relate performance data to the events to which the program is responding: it is not enough to know where a program has spent its time; we must also know when it spent that time. In addition, to take full advantage of a multiprocessor, we need to know when the program is spending its time on which processors. Rather than present such information after the program has executed, we present it while the program is running, thus making it easy to relate our interactions with the subject program to the performance data being displayed.
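
To make the distinction between "where" and "when" concrete, here is a minimal sketch in Python. It is not ThreadMon code: the trace data, the activity names, and the functions `totals` and `timeline` are all hypothetical, chosen only to contrast an aggregate profile with a time-correlated one.

```python
from collections import defaultdict

# Hypothetical trace of one thread: (start_second, end_second, activity).
TRACE = [
    (0.0, 1.0, "compute"),
    (1.0, 1.5, "lock_wait"),
    (1.5, 3.0, "compute"),
    (3.0, 4.0, "lock_wait"),
]


def totals(trace):
    """Aggregate view: total seconds per activity, with no notion of when."""
    out = defaultdict(float)
    for start, end, name in trace:
        out[name] += end - start
    return dict(out)


def timeline(trace, bucket=1.0):
    """Time-correlated view: seconds per activity within each time bucket."""
    out = defaultdict(lambda: defaultdict(float))
    for start, end, name in trace:
        t = start
        while t < end:
            index = int(t // bucket)
            # Split the interval at the next bucket boundary.
            edge = min(end, (index + 1) * bucket)
            out[index][name] += edge - t
            t = edge
    return {i: dict(v) for i, v in sorted(out.items())}


if __name__ == "__main__":
    print(totals(TRACE))
    for index, usage in timeline(TRACE).items():
        print(index, usage)
```

The aggregate view reports only that 1.5 seconds went to waiting on a lock. The bucketed view additionally shows that the final second was spent entirely waiting, which is the kind of detail that matters when the program is responding to an event at that moment.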

This website describes ThreadMon, a monitoring tool for improving the performance of multithreaded programs. We describe the tool and how to use it, and give background information to help the user understand what is being displayed. Some areas in which ThreadMon has proven itself useful are:

Bottleneck analysis: concurrent programs consist of a number of threads, each executing instructions independently and competing for various resources. Contention for these resources hinders performance, so minimizing it is an important goal. By interposing itself between the application and the threads package, ThreadMon can monitor a program's resource usage and display the extent of contention, not only for individual resources but for aggregates of resources. Compounding the resource-contention problem, many library routines cause contention for resources that the application programmer may not even know exist. Our tool identifies and shows the conflicts for these resources, providing further valuable information to the programmer.
Processor-utilization analysis: an important concern to the user of a multiprocessor workstation is whether all processors are being effectively utilized. If so, could adding more processors yield performance gains? By showing what the program's threads and the workstation's processors are doing, ThreadMon gives the programmer sufficient information to handle these concerns: it does not solve performance problems, but points out that there are problems and provides feedback on the effectiveness of the programmer's solutions.
Studying the effectiveness of two-level threads-implementation strategies: most thread packages provide simple, easy-to-use threads abstractions to the application programmer. Hidden behind many of these packages, however, is a two-level implementation model (also known as the many-to-many model) in which the user-level library schedules user threads on kernel threads and the kernel schedules kernel threads on processors. Potential programmer concerns when using this model include ensuring adequate concurrency (e.g., making certain that threads can execute when they are ready and processors are available) and minimizing overhead in managing user and kernel threads. Without knowledge of both the implementation model and its runtime behavior with respect to one's application, programmers can unknowingly encounter performance problems. ThreadMon is being used to help the programmer discover these problems and develop tactics for overcoming them. We discuss the two-level implementation model and compare it with other implementation models, such as Scheduler Activations.
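
The idea of interposing a monitor between the application and its synchronization primitives, mentioned under bottleneck analysis above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ThreadMon itself interposes on the threads library rather than wrapping locks in application code, and the class `MonitoredLock` and its fields are hypothetical names invented for this example.

```python
import threading
import time


class MonitoredLock:
    """Hypothetical wrapper around threading.Lock that records contention."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._stats = threading.Lock()  # protects the counters below
        self.acquires = 0
        self.events = []  # (timestamp, thread name, seconds waited)

    def acquire(self):
        start = time.monotonic()
        # A failed non-blocking attempt means another thread holds the lock.
        contended = not self._lock.acquire(blocking=False)
        waited = 0.0
        if contended:
            self._lock.acquire()
            waited = time.monotonic() - start
        with self._stats:
            self.acquires += 1
            if contended:
                who = threading.current_thread().name
                self.events.append((start, who, waited))

    def release(self):
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()
```

An application would use `with MonitoredLock("queue"):` in place of a plain lock. Because each recorded event carries a timestamp, contention can be related to what the program was doing at that moment rather than only summed over the whole run.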

The text of this web document was taken from a paper by Bryan M. Cantrill and Thomas W. Doeppner Jr., Department of Computer Science, Brown University, Providence, RI 02912-1910

Greg Foxman (gmf@cs.brown.edu)