
Introduction
Even seemingly simple multithreaded programs may have performance
problems whose causes are far from obvious. A program employing a
number of independent threads may show no speedup when moved from a
uniprocessor to a multiprocessor. One's carefully designed
synchronization strategy may inexplicably result in miserable
performance. Programs that run quite well most of the time may behave
poorly at key moments.
Determining the causes of such problems using traditional tools can be
challenging. Most tools (such as gprof and thread analyzer) depend on
the availability of source code. However, while programmer-supplied
code might appear faultless, library code, for which no source code is
available, might make one's threads perform unanticipated (and
expensive) actions. User and kernel schedulers might make conflicting
decisions about which threads should be running.
Traditional tools present overall indications of performance, such as
the total number of calls to each procedure, the time spent in each
function, and how often each procedure calls each of the others. This
information is not correlated with the passage of time. For
time-critical programs, however, it is crucial to relate performance
data to the events to which the program is responding: it is not
enough to know where a program has spent its time; we must also know
when the program spent that time. In addition, to take full
advantage of a multiprocessor, we need to know when the
program is spending its time on which processors. Rather than
present such information after the program has executed, we present it
while the program is running, thus making it easy to relate our
interactions with the subject program to the performance data being
displayed.
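The difference between knowing where time was spent and knowing when it was spent can be illustrated with a small sketch. It is not ThreadMon code: the trace data, the names `TRACE`, `aggregate`, and `timeline`, and the two-second bucket size are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical trace: (start_second, duration_seconds, function_name).
# The values are invented for illustration only.
TRACE = [
    (0, 1, "parse"),
    (1, 1, "parse"),
    (2, 4, "lock_wait"),
    (6, 1, "parse"),
    (7, 1, "render"),
]


def aggregate(trace):
    """Total seconds per function, with no notion of when they occurred."""
    totals = defaultdict(int)
    for _start, duration, name in trace:
        totals[name] += duration
    return dict(totals)


def timeline(trace, bucket=2):
    """Seconds per function within each `bucket`-second interval."""
    buckets = defaultdict(lambda: defaultdict(int))
    for start, duration, name in trace:
        # Charge each one-second slice to the interval it falls in.
        for second in range(start, start + duration):
            buckets[second // bucket * bucket][name] += 1
    return {t: dict(names) for t, names in sorted(buckets.items())}
```

The aggregate view reports only that four seconds went to `lock_wait`. The timeline view additionally shows that all of that waiting fell between seconds 2 and 6, which is the kind of information needed to relate a slowdown to a particular event.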
This website describes ThreadMon, a monitoring tool for improving the
performance of multithreaded programs. We describe the tool and how
to use it, and give background information intended to help the user
understand what is being displayed. Some areas in which ThreadMon has
proven useful are:
- Bottleneck analysis: concurrent programs consist of
a number of threads, each executing instructions independently and
competing for various resources. Contention for these resources
hinders performance, so minimizing it is an important goal. By
interposing itself between the application and the threads package,
ThreadMon can monitor a program's resource usage and display the
extent of contention, not only for individual resources but for
aggregates of resources. Compounding the problem, many library
routines cause contention for resources that the application
programmer may not even know exist. ThreadMon identifies and shows
the conflicts for these resources, providing further valuable
information to the programmer.
- Processor-utilization analysis: an important concern to
the user of a multiprocessor workstation is whether all processors are
being effectively utilized. If so, could adding more processors yield
performance gains? By showing what the program's threads and the
workstation's processors are doing, ThreadMon gives the programmer
sufficient information to handle these concerns: it does not solve
performance problems, but points out that there are problems and
provides feedback on the effectiveness of the programmer's
solutions.
- Studying the effectiveness of two-level threads-implementation
strategies: Most thread packages provide simple, easy-to-use
threads abstractions to the application programmer. Hidden behind many
of these packages, however, is a two-level implementation model (also
known as the many-to-many model) in which the user-level
library schedules user threads on kernel threads and
the kernel schedules kernel threads on
processors. Potential programmer concerns when using this
model include ensuring adequate concurrency (e.g., making certain that
threads can execute when they are ready and processors are available)
and minimizing overhead in managing user and kernel threads. Without
knowledge of both the implementation model and its runtime behavior
with respect to one's application, programmers can unknowingly
encounter performance problems. ThreadMon is being used to help the
programmer discover these problems and develop tactics for overcoming
them. We discuss the two-level implementation model and compare it
with other implementation models, such as Scheduler Activations.
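The idea of measuring contention by interposing on synchronization calls, mentioned under bottleneck analysis above, can be sketched in a few lines. This is only an illustration under stated assumptions: ThreadMon interposes between the application and the threads package, whereas this sketch wraps Python's `threading.Lock` directly, and the class name `MonitoredLock` and its counters are hypothetical.

```python
import threading


class MonitoredLock:
    """Wraps threading.Lock and counts acquisitions that had to wait.

    Hypothetical illustration; not part of ThreadMon.
    """

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._stats = threading.Lock()  # protects the two counters below
        self.acquisitions = 0
        self.contended = 0

    def acquire(self):
        # A failed non-blocking attempt means another thread holds the lock.
        if not self._lock.acquire(blocking=False):
            with self._stats:
                self.contended += 1
            self._lock.acquire()  # now wait for it as a normal acquire would
        with self._stats:
            self.acquisitions += 1

    def release(self):
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()
```

An application would use `with some_monitored_lock:` wherever it previously used a plain lock. The ratio of `contended` to `acquisitions` then gives a rough per-resource indication of contention, and summing the counters over several locks gives the same indication for an aggregate of resources.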
The text of this web document was taken from a paper by Bryan M. Cantrill and Thomas W. Doeppner Jr., Department
of Computer Science, Brown University, Providence, RI 02912-1910.
Greg Foxman (gmf@cs.brown.edu)