
Profiling Lab

Background

The purpose of this lab is to give you some experience using profiling tools to track down performance bottlenecks in your applications. Boa definitely has some bottlenecks, as the demos have shown. Unfortunately, the default Python profiling technique, the profile module, is of limited use inside TurboGears. CherryPy does provide a profiling option that's worth a look, though. Adding the following line to start-boa.py will enable profiling:

cherrypy.config.update({'profiling.on': True, 'profiling.path':"/somewhere/profiles/"})

Run Boa for a bit, then kick off the following command:

/usr/share/python-support/python-cherrypy/cherrypy/lib/profiler.py /somewhere/profiles/ 8000

Then visit http://localhost:8000 in a browser. You'll see statistics for each request the server has received since it last restarted. You'll probably notice relatively little correlation between the profiler's output and the code you actually wrote; the web framework obscures much of it. You can also use the pstats module, described in the profile module's documentation, to aggregate these files and do more directed analysis on them. For now we'll skip over that.
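If you do want to try the pstats route, the aggregation step might look something like the sketch below. The file names here are made up for illustration; the real dumps would be whatever per-request files CherryPy wrote into your profiles directory.

```python
import cProfile
import pstats

def busy_work(n):
    # Stand-in for the work done handling one request.
    return sum(i * i for i in range(n))

# Produce two dump files to stand in for the per-request profiles
# CherryPy writes into the profiles directory.
for name, n in [("request1.prof", 10000), ("request2.prof", 20000)]:
    profiler = cProfile.Profile()
    profiler.runcall(busy_work, n)
    profiler.dump_stats(name)

# Aggregate both dumps into one Stats object and look at the hot spots.
stats = pstats.Stats("request1.prof")
stats.add("request2.prof")
stats.sort_stats("cumulative").print_stats(5)  # top five by cumulative time
```

The same pattern scales to a whole directory of request dumps: load the first file into a Stats object, then add() the rest before sorting.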

For today's lab, you have two options, since the normal profiler doesn't work so well. You can either work in a small group to build application-specific profiling support into Boa, or use the standard tools individually on a toy example. The whole class needn't do the same thing: some people can work individually on the toy example while others form one or more small groups to instrument different pieces of Boa.

Instrumenting Boa

Get into small groups (no more than 4), and take a look at the wrapper technique described here. Work together to try to instrument some part of Boa. If you're successful, see if you can get a performance increase there.
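One common form of the wrapper technique is a timing decorator that records call counts and elapsed time per function. A minimal sketch follows; the function names and the `timings` dictionary are ours, not from the linked page, and `slow_lookup` is just a stand-in for whatever Boa method you instrument.

```python
import functools
import time

# Accumulate (call count, total seconds) per wrapped function name.
timings = {}

def timed(func):
    """Wrap func so every call records its wall-clock time in `timings`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.time() - start
            count, total = timings.get(func.__name__, (0, 0.0))
            timings[func.__name__] = (count + 1, total + elapsed)
    return wrapper

@timed
def slow_lookup(n):
    # Stand-in for an expensive Boa controller method.
    return sum(range(n))

slow_lookup(100000)
slow_lookup(200000)
count, total = timings["slow_lookup"]
print("slow_lookup called %d times, %.4fs total" % (count, total))
```

Because the wrapper only touches the functions you decorate, you can sprinkle it over a few suspects in Boa without the framework noise that drowns out the whole-program profiler.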

Basic Example

Independently, you may grab a small example project here. This is the code for a small grep-like utility. Run run.py with the name of a text file, and you'll be prompted repeatedly for words you'd like to look up in the text (case-insensitively). For each word, you'll be told which lines of the file it appears on, similar to grep. However, you'll notice that it runs much more slowly than grep. Because the performance bugs are fairly obvious, to get the most out of the assignment you should not look at any code until the profiling data points you at the problem.

'./run.py [filename]' will kick off the program. When you exit, it will print some profiling data for you. If you look in run.py, you'll see that it's set up to use the default profiling module. That module's documentation explains how to dump the statistics to a file and do more interesting analyses on them, but that shouldn't be necessary here. Run it on a big file. Someone put the entire text of Hamlet into Dan Kuebrich's .plan last year, and he hasn't changed it yet, so you can use /u/dkuebric/.plan as your test file. You'll notice that it initially takes about 40 seconds to look up a single word in that file. With a couple of changes directed solely by the profiling data, without looking at any code in advance, it's possible to get that down to well under a tenth of a second. Have fun!
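If you do want to dump statistics to a file for more directed analysis, the pattern looks roughly like this. The `find_word` function below is a toy stand-in for run.py's lookup loop, not the actual assignment code, and the dump file name is made up.

```python
import cProfile
import pstats
import re

# Toy stand-in for the word-lookup loop in run.py: scan lines for a word.
LINES = ["to be or not to be", "that is the question"] * 1000

def find_word(word):
    # Case-insensitive whole-word match, returning matching line numbers.
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    return [i for i, line in enumerate(LINES) if pattern.search(line)]

# Profile one lookup and dump the statistics to a file...
profiler = cProfile.Profile()
profiler.runcall(find_word, "question")
profiler.dump_stats("lookup.prof")

# ...then load them back and ask a directed question: where does the
# program spend its own time?
stats = pstats.Stats("lookup.prof")
stats.sort_stats("time").print_stats(10)  # top ten by internal time
```

Sorting by internal time ("time") rather than cumulative time is often the quickest way to spot the kind of obvious hot spot this lab's utility contains.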
