Logger is a lightweight profiling/instrumentation library for user-mode programs. It is designed for profiling C/C++ programs with minimal impact on the performance of the program being studied.
I am actively using Logger to collect data for building models of multithreaded programs.

Logger lets you insert probes into the source code of the program. Each probe is a call to a Logger function. When called, a probe records its probe ID, a timestamp, and the IDs of the current process and thread.

Probes divide the program into "blocks"; for each block we can record the time required for its execution. When a probe is called, it adds a corresponding record to an array of probe hits. Each thread allocates its own array for storing probe hits in TLS (thread-local storage). When a sufficient number of records accumulates in a thread's array, its contents are moved into a single global array maintained by the Logger. Likewise, when the number of records in the global array reaches a certain limit, the Logger saves its contents to a file on disk.
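The two-level buffering described above can be sketched roughly as follows. This is only an illustration of the idea, not Logger's actual source: the names ProbeHit, kThreadLimit, kGlobalLimit, log_event and flush_thread_buffer are hypothetical, the limits are tiny so the behavior is easy to follow, and the write to disk is replaced by a counter.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical record type; the real Logger also stores PID, TID and timestamps.
struct ProbeHit {
    int probe_id;
};

// Hypothetical limits, chosen small for illustration only.
constexpr std::size_t kThreadLimit = 4;
constexpr std::size_t kGlobalLimit = 8;

std::mutex g_mutex;            // protects g_hits and g_written
std::vector<ProbeHit> g_hits;  // the single global array
std::size_t g_written = 0;     // stand-in for records saved to the file

// Per-thread array kept in thread-local storage.
thread_local std::vector<ProbeHit> t_hits;

// Move the calling thread's records into the global array, then "save" the
// global array once it reaches its own limit.
void flush_thread_buffer() {
    std::lock_guard<std::mutex> lock(g_mutex);
    g_hits.insert(g_hits.end(), t_hits.begin(), t_hits.end());
    t_hits.clear();
    if (g_hits.size() >= kGlobalLimit) {
        g_written += g_hits.size();  // a real logger would hand this to a worker thread
        g_hits.clear();
    }
}

// Record one probe hit. The common path touches only thread-local data,
// so no lock is taken until the per-thread array fills up.
void log_event(int probe_id) {
    t_hits.push_back(ProbeHit{probe_id});
    if (t_hits.size() >= kThreadLimit) {
        flush_thread_buffer();
    }
}
```

The point of the design is that most probe hits never touch shared state; the lock is taken only once per kThreadLimit hits.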

Writing to the file is performed by a worker thread to minimize the impact on the performance of the program being studied. Data is stored in .csv format. Each line corresponds to a single probe hit and has the following fields:
- PID;
- TID;
- probe ID;
- CPU timestamp, seconds;
- CPU timestamp, nanoseconds;
- wallclock timestamp, seconds;
- wallclock timestamp, nanoseconds.
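
As an illustration of how such a line could be consumed, here is a small parsing sketch. It assumes the fields are separated by commas and appear in the order listed above, with both timestamps enabled; ProbeRecord, parse_record and wallclock_ns_between are hypothetical helper names, not part of Logger.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// One probe hit, with fields in the order listed above.
struct ProbeRecord {
    long long pid, tid, probe_id;
    long long cpu_sec, cpu_nsec;
    long long wc_sec, wc_nsec;
};

// Parse one comma-separated line (separator is an assumption).
// Returns false if the line has fewer than seven numeric fields.
bool parse_record(const std::string& line, ProbeRecord& out) {
    std::istringstream in(line);
    long long* fields[] = {&out.pid, &out.tid, &out.probe_id,
                           &out.cpu_sec, &out.cpu_nsec,
                           &out.wc_sec, &out.wc_nsec};
    std::string cell;
    for (long long* field : fields) {
        if (!std::getline(in, cell, ',')) return false;
        try {
            *field = std::stoll(cell);
        } catch (const std::exception&) {
            return false;  // empty or non-numeric cell
        }
    }
    return true;
}

// Wallclock time in nanoseconds between two probe hits, i.e. the
// execution time of the block delimited by probes a and b.
long long wallclock_ns_between(const ProbeRecord& a, const ProbeRecord& b) {
    return (b.wc_sec - a.wc_sec) * 1000000000LL + (b.wc_nsec - a.wc_nsec);
}
```

Block times are only meaningful when both records come from the same thread, so a real analysis script would group records by TID first.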

To use the Logger library, just link it into your program. The function alexta::Initialize() initializes Logger; it must be called at the beginning of your program. Probes are calls to the alexta::LogEvent(probeID) function, where probeID is a unique ID for the probe. To insert a probe, just add a call to this function at the desired point in your code. You can download Logger from here.
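
A minimal sketch of what instrumented code might look like is shown here. So that it compiles on its own, it defines stand-in stubs for alexta::Initialize() and alexta::LogEvent(); in a real build you would remove the stub namespace and include Logger.h instead. The int parameter type, the probe IDs and the functions handle_request and run_program are assumptions made for the example.

```cpp
#include <vector>

// Stand-in stubs so the sketch is self-contained. They only remember probe
// IDs; the real library (declared in Logger.h) also records timestamps,
// PID and TID. The int argument type is an assumption.
namespace alexta {
std::vector<int> hits;
void Initialize() { hits.clear(); }
void LogEvent(int probeID) { hits.push_back(probeID); }
}  // namespace alexta

// Hypothetical probe IDs; each probe gets its own unique value.
enum ProbeId { kRequestStart = 1, kRequestParsed = 2, kRequestDone = 3 };

// Three probes delimit two blocks: "parse" and "compute".
int handle_request(int payload) {
    alexta::LogEvent(kRequestStart);
    int parsed = payload * 2;  // block 1: stand-in for parsing work
    alexta::LogEvent(kRequestParsed);
    int result = parsed + 1;   // block 2: stand-in for the actual computation
    alexta::LogEvent(kRequestDone);
    return result;
}

// Call this at the very beginning of main().
int run_program() {
    alexta::Initialize();  // must run before the first probe is hit
    return handle_request(20);
}
```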

You can configure the Logger by editing the #define directives in the Logger.h file. Configuration parameters include the name of the log file where information will be stored and the sizes of the global array and per-thread arrays. Also, if you need to record only CPU time or only wallclock time, you can disable the LOG_WC_TIME or LOG_CPU_TIME directive, respectively. This will make the library a little faster. For your convenience, I am including an instrumented copy of the tinyHttpd web server.
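
The effect of the two timestamp switches can be illustrated with a small sketch. Only LOG_CPU_TIME and LOG_WC_TIME are names taken from Logger; the function fields_per_record is hypothetical, and the assumption that a disabled timestamp simply drops its two columns from each record has not been checked against the real source.

```cpp
// Record CPU time only: LOG_WC_TIME is disabled by commenting it out.
#define LOG_CPU_TIME
// #define LOG_WC_TIME

// Number of fields each record would carry under the settings above
// (assumed layout: PID, TID, probe ID, then two fields per enabled clock).
inline int fields_per_record() {
    int n = 3;  // PID, TID, probe ID
#ifdef LOG_CPU_TIME
    n += 2;     // CPU timestamp: seconds and nanoseconds
#endif
#ifdef LOG_WC_TIME
    n += 2;     // wallclock timestamp: seconds and nanoseconds
#endif
    return n;
}
```

Because the choice is made at compile time, a disabled clock costs nothing at run time, which is where the small speedup comes from.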

Logger is particularly good for profiling long-running programs, such as servers, with minimal overhead.

Logger pros:
- Our benchmarks show that on average each probe takes about 2 microseconds of wallclock time, which is pretty lightweight.

Logger cons:
- Log records are not stored in chronological order. Not a big deal: you can always sort the output file by timestamp. But it is still an inconvenience;
- When the program exits, some records are lost: those still held in per-thread arrays that have not yet been transferred to the global array.
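
For the first point, sorting the output after the fact is straightforward. The sketch below orders already-loaded log lines by wallclock timestamp. It assumes comma-separated lines with all seven fields present, so the wallclock seconds and nanoseconds are fields 6 and 7; wallclock_key and sort_by_wallclock are hypothetical helper names.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Extract (seconds, nanoseconds) of the wallclock timestamp from one line.
// Assumes well-formed numeric fields; std::stoll throws on a malformed cell.
std::pair<long long, long long> wallclock_key(const std::string& line) {
    std::istringstream in(line);
    std::string cell;
    long long sec = 0, nsec = 0;
    for (int i = 0; i < 7 && std::getline(in, cell, ','); ++i) {
        if (i == 5) sec = std::stoll(cell);
        if (i == 6) nsec = std::stoll(cell);
    }
    return {sec, nsec};
}

// Sort log lines chronologically. stable_sort keeps the file order of
// records that carry identical timestamps.
void sort_by_wallclock(std::vector<std::string>& lines) {
    std::stable_sort(lines.begin(), lines.end(),
                     [](const std::string& a, const std::string& b) {
                         return wallclock_key(a) < wallclock_key(b);
                     });
}
```

For very large logs, parsing each key once up front would be faster than re-parsing inside the comparator.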