I originally posted this at http://dtrace.org/blogs/brendan/2011/02/09/dtrace-pid-provider.
The DTrace "pid" provider allows you to trace the internal execution of processes such as a web browser or a database. It's documented in the original DTrace Guide and in the forthcoming DTrace Book. It is also powerful and unstable, and is the provider you're most likely to shoot yourself in the foot with.
Here I'll introduce the pid provider and discuss the stability of probe names, including tips and gotchas. This should be useful background for anyone using DTrace, which includes running other people's pid provider-based DTrace scripts. For further reading, see my full list of pid provider posts.
Introduction
The term "pid" comes from process-ID, since the provider attaches to one process at a time (which is notable because DTrace usually operates system-wide). Here is a one-liner demo:
# dtrace -n 'pid$target:libc:strlen:entry { trace(timestamp); }' -p 809
dtrace: description 'pid$target:libc:strlen:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  1  63860                     strlen:entry   1372088937798989
  1  63860                     strlen:entry   1372088937805304
  1  63860                     strlen:entry   1372088937809190
I'm tracing calls to libc's strlen() by PID 809, which is a bash shell in another window, and printing out a timestamp in nanoseconds. Key points:
- The pid provider hooks onto a particular PID.
- The process doesn't need restarting for DTrace to work.
- This example traced the execution of a compiled C function.
The pid provider can trace when functions begin execution (the "entry" probe) and finish (the "return" probe), as well as individual instructions from the function.
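For example, listing the probes for a single function shows what's available: the entry and return probes, plus one probe per instruction offset. A quick check, using the same PID as above:

# List every pid provider probe for libc's strlen(): "entry", "return",
# and one probe per instruction (shown as hex offsets in the NAME column)
dtrace -ln 'pid$target:libc:strlen:' -p 809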
Thousands of Probes
Apart from library functions, the pid provider can also examine functions from the program itself (the a.out segment). For example, here are all the function entry probes from this bash shell, listed by leaving the middle fields of the probe name blank (a wildcard):
# dtrace -ln 'pid$target:::entry' -p 809
   ID   PROVIDER            MODULE                          FUNCTION NAME
62650     pid809              bash                             __fsr entry
62651     pid809              bash                              main entry
62652     pid809              bash                        exit_shell entry
62653     pid809              bash                           sh_exit entry
62654     pid809              bash               shell_is_restricted entry
[...6778 lines truncated...]
Software is typically made up of thousands of functions, each of which can be instrumented as a DTrace probe. The above output listed 6783 function probes for this bash shell. And that's just listing them (dtrace -l); actually tracing them and collecting information as they execute would provide an overwhelming amount of data, and would affect the performance of the target software.
The pid provider can make thousands of function probes available.
You generally don't want to trace them all at the same time.
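If you do want a broad view, a safer pattern than tracing every call individually is to aggregate in the kernel and print only a summary. A minimal sketch, using the bash shell from the earlier example:

# Count calls by function name; dtrace prints the summary on Ctrl-C,
# rather than producing one line of output per call.
dtrace -n 'pid$target:bash::entry { @[probefunc] = count(); }' -p 809

Aggregations keep the data volume manageable, although instrumenting thousands of probes still adds overhead to the target process.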
Why
With the pid provider, I can walk up to a production system that's never executed DTrace before and start answering important questions - without downtime. And with thousands of probes, I can answer just about any question about application internals.
The pid provider provides a lot of useful visibility that other tools don't.
In some cases, this isn't a better tool - it's the only tool. I'm reminded of this when I handle live performance issues that must be analyzed immediately, in production, without restarting the application (costly downtime), and without seriously slowing down the application (further than it is already). If the issue isn't anything obvious, such as disk I/O latency (which can be analyzed with the io or syscall providers), and I need to instrument the internals of the running application, then the pid provider is the only tool that can be used in this situation. For the Solaris geeks: I'm not going to use truss -u or apptrace (due to overhead and functionality), and the profile provider is great but only goes so far.
I've been using the pid provider to:
- Retrieve high resolution custom performance metrics from applications. For example, MySQL query time components in nanoseconds, to quickly locate the source of latency (see the sketch after this list).
- Identify whether applications are the victim of known bugs, to see if patching will be worth it.
- Identify whether application configuration tunables need to be adjusted, rather than just trying random adjustments.
- Investigate whether applications are the victim of not-yet-known bugs.
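As a rough sketch of the first item: the following measures the latency of a query-handling function in mysqld with the pid provider, printing a distribution every second. The *dispatch_command* wildcard is an assumption; the actual function name varies between MySQL versions, which is exactly the stability caveat discussed later in this post.

#!/usr/sbin/dtrace -s
/* Sketch only: *dispatch_command* is a guess at the query-handling
 * function; list the probes with dtrace -l for your mysqld first.
 * USAGE: ./query_latency.d -p mysqld_PID
 */

pid$target::*dispatch_command*:entry
{
	self->start = timestamp;
}

pid$target::*dispatch_command*:return
/self->start/
{
	@["query latency (ns)"] = quantize(timestamp - self->start);
	self->start = 0;
}

profile:::tick-1sec
{
	printa(@);
	trunc(@);
}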
These uses can accelerate performance investigations, and reduce server downtime.
While this sounds great, there are some caveats about using the pid provider, regarding stability and performance. I'll explain stability issues here by discussing the pid provider entry probe.
Function Entry Probe
The pid provider "entry" probe, as traced previously, instruments the start of functions. Sometimes just knowing that a function was called and when it was called can be hugely useful. For example, I might like to know: How quickly is my Firefox process performing JavaScript Garbage Collect?
# dtrace -n 'pid$target:libmozjs*:JS_GC:entry { printf("%Y", walltimestamp); }' -p 31999
dtrace: description 'pid$target:libmozjs*:JS_GC:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0 108537                      JS_GC:entry 2011 Feb 5 19:13:46
  1 108537                      JS_GC:entry 2011 Feb 5 19:14:32
  1 108537                      JS_GC:entry 2011 Feb 5 19:14:57
  0 108537                      JS_GC:entry 2011 Feb 5 19:15:32
This one-liner traces JS_GC(), a function from Mozilla's libmozjs JavaScript engine.
Garbage collection is not running very often at the moment, as seen by the times printed for each invocation of JS_GC(). I was able to gather this information without restarting Firefox or running a special version.
Here's another example:
# dtrace -n 'pid$target:mysqld:log_buffer_flush_to_disk:entry { printf("%Y", walltimestamp); }' -p 23423
dtrace: description 'pid$target:mysqld:log_buffer_flush_to_disk:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
 10  83454   log_buffer_flush_to_disk:entry 2011 Feb 8 20:00:36
  1  83454   log_buffer_flush_to_disk:entry 2011 Feb 8 20:00:37
  5  83454   log_buffer_flush_to_disk:entry 2011 Feb 8 20:00:38
  5  83454   log_buffer_flush_to_disk:entry 2011 Feb 8 20:00:39
This traces log_buffer_flush_to_disk(), a function from the heart of innodb in the MySQL database. And this was without restarting mysqld.
Probe Name Stability
I picked three different types of functions to demonstrate:
- pid$target:libc:strlen:entry - libc's strlen()
- pid$target:libmozjs*:JS_GC:entry - Firefox's libmozjs JS_GC()
- pid$target:mysqld:log_buffer_flush_to_disk:entry - MySQL innodb's log_buffer_flush_to_disk()
Which, if any, of these probe names is likely to still work a year from now? Five years?
This is a question of "stability". Probe names are an interface that we use by including them in D scripts, or DTrace one-liners. It's important to know whether we can depend on them continuing to work.
Since the pid provider is exposing functions from software, which may be changed by the programmer at any time (software updates, patches, new releases), this is an "unstable" interface. There is no guarantee that these probes will work on any software version other than the one they were written for.
pid provider probes are "unstable", and may only work on one software version.
Reality Check
While technically the probes above are unstable and can't be relied upon, here are my thoughts on the relative stability of each of these examples:
pid$target:libc:strlen:entry
This is from the industry-wide POSIX standard, which programmers depend upon and expect to be stable. It is well documented, including man pages:
$ man strlen
Reformatting page.  Please Wait... done

Standard C Library Functions                              string(3C)
[...]
     size_t strlen(const char *s);
[...]
The likelihood of this changing is near zero. Scripts that trace such standard library functions are likely to keep working for a long time.
POSIX functions can provide reliable pid provider probes.
pid$target:libmozjs*:JS_GC:entry
This is from the SpiderMonkey source, js/src/jsapi.c:
2455 JS_PUBLIC_API(void)
2456 JS_GC(JSContext *cx)
2457 {
2458     /* Don't nuke active arenas if executing or compiling. */
2459     if (cx->stackPool.current == &cx->stackPool.first)
2460         JS_FinishArenaPool(&cx->stackPool);
2461     if (cx->tempPool.current == &cx->tempPool.first)
2462         JS_FinishArenaPool(&cx->tempPool);
2463     js_GC(cx, GC_NORMAL);
2464 }
So, while this may not be a POSIX standard function, line 2455 suggests that this is a public API (Application Programming Interface). It is, and is documented on the Mozilla developer website. API functions, especially public API functions, make good targets for DTrace. Because they are deliberately chosen to be consumed by other software, they are likely to keep working. Mozilla could roll out a major JavaScript update and change their API, but this is unlikely to happen often. (It would break more than just our DTrace scripts, too.) In summary, the likelihood of this function changing is small.
Public API functions make for decent pid probes too.
What's probably more likely to change is the module location, "libmozjs", which I've included in the probe name. If that does change, it's easily fixed (removing it may be safer: pid$target::JS_GC:entry will wildcard match all modules, so it would not matter what the module name was).
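A quick way to check how the module appears on your system, before deciding whether to include it, is to list the probe with the module field left blank (using the Firefox PID from earlier):

# The empty module field is a wildcard; the listing shows which module
# actually exports JS_GC() on this system.
dtrace -ln 'pid$target::JS_GC:entry' -p 31999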
pid$target:mysqld:log_buffer_flush_to_disk:entry
This is from innodb source code, innobase/log/log0log.c:
/********************************************************************
Does a syncronous flush of the log buffer to disk. */

void
log_buffer_flush_to_disk(void)
/*==========================*/
{
	dulint	lsn;
[...]
It's not clear that this is part of a public API, so this could be subject to change at any point. It is, however, an exported function:
mysql-5.0.15/innobase$ grep log_buffer_flush_to_disk */*
include/log0log.h:log_buffer_flush_to_disk(void);
log/log0log.c:	log_buffer_flush_to_disk();
log/log0log.c:log_buffer_flush_to_disk(void)
log/log0log.c:	log_buffer_flush_to_disk();
row/row0mysql.c:	log_buffer_flush_to_disk();
srv/srv0srv.c:		log_buffer_flush_to_disk();
srv/srv0srv.c:		log_buffer_flush_to_disk();
[...]
It's declared in log0log.h and used by other bodies of code. My guess (I'm not an innodb software engineer) is that this is a private API, since it's listed and documented in log0log.h, and it may not change that often (programmers would have to change all locations). It certainly could change, though. I'd say the likelihood of this type of probe changing is fair. Try to avoid.
If this was a static function and was only called locally in that file, then the likelihood of it changing is high. Really try to avoid. (The compiler may not even keep the symbol, such as by inlining the function, which would make the function entry invisible to DTrace anyway.)
Try to avoid tracing random local functions.
USDT
The potential brittleness of the pid provider is not a flaw of DTrace, but a reflection of the underlying code. DTrace has a feature to address the situation: User Statically Defined Tracing (USDT).
This allows developers to define stable probe points in their code. By doing so, you can create new providers which are explicitly defined to be stable, and export stable probe names. Such providers include the ruby provider, the javascript provider and the mysql provider. If a USDT provider exists for your target software - use it. Only reach for the pid provider if a USDT provider doesn't exist, or can't be introduced.
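For a sense of what USDT looks like from the developer's side, here is a minimal sketch; the provider and probe names are made up for illustration. The probe definitions go in a .d file, dtrace -h generates a header of macros from it, and the application calls those macros at the probe points (on Solaris-based systems, dtrace -G is also run at build time to generate an object file to link in).

/* myapp_provider.d: probe definitions (provider and probe names are invented) */
provider myapp {
	probe request__start(char *url);
	probe request__done(char *url, int ms);
};

/* myapp.c: fire the probes via macros from the dtrace -h generated header */
#include "myapp_provider.h"

void handle_request(char *url)
{
	MYAPP_REQUEST_START(url);
	int ms = process(url);          /* process() is hypothetical */
	MYAPP_REQUEST_DONE(url, ms);
}

The resulting probes (myapp$target:::request-start and request-done) form a deliberate, stable interface that survives internal refactoring, which is exactly what the pid provider can't promise.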
Finding Safer Probes
If you are stuck with the pid provider (no USDT), here are a few ways you may be able to find more stable probes:
Browsing the software vendor website
Type "API" into their search bar. Also try "dtrace" - if their own developers have shared some pid provider scripts, then there is a fair chance they have picked safer probes to trace, based on their own familiarity with the source.
dtrace -l | grep
I'll sometimes list available probes and look for short generic function names, and then check the source code.
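For example (the PID and search term here are just illustrative):

# List all function entry probes in the target, then grep for promising names
dtrace -ln 'pid$target:mysqld::entry' -p 23423 | grep -i flush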
Stack Fishing
APIs are often used in this way:
Application
     |
     V
    API
     |
     V
Sub-system
So if I know a sub-system function, I can DTrace that and look at the stack back trace to find the API functions, if an API is used. Some subsystems will call system calls at some point, which are a great place to cast our rod:
# dtrace -x ustackframes=100 -n 'syscall:::entry /pid == 31999/ { @[ustack()] = count(); }'
dtrace: description 'syscall:::entry ' matched 434 probes
^C
[...]
              libSystem.B.dylib`madvise+0xa
              libSystem.B.dylib`free+0xf4
              XUL`NS_CStringContainerInit_P+0x1c9c6
              XUL`NS_CStringContainerInit_P+0x197ba
              XUL`NS_CStringContainerInit_P+0x19863
              XUL`DumpJSStack+0x1afb6
              XUL`void std::__adjust_heap<__gnu_cxx::__normal_iterator<nsRefPtr
              libmozjs.dylib`js_TraceContext+0x96f
              libmozjs.dylib`JS_GC+0x78                <-- here I am
              XUL`DumpJSObject+0x5c7
              XUL`NS_CycleCollectorForget_P+0xfd6
              XUL`NS_CycleCollectorForget_P+0x11d9
              XUL`void std::__adjust_heap<__gnu_cxx::__normal_iterator<nsRefPtr
              XUL`void std::__adjust_heap<__gnu_cxx::__normal_iterator<nsRefPtr
              XUL`NS_GetComponentRegistrar_P+0xc58d
              XUL`NS_GetComponentRegistrar_P+0xc6a8
              XUL`NS_GetComponentRegistrar_P+0x8db7
              XUL`JNIEnv_::CallStaticObjectMethod(_jclass*, _jmethodID*, ...)+0x
              XUL`JSD_GetValueForObject+0x10fb32
              XUL`JSD_GetValueForObject+0xd844e
              CoreFoundation`__CFRunLoopDoSources0+0x61b
              CoreFoundation`__CFRunLoopRun+0x42f
              CoreFoundation`CFRunLoopRunSpecific+0x1c4
              CoreFoundation`CFRunLoopRunInMode+0x61
              HIToolbox`RunCurrentEventLoopInMode+0x188
              HIToolbox`ReceiveNextEventCommon+0x9e
              HIToolbox`BlockUntilNextEventMatchingListInMode+0x51
              AppKit`_DPSNextEvent+0x34f
              AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequ
              AppKit`-[NSApplication run]+0x335
              XUL`JSD_GetValueForObject+0xd7fea
              XUL`void std::__adjust_heap<__gnu_cxx::__normal_iterator<nsRefPtr
              XUL`XRE_main+0x3d6b
              firefox-bin`start+0x878
              firefox-bin`start+0x102
              firefox-bin`start+0x29
              firefox-bin`0x2
               71
I've annotated where XUL (Firefox core) calls into libmozjs, via JS_GC(). What I'm looking for is a transition from one body of code into another, here the transition into libmozjs, to see how other code is calling into the JavaScript engine.
When Things Break
While you can try to pick safer probes, there is always the possibility that something will break across software versions. Ways this can happen to function names include the following:
1. The function is renamed.
2. The function is deleted, and the code moved elsewhere.
3. A new function is added to perform this task, and this function is left behind in the source as legacy (or as a slow path).
4. The function is rewritten to perform a different action.
If it's (1) or (2), you are lucky, and something like this will happen (this is a pid provider-based script):
# ./mysql_pid_latency.d -p 23423
dtrace: failed to compile script ./mysql_pid_latency.d: line 18: probe description pid23423::*parse_command*:entry does not match any probes
Then you can immediately start fixing the script, which can include examining the source code (if available) for both the software version the script was written for and the new version, to see what became of that function.
If it's (3) or (4), your script will execute, but the results will not be what you expect. You could describe this as a silent failure. If you are lucky here, you'll have a good estimate of what the results should be, and what you get is so far off that you'll begin to investigate why. If you are unlucky, the output will seem plausible (but is wrong) and you'll continue your investigation elsewhere based on bad data. Maybe a week later you'll realize this, and may have to repeat a week's work.
On a different software version, the same pid probes may appear to work, but the results are wrong.
I deal with (3) and (4) in a few ways: I'll test with a known workload (if possible), especially one where I can create a prime number of events. I also sanity test with other tools, and quickly revisit source code to see if things have changed.
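As a sketch of the known-workload approach: count the traced events over a fixed interval while driving a workload whose event count you roughly know, and compare. The probe and PID are from the earlier MySQL example; the workload itself is up to you.

# Count events for 10 seconds, then exit and print the total for comparison
# against the known workload driven from another window.
dtrace -n 'pid$target:mysqld:log_buffer_flush_to_disk:entry { @ = count(); }
    profile:::tick-10sec { exit(0); }' -p 23423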
Preempting Breakage
I'm getting into the habit of documenting the software version in my pid provider DTrace scripts, with a block comment at the top, to help a future user debug a broken script. For example:
#!/usr/sbin/dtrace -s
/*
 * mysql_pid_latency.d	Print query latency distribution every second.
 *
 * USAGE: ./mysql_pid_latency.d -p mysqld_PID
 *
 * TESTED: these pid-provider probes may only work on some mysqld versions.
 *	5.0.51a: ok
 */
[...]
For each version I test this on, I'll add another line under TESTED to report "ok" or "fail".
Another suggestion is to keep it simple - try not to write DTrace scripts that are hundreds of lines long and that instrument dozens of functions.
pid provider scripts should be short and to the point.
I learnt this the hard way with tcpsnoop.d, a script based on the fbt provider, which is the kernel version of the pid provider. tcpsnoop.d is hundreds of lines long, instruments many kernel locations, and has broken several times since I first wrote it. These days I'm writing very short and to-the-point D scripts, which are safer (fewer probes means more stability) and easier to maintain. You'll find many of my examples in the DTrace book, where short scripts were also preferable as concise textbook examples.
Function Arguments
You can get a lot of mileage from just knowing that functions were called, via tracing the "entry" probe. But there is a lot more to the pid provider, including the "return" probe for function completion, instruction probes and arguments.
Arguments to the entry probe are the arguments to the function. My original strlen() example takes a string (char *) as an argument, which we can inspect using DTrace. I'll write a follow up post to discuss this.
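As a quick taste, here is the strlen() one-liner again with its first argument printed; copyinstr() copies the string from the traced process's address space (same PID as before):

# Print the string passed to libc's strlen() by PID 809
dtrace -n 'pid$target:libc:strlen:entry { trace(copyinstr(arg0)); }' -p 809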
What I'd like to note here is that function arguments, like the function names, are not technically a stable interface, and are subject to change when the software changes. The same comments apply: I don't think the arguments to strlen() will change anytime soon (if ever); however, local functions could change at any point. If you are examining and printing arguments, this means that with software updates your D scripts could start printing junk data.
Summary
While the pid provider provides great visibility into application internals, there are some limitations that you should be aware of when using it. Since it traces function names that can change between software versions, DTrace scripts or one-liners that use the pid provider may only work on the software version they were written for. On other software versions they may fail in an obvious way ("dtrace: failed to compile script"), or in a subtle way - leaving you with data that is plausible but wrong. To mitigate this, try to trace functions that are unlikely to change often (eg, public API functions), document the software version traced when sharing pid provider based scripts, and try to double check the output of DTrace scripts with other tools (sanity test).
And if you have found one of my pid provider based scripts doesn't work on your software version - my apologies in advance. This is an unavoidable risk when using the pid provider. I'm minimizing it by tracing the most stable looking functions, and documenting the software versions that the scripts were written for. Hopefully, more USDT providers will be written over time, so that reaching for the pid provider is only necessary when you don't have the stable USDT coverage.
Update: for more details, see the full list of pid provider posts.