I originally posted this at http://dtrace.org/blogs/brendan/2011/02/09/dtrace-pid-provider.
The DTrace "pid" provider allows you to trace the internal execution of processes such as a web browser or a database. It's documented in the original DTrace Guide and in the forthcoming DTrace Book. It is also powerful and unstable, and is the provider you're most likely to shoot yourself in the foot with.
Here I'll introduce the pid provider and discuss the stability of probe names, including tips and gotchas. This should be useful background for anyone using DTrace, which includes running other people's pid provider-based DTrace scripts. For further reading, see my full list of pid provider posts.
Introduction
The term "pid" comes from process-ID, since the provider attaches to one process at a time (which is notable because DTrace usually operates system-wide). Here is a one-liner demo:
# dtrace -n 'pid$target:libc:strlen:entry { trace(timestamp); }' -p 809
dtrace: description 'pid$target:libc:strlen:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  1  63860                     strlen:entry   1372088937798989
  1  63860                     strlen:entry   1372088937805304
  1  63860                     strlen:entry   1372088937809190
I'm tracing calls to libc's strlen() by PID 809, which is a bash shell in another window, and printing out a timestamp in nanoseconds. Key points:
- The pid provider hooks onto a particular PID.
- The process doesn't need restarting for DTrace to work.
- This example traced the execution of a compiled C function.
The pid provider can trace when functions begin execution (the "entry" probe) and finish (the "return" probe), as well as individual instructions from the function.
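For example, listing the probes for a single function shows what's available: the entry and return probes, plus one probe per instruction offset. A quick check, using the same PID as above:

# List every pid provider probe for libc's strlen(): "entry", "return",
# and one probe per instruction (shown as hex offsets in the NAME column)
dtrace -ln 'pid$target:libc:strlen:' -p 809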
Thousands of Probes
Apart from library functions, the pid provider can also examine functions from the program itself (the a.out segment). For example, here are all the function entry probes from this bash shell, listed by leaving the middle fields of the probe name blank (a wildcard):
# dtrace -ln 'pid$target:::entry' -p 809
   ID   PROVIDER            MODULE                          FUNCTION NAME
62650     pid809              bash                             __fsr entry
62651     pid809              bash                              main entry
62652     pid809              bash                        exit_shell entry
62653     pid809              bash                           sh_exit entry
62654     pid809              bash               shell_is_restricted entry
[...6778 lines truncated...]
Software is typically made up of thousands of functions, each of which can be instrumented as a DTrace probe. The above output listed 6783 function probes for this bash shell. And that's just listing them (dtrace -l); actually tracing them and collecting information as they execute would provide an overwhelming amount of data, and would affect the performance of the target software.
The pid provider can make thousands of function probes available.
You generally don't want to trace them all at the same time.
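If you do want a broad view, a safer pattern than tracing every call individually is to aggregate in the kernel and print only a summary. A minimal sketch, using the bash shell from the earlier example:

# Count calls by function name; dtrace prints the summary on Ctrl-C,
# rather than producing one line of output per call.
dtrace -n 'pid$target:bash::entry { @[probefunc] = count(); }' -p 809

Aggregations keep the data volume manageable, although instrumenting thousands of probes still adds overhead to the target process.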
Why
With the pid provider, I can walk up to a production system that's never executed DTrace before and start answering important questions - without downtime. And with thousands of probes, I can answer just about any question about application internals.
The pid provider provides a lot of useful visibility that other tools don't.
In some cases, this isn't a better tool - it's the only tool. I'm reminded of this when I handle live performance issues that must be analyzed immediately, in production, without restarting the application (costly downtime), and without seriously slowing down the application (further than it is already). If the issue isn't anything obvious, such as disk I/O latency (which can be analyzed with the io or syscall providers), and I need to instrument the internals of the running application, then the pid provider is the only tool that can be used in this situation. For the Solaris geeks: I'm not going to use truss -u or apptrace (due to overhead and functionality), and the profile provider is great but only goes so far.
I've been using the pid provider to:
- Retrieve high resolution custom performance metrics from applications. For example, MySQL query time components in nanoseconds, to quickly locate the source of latency (see the sketch after this list).
- Identify whether applications are the victim of known bugs, to see if patching will be worth it.
- Identify whether application configuration tunables need to be adjusted, rather than just trying random adjustments.
- Investigate whether applications are the victim of not-yet-known bugs.
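As a rough sketch of the first item: the following measures the latency of a query-handling function in mysqld with the pid provider, printing a distribution every second. The *dispatch_command* wildcard is an assumption; the actual function name varies between MySQL versions, which is exactly the stability caveat discussed later in this post.

#!/usr/sbin/dtrace -s
/* Sketch only: *dispatch_command* is a guess at the query-handling
 * function; list the probes with dtrace -l for your mysqld first.
 * USAGE: ./query_latency.d -p mysqld_PID
 */

pid$target::*dispatch_command*:entry
{
	self->start = timestamp;
}

pid$target::*dispatch_command*:return
/self->start/
{
	@["query latency (ns)"] = quantize(timestamp - self->start);
	self->start = 0;
}

profile:::tick-1sec
{
	printa(@);
	trunc(@);
}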
These uses can accelerate performance investigations, and reduce server downtime.
While this sounds great, there are some caveats about using the pid provider, regarding stability and performance. I'll explain stability issues here by discussing the pid provider entry probe.
Function Entry Probe
The pid provider "entry" probe, as traced previously, instruments the start of functions. Sometimes just knowing that a function was called and when it was called can be hugely useful. For example, I might like to know: How quickly is my Firefox process performing JavaScript Garbage Collect?
# dtrace -n 'pid$target:libmozjs*:JS_GC:entry { printf("%Y", walltimestamp); }' -p 31999
dtrace: description 'pid$target:libmozjs*:JS_GC:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0 108537                      JS_GC:entry 2011 Feb 5 19:13:46
  1 108537                      JS_GC:entry 2011 Feb 5 19:14:32
  1 108537                      JS_GC:entry 2011 Feb 5 19:14:57
  0 108537                      JS_GC:entry 2011 Feb 5 19:15:32
This one-liner traces JS_GC(), a function from Mozilla's libmozjs JavaScript engine.
Garbage collection is not running very often at the moment, as seen by the times printed for each invocation of JS_GC(). I was able to gather this information without restarting Firefox or running a special version.
Here's another example:
# dtrace -n 'pid$target:mysqld:log_buffer_flush_to_disk:entry { printf("%Y", walltimestamp); }' -p 23423
dtrace: description 'pid$target:mysqld:log_buffer_flush_to_disk:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
 10  83454   log_buffer_flush_to_disk:entry 2011 Feb 8 20:00:36
  1  83454   log_buffer_flush_to_disk:entry 2011 Feb 8 20:00:37
  5  83454   log_buffer_flush_to_disk:entry 2011 Feb 8 20:00:38
  5  83454   log_buffer_flush_to_disk:entry 2011 Feb 8 20:00:39
This traces log_buffer_flush_to_disk(), a function from the heart of innodb in the MySQL database. And this was without restarting mysqld.
Probe Name Stability
I picked three different types of functions to demonstrate:
- pid$target:libc:strlen:entry - libc's strlen()
- pid$target:libmozjs*:JS_GC:entry - Firefox's libmozjs JS_GC()
- pid$target:mysqld:log_buffer_flush_to_disk:entry - MySQL innodb's log_buffer_flush_to_disk()
Which, if any, of these probe names is likely to still work a year from now? Five years?
This is a question of "stability". Probe names are an interface that we use by including them in D scripts, or DTrace one-liners. It's important to know whether we can depend on them continuing to work.
Since the pid provider is exposing functions from software, which may be changed by the programmer at any time (software updates, patches, new releases), this is an "unstable" interface. There is no guarantee that these probes will work on any software version other than the one they were written for.
pid provider probes are "unstable", and may only work on one software version.
Reality Check
While technically the probes above are unstable and can't be relied upon, here are my thoughts on the relative stability of each of these examples:
pid$target:libc:strlen:entry
This is from the industry-wide POSIX standard, which programmers depend upon and expect to be stable. It is well documented, including man pages:
$ man strlen
Reformatting page.  Please Wait... done

Standard C Library Functions                              string(3C)
[...]
     size_t strlen(const char *s);
[...]
The likelihood of this changing is near zero. Scripts that trace such standard library functions are likely to keep working for a long time.
POSIX functions can provide reliable pid provider probes.
pid$target:libmozjs*:JS_GC:entry
This is from the SpiderMonkey source, js/src/jsapi.c:
2455 JS_PUBLIC_API(void)
2456 JS_GC(JSContext *cx)
2457 {
2458     /* Don't nuke active arenas if executing or compiling. */
2459     if (cx->stackPool.current == &cx->stackPool.first)
2460         JS_FinishArenaPool(&cx->stackPool);
2461     if (cx->tempPool.current == &cx->tempPool.first)
2462         JS_FinishArenaPool(&cx->tempPool);
2463     js_GC(cx, GC_NORMAL);
2464 }
So, while this may not be a POSIX standard function, line 2455 suggests that this is a public API (Application Programming Interface). It is, and is documented on the Mozilla developer website. API functions, especially public API functions, make good targets for DTrace. Because they are deliberately chosen to be consumed by other software, they are likely to keep working. Mozilla could roll out a major JavaScript update and change their API, but this is unlikely to happen often. (It would break more than just our DTrace scripts, too.) In summary, the likelihood of this function changing is small.
Public API functions make for decent pid probes too.
What's probably more likely to change is the module location, "libmozjs", which I've included in the probe name. If that does change, it's easily fixed (removing it may be safer: pid$target::JS_GC:entry will wildcard match all modules, so it would not matter what the module name was).
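A quick way to check how the module appears on your system, before deciding whether to include it, is to list the probe with the module field left blank (using the Firefox PID from earlier):

# The empty module field is a wildcard; the listing shows which module
# actually exports JS_GC() on this system.
dtrace -ln 'pid$target::JS_GC:entry' -p 31999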
pid$target:mysqld:log_buffer_flush_to_disk:entry
This is from innodb source code, innobase/log/log0log.c:
/********************************************************************
Does a syncronous flush of the log buffer to disk. */

void
log_buffer_flush_to_disk(void)
/*==========================*/
{
	dulint	lsn;
[...]
It's not clear that this is part of a public API, so this could be subject to change at any point. It is, however, an exported function:
mysql-5.0.15/innobase$ grep log_buffer_flush_to_disk */*
include/log0log.h:log_buffer_flush_to_disk(void);
log/log0log.c:	log_buffer_flush_to_disk();
log/log0log.c:log_buffer_flush_to_disk(void)
log/log0log.c:	log_buffer_flush_to_disk();
row/row0mysql.c:	log_buffer_flush_to_disk();
srv/srv0srv.c:		log_buffer_flush_to_disk();
srv/srv0srv.c:		log_buffer_flush_to_disk();
[...]
It's declared in log0log.h and used by other bodies of code. My guess (I'm not an innodb software engineer) is that this is a private API, since it's listed and documented in log0log.h, and it may not change that often (programmers would have to change all locations). It certainly could change, though. I'd say the likelihood of this type of probe changing is fair. Try to avoid.
If this was a static function and was only called locally in that file, then the likelihood of it changing is high. Really try to avoid. (The compiler may not even keep the symbol, such as by inlining the function, which would make the function entry invisible to DTrace anyway.)
Try to avoid tracing random local functions.
USDT
The potential brittleness of the pid provider is not a flaw of DTrace, but a reflection of the underlying code. DTrace has a feature to address the situation: User Statically Defined Tracing (USDT).
This allows developers to define stable probe points in their code. By doing so, you can create new providers which are explicitly defined to be stable, and export stable probe names. Such providers include the ruby provider, the javascript provider and the mysql provider. If a USDT provider exists for your target software - use it. Only reach for the pid provider if a USDT provider doesn't exist, or can't be introduced.
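For a sense of what USDT looks like from the developer's side, here is a minimal sketch; the provider and probe names are made up for illustration. The probe definitions go in a .d file, dtrace -h generates a header of macros from it, and the application calls those macros at the probe points (on Solaris-based systems, dtrace -G is also run at build time to generate an object file to link in).

/* myapp_provider.d: probe definitions (provider and probe names are invented) */
provider myapp {
	probe request__start(char *url);
	probe request__done(char *url, int ms);
};

/* myapp.c: fire the probes via macros from the dtrace -h generated header */
#include "myapp_provider.h"

void handle_request(char *url)
{
	MYAPP_REQUEST_START(url);
	int ms = process(url);          /* process() is hypothetical */
	MYAPP_REQUEST_DONE(url, ms);
}

The resulting probes (myapp$target:::request-start and request-done) form a deliberate, stable interface that survives internal refactoring, which is exactly what the pid provider can't promise.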
Finding Safer Probes
If you are stuck with the pid provider (no USDT), here are a few ways you may be able to find more stable probes:
Browsing the software vendor website
Type "API" into their search bar. Also try "dtrace" - if their own developers have shared some pid provider scripts, then there is a fair chance they have picked safer probes to trace, based on their own familiarity with the source.
dtrace -l | grep
I'll sometimes list available probes and look for short generic function names, and then check the source code.
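For example (the PID and search term here are just illustrative):

# List all function entry probes in the target, then grep for promising names
dtrace -ln 'pid$target:mysqld::entry' -p 23423 | grep -i flush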
Stack Fishing
APIs are often used in this way:
Application
     |
     V
    API
     |
     V
Sub-system
So if I know a sub-system function, I can DTrace that and look at the stack back trace to find the API functions, if an API is used. Some subsystems will call system calls at some point, which are a great place to cast our rod:
# dtrace -x ustackframes=100 -n 'syscall:::entry /pid == 31999/ { @[ustack()] = count(); }'
dtrace: description 'syscall:::entry ' matched 434 probes
^C
[...]
              libSystem.B.dylib`madvise+0xa
              libSystem.B.dylib`free+0xf4
              XUL`NS_CStringContainerInit_P+0x1c9c6
              XUL`NS_CStringContainerInit_P+0x197ba
              XUL`NS_CStringContainerInit_P+0x19863
              XUL`DumpJSStack+0x1afb6
              XUL`void std::__adjust_heap<__gnu_cxx::__normal_iterator<nsRefPtr
              libmozjs.dylib`js_TraceContext+0x96f
              libmozjs.dylib`JS_GC+0x78                <-- here I am
              XUL`DumpJSObject+0x5c7
              XUL`NS_CycleCollectorForget_P+0xfd6
              XUL`NS_CycleCollectorForget_P+0x11d9
              XUL`void std::__adjust_heap<__gnu_cxx::__normal_iterator<nsRefPtr
              XUL`void std::__adjust_heap<__gnu_cxx::__normal_iterator<nsRefPtr
              XUL`NS_GetComponentRegistrar_P+0xc58d
              XUL`NS_GetComponentRegistrar_P+0xc6a8
              XUL`NS_GetComponentRegistrar_P+0x8db7
              XUL`JNIEnv_::CallStaticObjectMethod(_jclass*, _jmethodID*, ...)+0x
              XUL`JSD_GetValueForObject+0x10fb32
              XUL`JSD_GetValueForObject+0xd844e
              CoreFoundation`__CFRunLoopDoSources0+0x61b
              CoreFoundation`__CFRunLoopRun+0x42f
              CoreFoundation`CFRunLoopRunSpecific+0x1c4
              CoreFoundation`CFRunLoopRunInMode+0x61
              HIToolbox`RunCurrentEventLoopInMode+0x188
              HIToolbox`ReceiveNextEventCommon+0x9e
              HIToolbox`BlockUntilNextEventMatchingListInMode+0x51
              AppKit`_DPSNextEvent+0x34f
              AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequ
              AppKit`-[NSApplication run]+0x335
              XUL`JSD_GetValueForObject+0xd7fea
              XUL`void std::__adjust_heap<__gnu_cxx::__normal_iterator<nsRefPtr
              XUL`XRE_main+0x3d6b
              firefox-bin`start+0x878
              firefox-bin`start+0x102
              firefox-bin`start+0x29
              firefox-bin`0x2
               71
I've annotated where XUL (Firefox core) calls into libmozjs, via JS_GC(). What I'm looking for is a transition from one body of code into another, here the transition into libmozjs, to see how other code is calling into the JavaScript engine.
When Things Break
While you can try to pick safer probes, there is always the possibility that something will break across software versions. Ways this can happen to function names include the following:
1. The function is renamed.
2. The function is deleted, and the code moved elsewhere.
3. A new function is added to perform this task, and this function is left behind in the source as legacy (or as a slow path).
4. The function is rewritten to perform a different action.
If it's (1) or (2), you are lucky, and something like this will happen (this is a pid provider-based script):
# ./mysql_pid_latency.d -p 23423
dtrace: failed to compile script ./mysql_pid_latency.d: line 18: probe description pid23423::*parse_command*:entry does not match any probes
Then you can immediately start fixing the script, which can include examining the source code (if available) for both the software version the script was written for and the new version, to see what became of that function.
If it's (3) or (4), your script will execute, but the results will not be what you expect. You could describe this as a silent failure. If you are lucky here, you'll have a good estimate of what the results should be, and what you get is so far off that you'll begin to investigate why. If you are unlucky, the output will seem plausible (but is wrong) and you'll continue your investigation elsewhere based on bad data. Maybe a week later you'll realize this, and may have to repeat a week's work.
On a different software version, the same pid probes may appear to work, but the results are wrong.
I deal with (3) and (4) in a few ways: I'll test with a known workload (if possible), especially one where I can create a prime number of events. I also sanity test with other tools, and quickly revisit source code to see if things have changed.
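As a sketch of the known-workload approach: count the traced events over a fixed interval while driving a workload whose event count you roughly know, and compare. The probe and PID are from the earlier MySQL example; the workload itself is up to you.

# Count events for 10 seconds, then exit and print the total for comparison
# against the known workload driven from another window.
dtrace -n 'pid$target:mysqld:log_buffer_flush_to_disk:entry { @ = count(); }
    profile:::tick-10sec { exit(0); }' -p 23423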
Preempting Breakage
I'm getting into the habit of documenting the software version in my pid provider DTrace scripts, with a block comment at the top, to help a future user debug a broken script. For example:
#!/usr/sbin/dtrace -s
/*
 * mysql_pid_latency.d	Print query latency distribution every second.
 *
 * USAGE: ./mysql_pid_latency.d -p mysqld_PID
 *
 * TESTED: these pid-provider probes may only work on some mysqld versions.
 *	5.0.51a: ok
 */
[...]
For each version I test this on, I'll add another line under TESTED to report "ok" or "fail".
Another suggestion is to keep it simple - try not to write DTrace scripts that are hundreds of lines long and that instrument dozens of functions.
pid provider scripts should be short and to the point.
I learnt this the hard way with tcpsnoop.d, a script based on the fbt provider, which is the kernel version of the pid provider. tcpsnoop.d is hundreds of lines long, instruments many kernel locations, and has broken several times since I first wrote it. These days I'm writing very short and to-the-point D scripts, which are safer (fewer probes means more stability) and easier to maintain. You'll find many of my examples in the DTrace book, where short scripts were also preferable as concise textbook examples.
Function Arguments
You can get a lot of mileage from just knowing that functions were called, via tracing the "entry" probe. But there is a lot more to the pid provider, including the "return" probe for function completion, instruction probes and arguments.
Arguments to the entry probe are the arguments to the function. My original strlen() example takes a string (char *) as an argument, which we can inspect using DTrace. I'll write a follow up post to discuss this.
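As a quick taste, here is the strlen() one-liner again with its first argument printed; copyinstr() copies the string from the traced process's address space (same PID as before):

# Print the string passed to libc's strlen() by PID 809
dtrace -n 'pid$target:libc:strlen:entry { trace(copyinstr(arg0)); }' -p 809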
What I'd like to note here is that function arguments, like the function names, are not technically a stable interface, and are subject to change when the software changes. The same comments apply: I don't think the arguments to strlen() will change anytime soon (if ever); however, local functions could change at any point. If you are examining and printing arguments, this means that with software updates your D scripts could start printing junk data.
Summary
While the pid provider provides great visibility into application internals, there are some limitations that you should be aware of when using it. Since it traces function names that can change between software versions, DTrace scripts or one-liners that use the pid provider may only work on the software version they were written for. On other software versions they may fail in an obvious way ("dtrace: failed to compile script"), or in a subtle way - leaving you with data that is plausible but wrong. To mitigate this, try to trace functions that are unlikely to change often (eg, public API functions), document the software version traced when sharing pid provider based scripts, and try to double check the output of DTrace scripts with other tools (sanity test).
And if you have found one of my pid provider based scripts doesn't work on your software version - my apologies in advance. This is an unavoidable risk when using the pid provider. I'm minimizing it by tracing the most stable looking functions, and documenting the software versions that the scripts were written for. Hopefully, more USDT providers will be written over time, so that reaching for the pid provider is only necessary when you don't have the stable USDT coverage.
Update: for more details, see the full list of pid provider posts.