I originally posted this at http://dtrace.org/blogs/brendan/2011/02/11/dtrace-pid-provider-arguments.
In my previous post, I introduced the DTrace pid provider and discussed an important topic - the stability of its function "entry" probes. Here I'll continue discussing the pid provider, introducing function arguments and the stability of these as well. The aim of these posts is to share my experience using this provider, and of maintaing pid provider-based scripts. For more information on the pid provider, see the pid provider chapter of the DTrace Guide and the examples in the forthcoming DTrace book.
Function Entry Arguments
In my previous post I picked a simple and well-known function to begin tracing: libc's strlen(). This has a string pointer (char *) as the first argument. For the pid provider function entry probe, the first argument is available as the built-in variable "arg0", of type int64. Tracing that:
# dtrace -n 'pid$target:libc:strlen:entry { trace(arg0); }' -p 809 dtrace: description 'pid$target:libc:strlen:entry ' matched 1 probe CPU ID FUNCTION:NAME 1 63860 strlen:entry 134508268 1 63860 strlen:entry 135408008 1 63860 strlen:entry 135408008
DTrace can examine function arguments.
However, it's not so interesting just as a pointer address - we can't see the string it is pointing to. Showing the string:
# dtrace -n 'pid$target:libc:strlen:entry { trace(stringof(arg0)); }' -p 809 dtrace: description 'pid$target:libc:strlen:entry ' matched 1 probe dtrace: error on enabled probe ID 1 (ID 63860: pid809:libc.so.1:strlen:entry): invalid address (0x8046eec) in action #1 dtrace: error on enabled probe ID 1 (ID 63860: pid809:libc.so.1:strlen:entry): invalid address (0x8122988) in action #1 dtrace: error on enabled probe ID 1 (ID 63860: pid809:libc.so.1:strlen:entry): invalid address (0x8122988) in action #1
The DTrace built-in stringof() takes a pointer and treats it as a pointer to a NULL-terminated string. This sounds like the right tool here, since we have a pointer to data and we need to inform DTrace that this data is a string. But it didn't work.
This is a common mistake. The pointer is to an address in user-land, whereas DTrace runs in kernel-land - so it can't be dereferenced directly. This is like calling a phone number without the area code when you are in the wrong area code - you may get no one, or a random person, but you definitely won't reach the person you were looking for. This is fixable with the DTrace copyinstr() function ("copyin" is a term from kernel code to refer to copying in data from user-land to the kernel):
# dtrace -n 'pid$target:libc:strlen:entry { trace(copyinstr(arg0)); }' -p 809 dtrace: description 'pid$target:libc:strlen:entry ' matched 1 probe 0 63860 strlen:entry ls -F -l 0 63860 strlen:entry /usr/bin/ls 0 63860 strlen:entry /usr/bin/ls 0 63860 strlen:entry ls -F -l 0 63860 strlen:entry -l 0 63860 strlen:entry /export/home/brendan
Key points:
DTrace runs in kernel-land.
You can examine user-land (process) memory using copyinstr() (and copyin()).
If you forget this, you'll see "invalid address" (if you're lucky).
And if you are unlucky, the address will coincide with something valid in the kernel address space, and you could see plausible output and not realize that it is in fact garbage. (eg, 64-bit process, 64-bit kernel, same address range in use).
Simple Data Types
I'll demonstrate simple data type arguments by tracing strncmp(), whose function prototype is:
int strncmp(const char *s1, const char *s2, size_t n);
This demonstrates multiple arguments, strings and integers. I'll trace it for a bash shell:
# dtrace -qn 'pid$target::strncmp:entry { printf("%30.30s %30.30s %5d\n", copyinstr(arg0), copyinstr(arg1), arg2); }' -p 670 _=/bin/ls __CF_USER_TEXT_ENCODING=0x1F7: 2 _=/bin/ls _=/bin/ls 2 /Users/brendan /Users/brendan 14 bg ba 2 bind ba 2 break ba 2 builtin ba 2 ba banner 2 banner ba 2 ba basename 2 basename ba 2 [...]
While tracing I ran "ls -l", and then "ba" followed by a couple of tabs to list out command completion for "ba". You can see in the listing how bash was processing that - comparing available commands with the letters "ba" and a length of 2. This data was printed and formatted using printf().
Argument Summaries
Before we move on, I'd like to include an example of taking this simple argument data and presenting it as a summary, which I use about as often as listing all the data using trace() or printf().
Tracing malloc() (memory allocate) can be incredibly useful to see how applications are requesting memory. malloc() takes a size_t (integer) argument, which we could print:
# dtrace -n 'pid$target::malloc:entry { trace(arg0); }' -p 210 dtrace: description 'pid$target::malloc:entry ' matched 2 probes ^C CPU ID FUNCTION:NAME 1 97364 malloc:entry 4 1 97364 malloc:entry 4 1 97364 malloc:entry 16 1 97364 malloc:entry 16 1 97364 malloc:entry 252 1 97364 malloc:entry 60 [...]
However, malloc() often happens so frequently (thousands of times per second) that the overhead of printing out a line of data for each malloc() can slow the system and application down. Summaries are very useful here, such as DTrace's quantize() aggregation, used here to summarize Firefox malloc() during several seconds of browsing:
# dtrace -n 'pid$target::malloc:entry { @ = quantize(arg0); }' -p 210 dtrace: description 'pid$target::malloc:entry ' matched 2 probes ^C value ------------- Distribution ------------- count -1 | 0 0 | 1 1 | 16 2 | 540 4 |@@@@ 10493 8 |@@@@@@ 13944 16 |@@@@@@@ 17669 32 |@@@@@@@@@@@@@@ 33733 64 |@@ 4440 128 |@@ 3670 256 |@@ 5626 512 |@ 3037 1024 |@ 2546 2048 | 150 4096 | 76 8192 | 250 16384 | 17 32768 | 7 65536 | 181 131072 | 4 262144 | 0
Most of the requested sizes were in the 32 - 63 bucket, with one request in the 8 to 16 Mbyte range. We also caught one request for zero bytes!
When tracing malloc:entry, remember that this is requested, not returned bytes.
To see all allocations, trace other variants too: realloc(), calloc(), ...
The pid provider "return" can be used to examine the returned bytes (and errors) for these malloc() requests.
Argument Testing
In the previous example I just found something a little interesting, a zero byte malloc(). I can investigate this further as a demonstration of argument testing.
The argument values arg0..argN can be tested in DTrace predicates (/.../). Here I'll match the malloc() requests for zero bytes, and print the user-level stack back trace (this is still Firefox):
# dtrace -n 'pid$target::malloc:entry /arg0 == 0/ { ustack(); }' -p 210 dtrace: description 'pid$target::malloc:entry ' matched 2 probes CPU ID FUNCTION:NAME 1 97364 malloc:entry libSystem.B.dylib`malloc XUL`NS_Alloc_P+0x26 XUL`JNIEnv_::CallStaticObjectMethod(_jclass*, _jmethodID*, ...)+0x41827 XUL`DumpJSStack+0x12d397 XUL`DumpJSStack+0x12ce63 XUL`void std::__adjust_heap<__gnu_cxx::__normal_iterator<nsRefPtr XUL`NS_InvokeByIndex_P+0x58 XUL`DumpJSStack+0x2b58e [...]
Arguments can be tested in predicates
The above one-liner identified the code path calling a zero byte malloc(). This can be used by the developer to see why this happened.
While somewhat interesting, I doubt that these cause issues. I imagine they are a consequence of a code path allocating memory based on a dynamic variable - which is sometimes zero. The Chrome web browser does it too:
# dtrace -n 'pid$target::malloc:entry /arg0 == 0/ { @["zero byte malloc()s:"] = count(); }' -p 335 dtrace: description 'pid$target::malloc:entry ' matched 2 probes ^C zero byte malloc()s: 1139
For my quick tests, anyway.
For another example of testing, I'll trace BSD socket's getaddrinfo() function, which has the prototype:
int getaddrinfo(const char *hostname, const char *servname, const struct addrinfo *hints, struct addrinfo **res);
The first two arguments can be fetched using copyinstr(), for example:
# dtrace -n 'pid$target::getaddrinfo:entry { printf("%s %s", copyinstr(arg1), copyinstr(arg1)); }' -p 210 dtrace: description 'pid$target::getaddrinfo:entry ' matched 1 probe dtrace: error on enabled probe ID 1 (ID 98069: pid210:libSystem.B.dylib:getaddrinfo:entry): invalid address (0x0) in action #1 at DIF offset 24 dtrace: error on enabled probe ID 1 (ID 98069: pid210:libSystem.B.dylib:getaddrinfo:entry): invalid address (0x0) in action #1 at DIF offset 24
Oops. Didn't I do that right?
The problem this time is that I'm assuming that I can dereference the string. Not all pointers are valid, in this case I've tried to dereference a NULL pointer. This can be fixed by testing it first - here I've used ternary operators (A ? B : C):
# dtrace -n 'pid$target::getaddrinfo:entry { printf("%s %s", arg0 != NULL ? copyinstr(arg0) : "<NULL>", arg1 != NULL ? copyinstr(arg1) : "<NULL>"); }' -p 210 dtrace: description 'pid$target::getaddrinfo:entry ' matched 1 probe CPU ID FUNCTION:NAME 1 98069 getaddrinfo:entry mxr.mozilla.org <NULL> 1 98069 getaddrinfo:entry www.mozilla.org <NULL> 0 98069 getaddrinfo:entry www.joyent.com <NULL> 1 98069 getaddrinfo:entry 993-rjq-253.mktoresp.com <NULL> 0 98069 getaddrinfo:entry dtrace.org <NULL>
As with C programming, you may want to test pointers before dereferencing.
Advanced Data Types
Examining arguments has been straightforward so far, apart from the copyinstr(). And with the DTrace summarizing features, processing them has been a breeze.
Now things get a lot harder.
Let's look at the third argument to getaddrinfo(), which is a "struct addrinfo". On Mac OS X this is:
struct addrinfo { int ai_flags; /* input flags */ int ai_family; /* protocol family for socket */ int ai_socktype; /* socket type */ int ai_protocol; /* protocol for socket */ socklen_t ai_addrlen; /* length of socket-address */ struct sockaddr *ai_addr; /* socket-address for socket */ char *ai_canonname; /* canonical name for service location */ struct addrinfo *ai_next; /* pointer to next in list */ };
I'll demonstrate tracing the first four members of this struct:
# cat -n getaddrinfo.d 1 #!/usr/sbin/dtrace -s 2 3 pid$target::getaddrinfo:entry 4 { 5 this->hints = (struct addrinfo *)copyin(arg2, sizeof (struct addrinfo)); 6 printf("flags=%x family=%d socktype=%d protocol=%d", 7 this->hints->ai_flags, this->hints->ai_family, 8 this->hints->ai_socktype, this->hints->ai_protocol); 9 } # ./getaddrinfo.d -p 210 dtrace: failed to compile script ./getaddrinfo.d: line 6: operator -> cannot be applied to a forward declaration: no struct addrinfo definition is available
Since the argument to getaddrinfo() is a pointer to a struct that is in user-land (process) memory, it must first be copied into kernel memory for DTrace to inspect, which is done on line 5. After that, its direct members can be dereferenced as is done on lines 7 and 8.
But it didn't work. I wanted to demonstrate this error since you'll see many similar scripts which trace the kernel via the fbt provider, and work fine. In the kernel, struct definitions are usually available (via CTF) so that DTrace can navigate them. This isn't the case for user-land, like in this example where the addrinfo struct definition was unavailable, causing DTrace to error.
The -C option to DTrace will invoke the preprocessor (if it is available), allowing #include to be used so that the structure definitions can be sourced from header files. The man page for getaddrinfo() suggests to include the following, as I have on lines 3-5:
# cat -n getaddrinfo.d 1 #!/usr/sbin/dtrace -Cs 2 3 #include <sys/types.h> 4 #include <sys/socket.h> 5 #include <netdb.h> 6 7 pid$target::getaddrinfo:entry 8 { 9 this->hints = (struct addrinfo *)copyin(arg2, sizeof (struct addrinfo)); 10 printf("flags=%x family=%d socktype=%d protocol=%d", 11 this->hints->ai_flags, this->hints->ai_family, 12 this->hints->ai_socktype, this->hints->ai_protocol); 13 } # ./getaddrinfo.d -p 210 cc1: warning: /dev/fd/5 is shorter than expected dtrace: failed to compile script ./getaddrinfo.d: "/usr/include/libkern/_OSByteOrder.h", line 98: specified storage class not appropriate in D
This didn't work either, due to something from a header file that is a bit too tricky for DTrace to parse. This sort of issue isn't uncommon when trying to drag in header files and their dependencies. If you can #include something, great; if not, try including the struct definitions in your D script:
# cat -n getaddrinfo.d 1 #!/usr/sbin/dtrace -s 2 3 struct addrinfo { 4 int ai_flags; /* input flags */ 5 int ai_family; /* protocol family for socket */ 6 int ai_socktype; /* socket type */ 7 int ai_protocol; /* protocol for socket */ 8 socklen_t ai_addrlen; /* length of socket-address */ 9 struct sockaddr *ai_addr; /* socket-address for socket */ 10 char *ai_canonname; /* canonical name for service location */ 11 struct addrinfo *ai_next; /* pointer to next in list */ 12 }; 13 14 pid$target::getaddrinfo:entry 15 { 16 this->hints = (struct addrinfo *)copyin(arg2, sizeof (struct addrinfo)); 17 printf("flags=%x family=%d socktype=%d protocol=%d", 18 this->hints->ai_flags, this->hints->ai_family, 19 this->hints->ai_socktype, this->hints->ai_protocol); 20 } # ./getaddrinfo.d -p 210 dtrace: script './getaddrinfo.d' matched 1 probe CPU ID FUNCTION:NAME 0 98069 getaddrinfo:entry flags=0 family=0 socktype=1 protocol=0 1 98069 getaddrinfo:entry flags=0 family=0 socktype=1 protocol=0 0 98069 getaddrinfo:entry flags=0 family=0 socktype=1 protocol=0 ^C
Success. I can read socket.h to see what these numbers are for (socktype=1 is SOCK_STREAM, for example). I could also enhance my D script to print out text versions of these for readability.
You can declare structs in D scripts and then cast pointers as those structs, so that members can be dereferenced
What if I'd like to examine ai_addr and ai_next? They are pointers to other structs in user-land memory, so they would need to be copyin()d first. This will mean your script has multiple copyin()s. (And for getaddrinfo() we'd also need to do this on the "return" probe, after they have been populated.)
Argument Stability
The stability for arguments should follow the stability of the probe names, as discussed in my previous post. In summary: since this is the pid provider and is exposing application functions, nothing is guaranteed to be stable. This means that while your script may work fine on one version of an application, it may not work at all on the next since the arguments to functions may have changed. And, they could change in subtle ways (silent failure).
Retesting pid provider scripts on new software versions is necessary to ensure you are looking at what you think you are. Also, keep an eye out for the addition of USDT providers to the application - which should be designed to present stable arguments.
Revalidate pid provider scripts on new software versions
Picking library functions or public API functions should be the safest, if you must use pid provider. Even so, you could run into trouble. The above script that traced getaddrinfo() sounds pretty stable. However, take a look at how the addrinfo struct is defined on Solaris:
struct addrinfo { int ai_flags; /* AI_PASSIVE, AI_CANONNAME, AI_NUMERICHOST, AI_NUMERICSERV AI_V4MAPPED, AI_ALL, AI_ADDRCONFIG */ int ai_family; /* PF_xxx */ int ai_socktype; /* SOCK_xxx */ int ai_protocol; /* 0 or IPPROTO_xxx for IPv4 & IPv6 */ socklen_t ai_addrlen; /* length of ai_addr */ char *ai_canonname; /* canonical name for nodename */ struct sockaddr *ai_addr; /* binary address */ struct addrinfo *ai_next; /* next structure in linked list */ };
The last three arguments are in a different order. Since the D script declares this struct, then you are informing DTrace on Solaris of the wrong structure, and dereferencing ai_addr or ai_canonname will result in the wrong data (they are switched).
Shedding New Light
As is the case with other tools, new observability leads to new discoveries. With the pid provider I'm tripping over discoveries all the time. Earlier when I was looking for simple functions to start with (and returned to using strlen()), I also tried printf(), as called by the bash history built-in. i.e., this:
$ history 591 hash 592 type ls 593 ls
DTrace shows the format string printf() is using:
# dtrace -n 'pid$target:libc:printf:entry { trace(copyinstr(arg0)); }' -p 809 dtrace: description 'pid$target:libc:printf:entry ' matched 1 probe 1 65354 printf:entry %5d%c %s%s 1 65354 printf:entry %5d%c %s%s 1 65354 printf:entry %5d%c %s%s
Oh. What's the "%c" for? I didn't realize there was another column in there.
While this is an interesting aside, I've had many similar accidental discoveries that were very relevant for solving application issues.
Summary
The pid provider can examine arguments to user-land functions, which can be used to understand exactly what an application is processing. Since DTrace runs in kernel context, pointers to user-land such as strings and structs must be copyinstr()d and copyin()d first. If needed, structs can be defined in D scripts, or the C preprocessor may allow header files to be #included. Another useful DTrace feature is the aggregation functions, which can be used to summarize this data in practical ways.
As with the probe names, these pid provider arguments are not considered a stable interface. Scripts written for one software version may not work on another. While the observability is great, if you are going to use pid provider scripts frequently, you'll need to revalidate and update them for new versions of the target software.