This is an example of performing troubleshooting using DTrace
from Solaris 10.
Note: This document is not written by Sun, and has been based on beta Solaris 10.
A great tool to try early on is prstat with microstate accounting,
# prstat -m PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP 1969 root 2.8 1.7 0.0 0.0 0.0 52 43 0.0 177 38 .2M 0 java/21 14422 root 0.1 0.4 0.0 0.0 0.0 0.0 99 0.0 39 0 487 0 prstat/1 551 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 19 0 35 0 dtgreet/1 550 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 17 0 30 0 dtgreet/1 461 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 6 0 36 0 Xvnc/1 552 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 16 0 30 0 dtgreet/1 463 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 7 0 42 0 Xvnc/1 455 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 6 0 36 0 Xvnc/1 [...] |
From the above, 0.2 million system calls looks interesting.
Lets take a closer look at that java process by using DTrace to count the number of each system call used. We trace syscalls from processes with the name "java", as those processes appear to be causing all the system calls (there is more than one "java" process). I run DTrace during an entire SMC first boot,
# dtrace -n 'syscall:::entry /execname == "java"/ { @Num[probefunc] = count(); }' ^C putmsg 1 issetugid 1 getgid 1 rexit 2 chmod 2 kill 2 shutdown 2 getpeername 3 pathconf 3 [...] open 8640 llseek 8671 fstat64 9906 lwp_cond_wait 16048 resolvepath 21061 lwp_mutex_timedlock 21912 lwp_mutex_wakeup 22084 close 27603 ioctl 40783 stat64 62643 pollsys 141273 write 1102115 read 12934468 |
The above shows that most of the system calls are read()s. Since there is not an equally large number of lseek()s, we can assume that these reads are mostly sequential and not random reads.
Now we take a closer look at those reads, in particular the attempted size of each read. This time DTrace is run for several minutes during boot,
# dtrace -n 'syscall::read:entry /execname == "java"/ { @Size = quantize(arg2); }' ^C value ------------- Distribution ------------- count 0 | 0 1 |@@@@@@@@@@@@@@@@@@@@@@ 4302108 2 |@@@@@ 1032476 4 |@@@@@@@@ 1516412 8 |@@@ 625137 16 | 40540 32 |@ 164393 64 |@ 104476 128 | 2 256 | 0 [...] |
This shows that many of those reads are requesting 1 byte!
Often it's also interesting to check the read returns, to verify the successful bytes read. The following DTrace command does this, and is also run for several minutes,
# dtrace -n 'syscall::read:return /execname == "java"/ { @Size = quantize(arg0); }' ^C value ------------- Distribution ------------- count -2 | 0 -1 | 1063 0 | 566 1 |@@@@@@@@@@@@@@@@@@@@@@@ 3156897 2 |@@@@@ 693066 4 |@@@@@@@ 1030087 8 |@@@ 458795 16 | 20322 32 |@ 98572 64 |@ 77944 128 | 178 256 | 284 [...] |
and in this case produces similar results to the requested reads.
There are many ways DTrace can be used to check what these reads are for, we begin by grabbing the file descriptor and doing an aggregate on the pathname. The following standalone dtrace script is run during the entire boot,
# cat filereads.d /* Trace open filenames */ syscall::open:entry /execname == "java"/ { self->file = copyinstr(arg0); } /* Create lookup table of filedes -> filename */ syscall::open:return /execname == "java" && self->file != ""/ { path[pid,arg0] = self->file; self->file=0; } /* Count file reads by filename */ syscall::read:entry /execname == "java" && path[pid,arg0] != ""/ { @num[path[pid,arg0]] = count(); } # # dtrace -s filereads.d dtrace: script 'filereads.d' matched 3 probes ^C /var/sadm/smc/toolboxes/smc/smc.gif 1 /var/sadm/smc/toolboxes/this_computer/services_16.gif 1 /var/sadm/smc/toolboxes/this_computer/services_32.gif 1 /var/sadm/smc/toolboxes/this_computer/devices_16.gif 1 /var/sadm/smc/toolboxes/this_computer/devices_32.gif 1 [...] /usr/sadm/lib/volmgr/VVolMgr.jar 102 /usr/sadm/lib/wbem.jar 103 /var/sadm/wbem/logr/Snapshot.1 136 /usr/openwin/lib/X11/fonts/TrueType/Arial.ttf 187 /usr/openwin/lib/X11/fonts/TrueType/Arial-Bold.ttf 188 /usr/openwin/lib/X11/fonts/TrueType/TimesNewRoman.ttf 193 /usr/sadm/lib/smc/lib/dtds/viperbean_1_0.dtd 205 /usr/openwin/lib/X11/fonts/TrueType/Symbol.ttf 326 /usr/j2se/jre/lib/fonts/LucidaSansRegular.ttf 431 /etc/syslog.conf 1001 /usr/sadm/lib/smc/lib/dtds/toolbox.dtd 1116 /usr/lib//liblayout.so 1443 /var/sadm/wbem/logr/store 1611 /usr/sadm/lib/wbem/store 2231 /usr/j2se/jre/lib/sparc/libfontmanager.so 4758 /var/sadm/smc/properties/classlist.txt 61201 /var/sadm/smc/properties/registry.ser 12716152 |
A lot of files were read (a dozen screen fulls), which in itself may be of interest here; however 12.7 million reads of the registry.ser file certainly stands out.
Lets check that file size,
# ls -l /var/sadm/smc/properties/registry.ser -rw-r--r-- 1 root root 72676 Jan 17 17:01 /var/sadm/smc/properties/registry.ser |
So this is a 72 kilobyte file that is being read() at least 12 million times, a majority of those reads are sequential and about 1 byte in size.
I should note that it is likely that there are some very good reasons why SMC needs to call all these read()s, DTrace has merely suggested why the process takes some time - it may in fact be wrong to jump to conclusions and assume that this is a problem.
DTrace can check more carefully for this specific case - the requested read() sizes for the /var/sadm/smc/properties/registry.ser file only. An entire SMC first boot is traced,
# cat regreads.d /* Trace open filenames */ syscall::open:entry /execname == "java"/ { self->file = copyinstr(arg0); } /* Create lookup table of filedes -> filename */ syscall::open:return /execname == "java" && self->file != ""/ { path[pid,arg0] = self->file; self->file=0; } /* Measure file reads by this particular file */ syscall::read:entry /execname == "java" && path[pid,arg0] == "/var/sadm/smc/properties/registry.ser"/ { @Size = quantize(arg2); } # # dtrace -s regreads.d dtrace: script 'regreads.d' matched 3 probes ^C value ------------- Distribution ------------- count 0 | 0 1 |@@@@@@@@@@@@@@@@@@@@@@ 7070709 2 |@@@@@ 1646078 4 |@@@@@@@@ 2428123 8 |@@@ 1031849 16 | 58258 32 |@ 248439 64 |@ 173775 128 | 0 256 | 0 512 | 0 1024 | 2 2048 | 2 4096 | 0 |
This confirms that this one file is being read() using mostly 1 byte reads.
DTrace has access to nanosecond timestamps that can be used to measure the time during system calls, user functions, etc.
So far we have found read()s on the registry.ser file interesting, so the following DTrace script has been specifically written to gather more information on these events. It sums the time for each type of system call and inserts three extra catagories - "read:registry.ser" are the reads of the registry.ser file only (the other reads exclude that time), "TOTAL:" is for the total time for all system calls, and "DURATION:" is the time of this DTrace sample (the boot time). This is run for an entire boot,
# cat time.d dtrace:::BEGIN { self->begin = timestamp; } syscall::open:entry /execname == "java"/ { self->file = copyinstr(arg0); } syscall::open:return /execname == "java" && self->file != ""/ { path[pid,arg0] = self->file; self->file = 0; } syscall::read:entry /execname == "java" && path[pid,arg0] == "/var/sadm/smc/properties/registry.ser"/ { self->registry = vtimestamp; } syscall:::entry /execname == "java" && self->registry == 0/ { self->start = vtimestamp; } syscall::read:return /self->registry/ { this->time = vtimestamp - self->registry; @Time["read:registry.ser"] = sum(this->time); @Time["TOTAL:"] = sum(this->time); self->registry = 0; } syscall:::return /self->start/ { this->time = vtimestamp - self->start; @Time[probefunc] = sum(this->time); @Time["TOTAL:"] = sum(this->time); self->start = 0; } dtrace:::END { @Time["DURATION:"] = sum(timestamp - self->begin); } # # # dtrace -b 128m -s time.d ^C getgid 800 issetugid 800 gtime 3800 sigpending 6400 c2audit 8800 getuid 12200 fstat 14400 [...] getdents64 36732600 munmap 49234400 lwp_mutex_wakeup 52178200 fdsync 55799000 fstat64 62206600 unlink 68512000 ioctl 123305400 stat64 143406600 lwp_mutex_timedlock 204307400 rename 257776400 lwp_cond_wait 279028200 open 441507800 close 450650000 fork1 490807800 resolvepath 497607000 exece 508040800 pollsys 2163482200 read 4057674600 write 15972839600 read:registry.ser 93894490000 TOTAL: 120157772400 DURATION: 1827896864000 |
The above report shows that during this 30 minute sample, system calls accounted for a total of 120 seconds on the CPU, and read()s on the registry.ser file accounted for a total of 94 seconds on the CPU - which is 78% of the system call time. There values were in DURATION:, TOTAL: and read:registry.ser respectively.
Now to get a better understanding of the nature of these read()s, we can try printing out the stack backtrace, here we use jstack() - which prints a user stack backtrace with some Java translations. These stack backtraces are also aggregated, so we can see what the most common stack backtrace is,
# cat jstack.d syscall::open:entry /execname == "java"/ { self->file = copyinstr(arg0); } syscall::open:return /execname == "java" && self->file != ""/ { path[pid,arg0] = self->file; self->file=0; } syscall::read:entry /execname == "java" && path[pid,arg0] == "/var/sadm/smc/properties/registry.ser"/ { @Stack[jstack()] = count(); } # # dtrace -s jstack.d ^C [...] 0xf9c05c64 0xf9c00118 libjvm.so`__1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_p\ nGThread__v_+0x274 libjvm.so`__1cRjni_invoke_static6FpnHJNIEnv__pnJJavaValue_pnI_jobject_n\ LJNICallType_pnK_jmethodID_pnSJNI_ArgumentPusher_pnGThread__v_+0x218 libjvm.so`jni_CallStaticVoidMethod+0x13c java`main+0x158c java`_start+0x108 347325 libc.so.1`_read+0x8 libjvm.so`JVM_Read+0x68 libjava.so`0xfea71e1c libjava.so`Java_java_io_FileInputStream_read+0x28 0xf9dca918 0xf9cbc2fc 0xf9cc05a0 0xf9cbd1bc 0xf9cc6cf0 0xf9ccbb40 0xf9cc0998 0xf9ccf1bc 0xf9cc0940 0xf9cbd1bc 0xf9cc6cf0 0xf9ccbb40 0xf9cc0998 0xf9ccf1bc 0xf9cc0940 0xf9ccf1bc 0xf9cc0940 0xf9cd5a74 0xf9c05b10 0xf9c05b10 0xf9e5a730 0xf9e86f70 0xf9c05c64 0xf9c05c64 0xf9c05c64 0xf9c05c64 0xf9c05c64 0xf9c05c64 0xf9c00118 libjvm.so`__1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_p\ nGThread__v_+0x274 libjvm.so`__1cRjni_invoke_static6FpnHJNIEnv__pnJJavaValue_pnI_jobject_n\ LJNICallType_pnK_jmethodID_pnSJNI_ArgumentPusher_pnGThread__v_+0x218 libjvm.so`jni_CallStaticVoidMethod+0x13c java`main+0x158c java`_start+0x108 367591 |
The last stack backtrace was the most common, occuring 367591 times. This information would be of particular use to the developers to identify which parts of their code were causing these read()s.
DTrace also allows us to use tools that let us examine new areas of system behaviour. For example, execsnoop is used to capture an entire boot,
# ./execsnoop -v | tee execsnoop.out STRTIME UID PID PPID CMD 2005 Jan 17 17:32:57 0 1934 1932 /bin/ksh /usr/sbin/smc 2005 Jan 17 17:32:57 0 1936 1935 /usr/bin/sh /usr/bin/dirname /usr/sbin/smc [...] 2005 Jan 17 17:34:02 0 3751 3730 fgrep -c [ /var/sadm/smc/smcreg 2005 Jan 17 17:34:02 0 3752 1950 /usr/bin/chmod +x /var/sadm/smc/scripts/SUNWdclnt.reg/SUNWdclnt_ 2005 Jan 17 17:34:02 0 3753 1950 /bin/sh /usr/sadm/lib/smc/prereg/SUNWcsmc/SUNWcsmc_reg.sh 2005 Jan 17 17:34:02 0 3754 3753 /bin/ksh /usr/sadm/bin/smcregister library -n SysLogServiceBean_ 2005 Jan 17 17:34:02 0 3756 3755 /usr/bin/sh /usr/bin/dirname /usr/sadm/bin/smcregister 2005 Jan 17 17:34:02 0 3756 3755 /usr/bin/expr /usr/sadm/bin/smcregister/ : \(/\)/*[^/]*//*$ | /u 2005 Jan 17 17:34:02 0 3757 3754 /usr/bin/sh /usr/bin/basename /usr/sadm/bin/smcregister 2005 Jan 17 17:34:02 0 3757 3754 /usr/bin/expr //usr/sadm/bin/smcregister : \(.*[^/]\)/*$ : .*/\( 2005 Jan 17 17:34:02 0 3754 3753 /bin/ksh /usr/sadm/lib/smc/bin/smcregister library -n SysLogServ 2005 Jan 17 17:34:02 0 3758 3754 /usr/bin/sh /bin/basename /usr/sadm/lib/smc/bin/smcregister 2005 Jan 17 17:34:02 0 3758 3754 /usr/bin/expr //usr/sadm/lib/smc/bin/smcregister : \(.*[^/]\)/*$ 2005 Jan 17 17:34:02 0 3759 3754 date -u +%Y%m%d%H%M%S 2005 Jan 17 17:34:02 0 3760 3754 /usr/bin/sh /bin/basename SysLogServiceBean_zh.jar 2005 Jan 17 17:34:02 0 3760 3754 /usr/bin/expr /SysLogServiceBean_zh.jar : \(.*[^/]\)/*$ : .*/\(. 2005 Jan 17 17:34:02 0 3761 3754 sed -e s@^//@/@ 2005 Jan 17 17:34:02 0 3763 3754 sed -e s@^//@/@ 2005 Jan 17 17:34:02 0 3765 3754 sed -e s@^//@/@ 2005 Jan 17 17:34:02 0 3767 3754 sed -e s@^//@/@ 2005 Jan 17 17:34:02 0 3769 3754 sed -e s@^//@/@ 2005 Jan 17 17:34:02 0 3771 3754 sed -e s@^//@/@ [...] ^C # # wc -l execsnoop.out 7742 execsnoop.out # |
which shows 7742 processes were executed. Lets try an experiment to get a grip on what effect creating and destroying 7742 processes has on a system. I run a tiny command 7742 times and time it. This is on an UltraSPARC 5 with a 360 MHz CPU,
# perl -e 'print "date\n" x 7742;' | ptime sh > /dev/null real 2:10.397 user 31.131 sys 1:29.769 |
That took over 2 minutes. This suggests that just running 7742 commands is likely to be a considerable factor towards the overall boot time.
Now we will use DTrace again to print a summary of the commands that were executed. This is run during an entire SMC boot,
# dtrace -n 'syscall::exec*:return { @Exec[execname] = count(); }' ^C stty 1 SUNWhksmc_reg.sh 1 SUNWdlvmg_reg.sh 1 SUNWhdcl_reg.sh 1 SUNWhmga_reg.sh 1 SUNWhsmc_reg.sh 1 SUNWddcl_reg.sh 1 SUNWdmgp_reg.sh 1 [...] mkdir 30 grep 45 cat 45 chmod 57 cp 79 rm 88 id 186 fgrep 614 smcregister 614 date 615 ksh 616 dirname 801 basename 2029 expr 3127 sed 3200 |
The above report highlights the main type of commands that were executed - shell scripting commands. Ok, if we are already using ksh (some 616 times), then it's suprising that we are using external versions of dirname, basename, expr and sed - the Korn shell has much of this functionality built in. In fact, Perl has all of this functionality built in.
Another DTrace tool to try is iosnoop, which was used to check disk behaviour during the boot,
# ./iosnoop -v > iosnoop.out # # head iosnoop.out STRTIME UID PID D BLOCK SIZE COMM PATHNAME 2005 Jan 17 17:35:00 0 8036 R 15721760 8192 dtrace /lib/sparcv9/libc.so.1 2005 Jan 17 17:35:00 0 8041 R 10267472 8192 sed /lib/libc.so.1 2005 Jan 17 17:35:00 0 8036 R 20896480 8192 dtrace /usr/lib/sparcv9/libdtrace.so.1 2005 Jan 17 17:35:00 0 8036 R 15720624 8192 dtrace /lib/sparcv9/libc.so.1 2005 Jan 17 17:35:00 0 8036 R 15721744 8192 dtrace /lib/sparcv9/libc.so.1 2005 Jan 17 17:35:00 0 1267 R 20628288 8192 snmpd /usr/sfw/lib/sparcv9/libnetsnmp.so.5.0.9 2005 Jan 17 17:35:00 0 3 W 5069290 2048 fsflush /var/sadm/smc/scripts/SUNWcsmc.reg/SUNWs... 2005 Jan 17 17:35:01 0 1943 R 251568 8192 java /usr/j2se/jre/lib/sparc/libawt.so 2005 Jan 17 17:35:01 0 1943 R 251584 8192 java /usr/j2se/jre/lib/sparc/libawt.so # # wc -l iosnoop.out 9504 iosnoop.out # # grep registry.ser iosnoop.out | wc -l 2228 # # grep registry.ser iosnoop.out | grep ' W ' | wc -l 2228 # |
During this sample, 9504 I/O events occured (these are captured at the block device, many will not make it past the file system caches). Of these events, 2228 of them were for that registry.ser file, and all of these were writes (I can assume the reads are absorbed by the page cache).
The following are observations and conclusions from the above DTrace measurements,
These may help explain what work is performed during an SMC boot.
I hope this has been a useful demonstration of using DTrace from the command line, as well as from DTrace scripts.
More DTrace
Back to Brendan Gregg's Homepage