DTrace Troubleshooting

This is an example of performance analysis using DTrace from Solaris 10.
Note: This document is not written by Sun.

prstat vs top

prstat (Solaris 8) and top (freeware) are two tools used to monitor the performance of the system, in particular process activity. There has been some minor debate over the years about the performance impact of each, in particular whether top is a burden on the system. Here we use DTrace to measure what is really happening.

1. Background

I've heard some sysadmins describe top as a CPU hog, others say they have never had a problem with the tool. Some have made the move to prstat, which is more accurate and supposedly less of a burden; others swear by top, especially due to the extra info printed. The following are stories I've heard over the years related to these tools and performance,

A sysadmin once told me he was called to a performance problem on a production server, where all the managers were logged in checking it out. They couldn't see what was wrong, but he could - they were all running top. Once he convinced them not to the problem disappeared.
One sysadmin said they had renamed top at work to "top_is_a_cpu_hog" to draw attention to the problem. Admins who continued to run top under the new name were constantly reminded of the problem in the process listing.

So is top really a hog, or is this another "admin myth" like "sync; sync; sync; reboot"? DTrace can help us find out.

2. top and prstat

top was written by William LeFebvre, first in 1984 (inspired by VMS's top 10 tool). top has been one of the most useful tools ever created for Unix, making the most of a 24x80 text display to show a system summary. A sample output is below,

$ top
last pid:  4018;  load averages:  1.07,  1.04,  1.02                   09:16:41 
95 processes:  90 sleeping, 2 running, 2 stopped, 1 on cpu
CPU states:  0.0% idle, 98.6% user,  1.4% kernel,  0.0% iowait,  0.0% swap
Memory: 512M real, 170M free, 213M swap in use, 1169M swap free
 
   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
  3929 brendan    1  27    1   18M   17M run    414:06 96.19% setiathome
  4011 root       1  59    0 3052K 1188K cpu      0:00  1.52% top
  3597 brendan    1  59    0   71M   66M sleep    5:23  0.45% Xorg
  4019 brendan    5  49    0  108M   60M sleep    0:48  0.08% soffice.bin
  3730 brendan    3  49    0   77M   53M sleep    4:27  0.03% mozilla-bin
  5185 brendan    1  59    0 3816K 2860K sleep    0:03  0.02% xterm
  5637 brendan    1  49    0 3632K 2660K sleep    0:04  0.02% xterm
  5199 root       1  49    0 2812K 2032K sleep    0:00  0.02% bash
  3909 brendan    1  49    0 6228K 3604K sleep    0:22  0.01% dtterm
  3642 brendan    1  59    0 3692K 2280K sleep    0:18  0.01% afterstep
   135 root       1  59    0 1852K 1020K sleep    0:15  0.01% in.routed
     7 root      12  59    0   10M 8112K run      0:08  0.01% svc.startd
    84 root      24  59    0 3084K 2112K sleep    0:03  0.00% nscd
   354 root       1  59    0 5428K 1828K sleep    0:02  0.00% sendmail
  5642 brendan    1  59    0 1928K  668K sleep    0:00  0.00% rlogin

The bulk of the output is a process listing sorted on CPU usage, with other useful server statistics. There is the "Memory:" summary, which can help identify RAM or Virtual Memory shortages; and "last pid" which may help to suggest problems with short lived processes.

prstat was added to Solaris in Solaris 8. The following is the default output,

$ prstat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP         
  3929 brendan    18M   17M run      7    1   6:55:29  97% setiathome/1
  3597 brendan    71M   66M sleep   59    0   0:05:24 1.3% Xorg/1
  3642 brendan  3692K 2280K sleep   59    0   0:00:18 0.4% afterstep/1
  4014 root     4352K 4144K cpu0    59    0   0:00:00 0.2% prstat/1
  5185 brendan  3816K 2868K sleep   59    0   0:00:03 0.2% xterm/1
  3909 brendan  6236K 3612K sleep   49    0   0:00:22 0.1% dtterm/1
  4019 brendan   108M   60M sleep   49    0   0:00:48 0.1% soffice.bin/5
  3648 brendan  3580K 2008K sleep   59    0   0:00:01 0.1% Pager/1
  5199 root     2812K 2032K sleep   49    0   0:00:00 0.0% bash/1
  3644 brendan  2684K 1432K sleep   59    0   0:00:00 0.0% Animate/1
  3913 brendan  2120K 1312K sleep   49    0   0:00:00 0.0% rlogin/1
   135 root     1852K 1020K sleep   59    0   0:00:15 0.0% in.routed/1
  3914 brendan  1928K  664K sleep   49    0   0:00:00 0.0% rlogin/1
     7 root       10M 8112K sleep   59    0   0:00:08 0.0% svc.startd/12
  3730 brendan    77M   53M sleep   49    0   0:04:27 0.0% mozilla-bin/3
  3649 brendan  3616K 2136K sleep   49    0   0:00:00 0.0% WinList/1
    84 root     3084K 2112K sleep   59    0   0:00:03 0.0% nscd/24
  5644 brendan  3624K 2668K sleep   49    0   0:00:00 0.0% xterm/1
  5190 brendan  3624K 2656K sleep   49    0   0:00:03 0.0% xterm/1
   354 root     5428K 1828K sleep   59    0   0:00:02 0.0% sendmail/1
   234 root     1788K 1032K sleep   59    0   0:00:00 0.0% ttymon/1
Total: 95 processes, 202 lwps, load averages: 1.06, 1.03, 1.01

The prstat output is almost entirerly a process listing, missing some of the server details that top provides. There is a "NLWP" for number of Light Weight Processes, which can be useful to identify multithreaded processes. (and in Solaris 9+ this value equals the number of user threads).

prstat also has other output styles. The following is microstate accounting,

$ prstat -m
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP    
  3929 brendan   98 0.0 0.1 0.0 0.0 0.0 0.0 1.4   0 416  68   0 setiathome/1
  3597 brendan  0.2 0.2 0.0 0.0 0.0 0.0 100 0.1 157   6 688  26 Xorg/1
  4032 root     0.0 0.2 0.0 0.0 0.0 0.0 100 0.0  23   0 240   0 prstat/1
   135 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.0   2   0  37   0 in.routed/1
  5185 brendan  0.0 0.0 0.0 0.0 0.0 0.0 100 0.1   2   3  32   0 xterm/1
  3642 brendan  0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   4   0  84   0 afterstep/1
  4019 brendan  0.0 0.0 0.0 0.0 0.0  40  60 0.2  42  10  43   0 soffice.bin/5
  3648 brendan  0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   3   0  15   0 Pager/1
   304 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   3   0   3   0 miniserv.pl/1
   354 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   3   0  10   0 sendmail/1
  5190 brendan  0.0 0.0 0.0 0.0 0.0 0.0 100 0.1   1   1   4   0 xterm/1
   244 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   3   0   2   0 utmpd/1
   234 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 ttymon/1
   218 daemon   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 lockd/2
   110 daemon   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 kcfd/3
   224 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 sac/1
   214 daemon   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 statd/1
    84 root     0.0 0.0 0.0 0.0 0.0 4.2  96 0.0  30   0  69   0 nscd/24
   329 root     0.0 0.0 0.0 0.0 0.0  64  36 0.0   3   0   1   0 fmd/11
    98 root     0.0 0.0 0.0 0.0 0.0  25  75 0.0   0   0   0   0 picld/4
  1965 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 ttymon/1
Total: 95 processes, 202 lwps, load averages: 1.06, 1.04, 1.04

Other output styles include by user (-a), by project (-J), by task (-T) and by zone (-Z, Solaris 10).

3. System Calls

There are many ways DTrace can analyse these tools. We begin by checking for system calls. Each tool is run with an interval of 1 second, and updates the screen 10 times.

prstat system calls, "prstat 1 10" was run during the following dtrace command to frequency count system calls,

# dtrace -n 'syscall:::entry /execname == "prstat"/ { @num[probefunc] = count(); }' 
dtrace: description 'syscall:::entry ' matched 226 probes
^C
[...]
  pollsys                                                           9
  sigaction                                                        10
  getloadavg                                                       10
  ioctl                                                            11
  mmap                                                             19
  llseek                                                           19
  getdents64                                                       20
  write                                                            32
  brk                                                              34
  close                                                           102
  open                                                            105
  doorfs                                                          700
  pread                                                           960

Many are pread(), doorfs(), and about a hundred open()s and close()s each.

top system calls, "top -s1 -d10" was run during the following dtrace command,

# dtrace -n 'syscall:::entry /execname == "top"/ { @num[probefunc] = count(); }'  
dtrace: description 'syscall:::entry ' matched 228 probes
^C
[...]
  munmap                                                            9
  uadmin                                                           10
  gtime                                                            10
  write                                                            11
  getuid                                                           11
  sysconfig                                                        13
  llseek                                                           19
  getdents64                                                       20
  pread64                                                          21
  brk                                                              24
  mmap                                                             31
  ioctl                                                            43
  read                                                            994
  close                                                          1000
  open                                                           1002

Now we have about a thousand reads()s, open()s and close()s each. So clearly more system calls.

top file opens, Lets try to determine what top is opening so many times,

# dtrace -n 'syscall::open:entry /execname == "top"/ { @name[copyinstr(arg0)] = count(); }' 
dtrace: description 'syscall::open:entry ' matched 1 probe
^C

  /var/ld/ld.config                                                 1
  /usr/share/lib/zoneinfo/Australia/NSW                             1
  /dev/kstat                                                        1
  /usr/share/lib/terminfo//x/xterm                                  1
[...]
  3911/psinfo                                                      10
  3914/psinfo                                                      10
  3920/psinfo                                                      10
  3928/psinfo                                                      10
  5645/psinfo                                                      10
  5644/psinfo                                                      10
  5642/psinfo                                                      10
  1/psinfo                                                         10

The output shows that there isn't a single file opened hundreds of times, instead the psinfo files from /proc are opened ten times each, once for each update.

prstat file opens, Trying the same for prstat gives,

dtrace -n 'syscall::open:entry /execname == "prstat"/ { @name[copyinstr(arg0)] = count(); }'
dtrace: description 'syscall::open:entry ' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 15: syscall::open:entry): invalid address (0xd27b20af) in action #2 at DIF offset 28
^C

  /proc/4092/psinfo                                                 1
  /proc/3919/psinfo                                                 1
  /proc/3929/psinfo                                                 1
[...]
  /proc/211/psinfo                                                  1
  /proc/1/psinfo                                                    1
  /dev/tty                                                          1
  /lib/libcurses.so.1                                               1
  /var/ld/ld.config                                                 2
  /lib/libc.so.1                                                    2
  /usr/share/lib/terminfo//x/xterm                                  2

The above list showed that each psinfo file had only been opened once.

top dtruss, A little truss may highlight what is happening. A DTrace version of truss called "dtruss" is used, which is a lightweight safer version of truss (although in this case it makes little difference if the original "truss" command was used). Here we focus on the system calls for the subsequent screen updates that gather the psinfo data,

# dtruss -n top
[...]
 4299:  open("3920/psinfo\0", 0, 805853a)                = 9 0  
 4299:  read(9, "\0", 150)               = 336 0
 4299:  close(9)                 = 0 0
 4299:  open("3914/psinfo\0", 0, 805853a)                = 9 0
 4299:  read(9, "\0", 150)               = 336 0
 4299:  close(9)                 = 0 0
 4299:  open("4296/psinfo\0", 0, 805853a)                = 9 0
 4299:  read(9, "\0", 150)               = 336 0
 4299:  close(9)                 = 0 0
 4299:  open("3911/psinfo\0", 0, 805853a)                = 9 0
 4299:  read(9, "\0", 150)               = 336 0
 4299:  close(9)                 = 0 0
 4299:  open("4200/psinfo\0", 0, 805853a)                = 9 0
 4299:  read(9, "\0", 150)               = 336 0
 4299:  close(9)                 = 0 0
[...]

So top does an open, read and close for every process, on every update.

prstat dtruss, Checking the updates for prstat,

# dtruss -n prstat
[...]
 4295:  pread(11, "\0", 150)             = 336 0   
 4295:  pread(12, "\0", 150)             = 336 0
 4295:  pread(13, "\0", 150)             = 336 0
 4295:  pread(14, "\0", 150)             = 336 0
 4295:  pread(15, "\0", 150)             = 336 0
[...]

prstat has left the file handles open, so for each update it only needs to pread the new data for every process.

So far, DTrace has been used to show that top uses extra system calls over prstat. Maybe that is a performance problem. While DTrace is a very powerful tool, many times it is used for similar output to what the original truss could provide. DTrace has made the data easier and safer to generate - it's a case of DTrace making the difficult things easy.

4. Syscall Timing

Maybe extra system calls performed by top are a problem, maybe they are not. what we need are measurements of the CPU overhead for these extra calls.

prstat syscall overhead, The following DTrace script is used to measure the CPU overhead of system calls,

# cat time.d
#!/usr/sbin/dtrace -s
syscall:::entry
/execname == $$1/
{
        self->start = vtimestamp;
}
syscall:::return
/self->start/
{
        this->time = vtimestamp - self->start;
        @Time[probefunc] = sum(this->time);
        @Time["TOTAL:"] = sum(this->time);
        self->start = 0;
}
#
# ./time.d prstat
dtrace: script './time.d' matched 458 probes
^C
[...]
  read                                                          50544
  llseek                                                       100493
  munmap                                                       102172
  xstat                                                        117387
  resolvepath                                                  146191
  brk                                                          236784
  mmap                                                         264860
  ioctl                                                        328747
  memcntl                                                      427722
  pollsys                                                      637806
  close                                                        910030
  write                                                       1647629
  exece                                                       2052044
  open                                                        2339907
  doorfs                                                      6649380
  getdents64                                                  7143790
  pread                                                      57319190
  TOTAL:                                                     80679670

The above shows that the system calls took 81 ms in total, 57 ms for the preads, and 23 ms for the opens.

top syscall overhead, now the same script is run for top,

# ./time.d top
dtrace: script './time.d' matched 458 probes
^C
[...]
  doorfs                                                        59614
  open64                                                       113472
  resolvepath                                                  123631
  stat64                                                       132480
  brk                                                          138305
  xstat                                                        147503
  memcntl                                                      318990
  pread64                                                      646356
  pollsys                                                      723641
  write                                                        795618
  ioctl                                                       1778097
  munmap                                                      4559626
  getdents64                                                  7034697
  close                                                      10806237
  mmap                                                       16958762
  read                                                       57622258
  open                                                      209996158
  TOTAL:                                                    312266191

In the above we can see 58 ms for the reads - a similar value to prstat, yet 210 ms for the opens. This is an extra 190 ms of system call overhead; 19 ms per screen update - around 2% of the CPU.

It should be noted that these millisecond values are relative. The server that these tests were perfomed on is an 867 MHz i86pc. On faster servers the millisecond values will be smaller, and on multi CPU servers the impact on the overall system will be smaller too. However, the reason for this extra overhead is the iteration over processes in /proc, so for servers running hundreds of processes the overhead will be greater (on this server only 95 processes were running).

A more relevant observation may be to say these extra opens are using around 60% of the overall CPU time for system calls; this value is more likely to be similar on servers with different CPUs.

5. Overall Timing

So far we have measured extra overhead by top making system calls. However not all of the CPU overhead is system calls. Here we accuratly measure the time each tool spends on the CPU, whether this is a system call or not,

prstat CPU usage, The following DTrace script is used to measure how the process uses the CPUs,

# cat total.d
#!/usr/sbin/dtrace -qs
dtrace:::BEGIN {
        total = 0;
}
sched:::on-cpu
/execname == $$1/
{
        self->start = vtimestamp;
}
sched:::off-cpu
/self->start/
{
        this->time = vtimestamp - self->start;
        total += this->time;
        @times = quantize(this->time);
        self->start = 0;
}
dtrace:::END {
        printf("Total Time on CPU: %d ns (%d ms)\n",total,total/1000000); 
        printa(@times);
}
#
# ./total.d prstat
^C
Total Time on CPU: 132799771 ns (132 ms)

           value  ------------- Distribution ------------- count    
            8192 |                                         0        
           16384 |@@                                       9        
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          180      
           65536 |@@@@                                     21       
          131072 |@@                                       11       
          262144 |                                         0        
          524288 |                                         1        
         1048576 |                                         0        
         2097152 |                                         0        
         4194304 |@@                                       9        
         8388608 |@                                        4        
        16777216 |                                         1        
        33554432 |                                         0

So in total the prstat command spends 132 ms on the CPU. The above distribution plot tell us that most of the times prstat was on the CPU was for around 32 us (microseconds) each time - a short inteval.

top CPU usage, The same DTrace script is run for top,

# ./total.d top
^C
Total Time on CPU: 399404874 ns (399 ms)

           value  ------------- Distribution ------------- count    
            8192 |                                         0        
           16384 |@@@@@                                    7        
           32768 |@@@@                                     5        
           65536 |                                         0        
          131072 |@@@                                      4        
          262144 |@@@@                                     5        
          524288 |@@@                                      4        
         1048576 |@@@                                      4        
         2097152 |@                                        1        
         4194304 |@@                                       2        
         8388608 |@@@@@@@@@                                12       
        16777216 |@@@@                                     5        
        33554432 |@                                        1        
        67108864 |@                                        1        
       134217728 |                                         0

Top spends 399 ms in total on the CPU, quite often jumping onto the CPU for long periods (around 8 ms). We can already explain 190 ms of this due to the extra file opens, and another 30 ms or so due to extra related system calls. This leaves around 50 ms of extra CPU time by the top program unaccounted for. This would likely be due to differences in the code.

6. DTrace Tools

Rather than write the DTrace scripts above, the following demonstrates using the procsystime tool, found on the DTrace Tools website,

prstat procsystime and top, The following DTrace tool was run on both prstat and top to measure syscall details,

# procsystime -a -n prstat
Hit Ctrl-C to stop sampling...
^C

Elapsed Times for process prstat,
 
         SYSCALL          TIME (ns)
          fxstat              15933
[...]
             brk             511507
           ioctl             583028
           write            1503880
           close            1654823
           exece            2073095
            open            3218842
      getdents64            7339480
          doorfs           16485536
           pread           63929093
         pollsys         9028601228
 
CPU Overhead Times for process prstat,   
 
         SYSCALL          TIME (ns)
       sysconfig               2758
[...]
           ioctl             337347
         memcntl             429263
         pollsys             790727
           close            1056745
           write            1344893
           exece            2057874
            open            2473774
          doorfs            4130442
      getdents64            7176109
           pread           57809119
 
Syscall Counts for process prstat,
 
         SYSCALL              COUNT
           rexit                  1
[...]
           ioctl                 11
          llseek                 19
            mmap                 19
      getdents64                 20
           write                 22
             brk                 32
           close                104
            open                107
          doorfs                420
           pread                980

# procsystime -a -n top
Hit Ctrl-C to stop sampling...
^C

Elapsed Times for process top,
 
         SYSCALL          TIME (ns)
          getpid               7321
[...]
         pread64             828000
           write             859977
           ioctl            2029376
          munmap            4702569
      getdents64            7198911
            mmap           17038719
           close           17724485
            read           65008599
            open          217945945
         pollsys         9041990505
 
CPU Overhead Times for process top,      
 
         SYSCALL          TIME (ns)
          getgid                917
[...]
         pread64             652969
           write             782977
         pollsys             786389
           ioctl            1687791
          munmap            4620398
      getdents64            7051767
           close           11669988
            mmap           16809776
            read           58720211
            open          211224762
 
Syscall Counts for process top,
 
         SYSCALL              COUNT
           rexit                  1
[...]
       sysconfig                 13
          llseek                 19
      getdents64                 20
         pread64                 21
             brk                 24
            mmap                 31
           ioctl                 43
            read               1014
           close               1020
            open               1022

Much of the discoveries from the previous sections can be found in the above output.

7. Experimentation

As we now have seen that the number of system calls generated is related to the number of processes running, an experiment is performed to exaggerate the results. We run an extra 3,000 processes (/usr/bin/sleep) and then check how top and prstat are performing.

prstat with 3096 processes, the command output,

   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
  3929 brendan    18M   16M run      0    1  13:02:09  88% setiathome/1
 13457 root     6480K 6284K cpu0    45    0   0:00:07 9.3% prstat/1
  3597 brendan    79M   51M sleep   59    0   0:07:40 0.3% Xorg/1
  4019 brendan   108M 9312K sleep   49    0   0:01:07 0.1% soffice.bin/5
  5185 brendan  4120K 2248K sleep   49    0   0:00:06 0.1% xterm/1
  3909 brendan  6268K 2636K sleep   49    0   0:01:13 0.1% dtterm/1
    84 root     3084K 1632K sleep   59    0   0:00:05 0.0% nscd/24
  3642 brendan  3692K 1888K sleep   59    0   0:00:22 0.0% afterstep/1
   135 root     1852K  980K sleep   59    0   0:00:21 0.0% in.routed/1
     7 root       10M 3132K sleep   59    0   0:00:11 0.0% svc.startd/12
  3730 brendan    78M   25M sleep   49    0   0:05:29 0.0% mozilla-bin/3
  3648 brendan  3580K 1512K sleep   59    0   0:00:01 0.0% Pager/1
   354 root     5428K 1380K sleep   59    0   0:00:03 0.0% sendmail/1
  4016 brendan  2492K 1504K sleep   59    0   0:00:00 0.0% bash/1
   244 root     1048K  600K sleep   59    0   0:00:00 0.0% utmpd/1
   234 root     1788K  860K sleep   59    0   0:00:00 0.0% ttymon/1
   218 daemon   2040K  832K sleep   60  -20   0:00:00 0.0% lockd/2
   304 root       10M 1876K sleep   59    0   0:00:02 0.0% miniserv.pl/1
   110 daemon   3592K  828K sleep   59    0   0:00:00 0.0% kcfd/3
   224 root     1680K  788K sleep   59    0   0:00:00 0.0% sac/1
   214 daemon   2464K 1068K sleep   59    0   0:00:00 0.0% statd/1
Total: 3096 processes, 3203 lwps, load averages: 1.29, 7.14, 7.98

Ok, so prstat is actually consuming a noticable amount of the CPU, 9.3%.

top with 3096 processes, the command output,

last pid: 13454;  load averages:  5.24, 13.32,  9.79                   15:35:44
3096 processes:3091 sleeping, 1 running, 3 stopped, 1 on cpu
CPU states:  0.0% idle, 84.2% user, 15.8% kernel,  0.0% iowait,  0.0% swap
Memory: 512M real, 8108K free, 506M swap in use, 770M swap free

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
  3929 brendan    1   7    1   18M   16M run    778:57 82.49% setiathome
 10448 root       1  42    0 5268K 2328K cpu      1:46 13.95% top
  3597 brendan    1  59    0   79M   51M sleep    7:39  0.74% Xorg
  3730 brendan    3  49    0   78M   26M sleep    5:29  0.54% mozilla-bin
  4019 brendan    5  49    0  108M 9492K sleep    1:07  0.08% soffice.bin
  3642 brendan    1  59    0 3692K 1904K sleep    0:22  0.08% afterstep
  3909 brendan    1  49    0 6260K 2416K sleep    1:13  0.03% dtterm
   135 root       1  59    0 1852K  984K sleep    0:21  0.03% in.routed
  5185 brendan    1  49    0 4120K 2196K sleep    0:06  0.03% xterm
     7 root      12  59    0   10M 3188K sleep    0:11  0.02% svc.startd
  3648 brendan    1  59    0 3580K 1512K sleep    0:01  0.02% Pager
    84 root      24  59    0 3084K 1644K sleep    0:05  0.01% nscd
     1 root       1  59    0 1992K  644K sleep    0:09  0.00% init
  5190 brendan    1  49    0 3624K 2088K sleep    0:06  0.00% xterm
   354 root       1  59    0 5428K 1380K sleep    0:03  0.00% sendmail

And top is using 14% of the CPU.

DTrace has already explained what is happening, this little experiment has shown that it can make a noticable on a system with many processes.

Conclusion

The following are some observations from the above DTrace measurements,

For each screen update top opens, reads and closes a psinfo file for every process. prstat only does a read.
The CPU time for these extra opens was a substantial component of the overall system call time - around 60%.
top is also performing a little extra user time than prstat.
top does indeed incur a heavier burden on the system than prstat, the measure of which depends on the system and number of processes.
The sysadmins most likely to have noticed a problem are those on systems with thousands of processes.

I hope this has been a useful demonstration of using DTrace from the command line, as well as from DTrace scripts.

Confessions

Well, I do have a confession to make. I've also written tools that use the /proc procfs data. Both prstat and top use this /proc data, prstat in a clever manner (preads), and top in a simple way (open, read, close). And when I've written tools, so far it's been in the same way as top! open, read, close, for every process.

As a programmer you try to think of every possible problem, especially bugs in the code that may cause it to crash or print incorrect data. A lot of effort goes into ensuring the program will work on every possible environment. And rightfully so, the program must be reliable! Less effort may go into benchmarking the program on every possible environment, and it's easy to not think of a particular scenario - like ten managers all running top on a system with thousands of processes.

So while top is more of a burden that prstat, it's not due to an obvious mistake or oversight, it's due to a programming style that many of us would have choosen - and only becomes a problem in certain scenarios.

More DTrace

DTrace Tools - DTrace tools; iosnoop, execsnoop, ...

DTrace - Sun's BigAdmin DTrace website.

Back to Brendan Gregg's Homepage