I originally posted this at http://blogs.sun.com/brendan/entry/dtracing_vim_latency.
I've just decreased my vim start up time by 340x, which makes for an interesting example of using DTrace:
I noticed that the vim text editor was taking longer and longer to fire up, and it was now taking up to 14 seconds for vim to initialise. Something is definitely wrong, and there should be some big game for DTrace to hunt down.
Syscall analysis is a good starting point, and so I run procsystime from my DTraceToolkit to show syscall timing information and counts:
# /opt/DTT/procsystime -a -n vim Hit Ctrl-C to stop sampling... ^C Elapsed Times for processes vim, SYSCALL TIME (ns) sigaltstack 1866 sigpending 2046 [...] close 96958757 pollsys 1694151254 nanosleep 12098745228 TOTAL: 14219906941 CPU Times for processes vim, SYSCALL TIME (ns) sigaltstack 382 sigpending 420 [...] connect 2644571 xstat 3809476 open 9690091 TOTAL: 26863724 Syscall Counts for processes vim, SYSCALL COUNT access 1 chmod 1 [...] mmap 67 brk 188 xstat 213 TOTAL: 1041
This output has been cut to fit. Reading from bottom up:
- Syscall Counts show nothing obvious: 1041 syscalls isn't a lot. What is more interesting is their elapsed and on-CPU times. For example, those 213 xstat()s may have blocked on a slow NFS share.
- CPU Times in total are 26 ms – so they don't explain our long start up times.
- Elapsed Times show a total of 14 seconds, 12 seconds of which are during nanosleep. This corresponds to the long delay experienced as vim loaded.
Next is to examine why vim is calling nanosleep. We can start by pulling out user stack traces when nanosleep is called, either from a DTrace one-liner or using dtruss from the DTraceToolkit:
# dtrace -n 'syscall::nanosleep:entry /execname == "vim"/ { @[ustack()] = count(); }' dtrace: description 'syscall::nanosleep:entry ' matched 1 probe ^C libc.so.1`__nanosleep+0x7 libX11.so.4`_X11TransConnectDisplay+0x6b8 libX11.so.4`XOpenDisplay+0xbd libXt.so.4`XtOpenDisplay+0xa0 vim`setup_term_clip+0x233 vim`main+0xb5f vim`_start+0x80 12 # /opt/DTT/dtruss -st nanosleep -n vim PID/LWP SYSCALL(args) = return 849553/1: nanosleep(0x8046C50, 0x8046C58, 0xD9213FA8) = 0 0 libc.so.1`__nanosleep+0x7 libX11.so.4`_X11TransConnectDisplay+0x6b8 libX11.so.4`XOpenDisplay+0xbd libXt.so.4`XtOpenDisplay+0xa0 vim`setup_term_clip+0x233 vim`main+0xb5f vim`_start+0x80 [...]
This shows how nanosleep was being called by vim. It was being called from libX11 in functions such as _X11TransConnectDisplay() – huh? I'm using vim in text mode, why would it be talking to X11?
Well, if I have an X11 issue, let me start by checking the DISPLAY variable:
$ echo $DISPLAY :11.0
Ahh - that doesn't seem right. Maybe I'm connected via ssh, and we are doing X11 port forwarding?:
$ ptree $$ 369710 screen 369715 /bin/bash 851597 ptree 369715
Rather than showing sshd in our ancestory, this has shown screen. The screen software provides text based multiple session recovery. When I initialised screen, I did so via an ssh session. Now I'm connecting via the console, and that ssh session is long gone, as is the DISPLAY :11.0.
Setting DISPLAY to :0.0 fixes the issue, and now procsystime shows the following syscall elapsed times:
# /opt/DTT/procsystime -Te -n vim Hit Ctrl-C to stop sampling... ^C Elapsed Times for processes vim, SYSCALL TIME (ns) sigaltstack 1980 systeminfo 2221 [...] access 4190720 xstat 7960249 open 16273686 TOTAL: 42198388
A total elapsed time of 42 ms is 340 times faster than it was previously!
... I didn't answer why vim was talking to X11 in the first place: in the previous stack trace, the vim function that calls X is setup_term_clip(). There are numerous things we can do with DTrace to learn more about this, but first I'll check the vim source to see if a comment explains what is going on. This is what I found:
/* Start using the X clipboard, unless the GUI was started. */ [...] setup_term_clip()
Ahh, vim was trying to use the X clipboard, which was failing as our DISPLAY variable was corrupt.