USE Method: FreeBSD Performance Checklist
This page contains an example USE Method-based performance checklist for FreeBSD, for identifying common bottlenecks and errors. It is intended to be used early in a performance investigation, before moving on to more time-consuming methodologies. This should be helpful for anyone using FreeBSD, especially system administrators.
This was developed on FreeBSD 10.0 alpha, and focuses on tools shipped by default. With DTrace, I was able to create a few new one-liners to answer some of these metrics. See the notes below the tables.
Physical Resources
component | type | metric |
---|---|---|
CPU | utilization | system-wide: vmstat 1, "us" + "sy"; per-cpu: vmstat -P; per-process: top, "WCPU" for weighted and recent usage; per-kernel-process: top -S, "WCPU" |
CPU | saturation | system-wide: uptime, "load averages" > CPU count; vmstat 1, "procs:r" > CPU count; per-cpu: DTrace to profile CPU run queue lengths [1]; per-process: DTrace of scheduler events [2] |
CPU | errors | dmesg; /var/log/messages; pmcstat for PMC and whatever error counters are supported (eg, thermal throttling) |
Memory capacity | utilization | system-wide: vmstat 1, "fre" is main memory free; top, "Mem:"; per-process: top -o res, "RES" is resident main memory size, "SIZE" is virtual memory size; ps -auxw, "RSS" is resident set size (Kbytes), "VSZ" is virtual memory size (Kbytes) |
Memory capacity | saturation | system-wide: vmstat 1, "sr" for scan rate, "w" for swapped threads (was saturated, may not be now); swapinfo, "Capacity" also for evidence of swapping/paging; per-process: DTrace [3] |
Memory capacity | errors | physical: dmesg?; /var/log/messages?; virtual: DTrace failed malloc()s |
Network Interfaces | utilization | system-wide: netstat -i 1, assume one very busy interface and use input/output "bytes" / known max (note: includes localhost traffic); per-interface: netstat -I interface 1, input/output "bytes" / known max |
Network Interfaces | saturation | system-wide: netstat -s, for saturation related metrics, eg netstat -s | egrep 'retrans|drop|out-of-order|memory problems|overflow'; per-interface: DTrace |
Network Interfaces | errors | system-wide: netstat -s | egrep 'bad|checksum', for various metrics; per-interface: netstat -i, "Ierrs", "Oerrs" (eg, late collisions), "Colls" |
Storage device I/O | utilization | system-wide: iostat -xz 1, "%b"; per-process: DTrace io provider, eg, iosnoop or iotop (DTT, needs porting) |
Storage device I/O | saturation | system-wide: iostat -xz 1, "qlen"; DTrace for queue duration or length [4] |
Storage device I/O | errors | DTrace io:::done probe when /args[0]->b_error != 0/ |
Storage capacity | utilization | file systems: df -h, "Capacity"; swap: swapinfo, "Capacity"; pstat -T also shows swap space |
Storage capacity | saturation | not sure this one makes sense - once it's full, writes fail with ENOSPC |
Storage capacity | errors | DTrace; /var/log/messages for file system full messages |
Storage controller | utilization | iostat -xz 1, sum IOPS & tput metrics for devices on the same controller, and compare to known limits [5] |
Storage controller | saturation | check utilization, and use DTrace to look for kernel queueing |
Storage controller | errors | DTrace the driver |
Network controller | utilization | system-wide: netstat -i 1, assume one busy controller and examine input/output "bytes" / known max (note: includes localhost traffic) |
Network controller | saturation | see network interface saturation |
Network controller | errors | see network interface errors |
CPU interconnect | utilization | pmcstat (PMC) for CPU interconnect ports, tput / max |
CPU interconnect | saturation | pmcstat and relevant PMCs for CPU interconnect stall cycles |
CPU interconnect | errors | pmcstat and relevant PMCs for whatever is available |
Memory interconnect | utilization | pmcstat and relevant PMCs for memory bus throughput / max, or, measure CPI and treat, say, 5+ as high utilization |
Memory interconnect | saturation | pmcstat and relevant PMCs for memory stall cycles |
Memory interconnect | errors | pmcstat and relevant PMCs for whatever is available |
I/O interconnect | utilization | pmcstat and relevant PMCs for tput / max if available; inference via known tput from iostat/netstat/... |
I/O interconnect | saturation | pmcstat and relevant PMCs for I/O bus stall cycles |
I/O interconnect | errors | pmcstat and relevant PMCs for whatever is available |
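To make the io:::done check from the storage device errors row concrete, here is a minimal sketch; it assumes the io provider's translated arguments (args[0]->b_error and args[1]->dev_name, from the io.d translators) are available on your FreeBSD build:

```
# Count failed storage I/O by device name and error code.
# Hedged: assumes Solaris-style io provider translators, so that
# args[0]->b_error (error code) and args[1]->dev_name (device) exist.
dtrace -n 'io:::done /args[0]->b_error != 0/ {
    @[args[1]->dev_name, args[0]->b_error] = count();
}'
```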
- [1] eg, using per-CPU run queue length as the saturation metric: dtrace -n 'profile-99 { @[cpu] = lquantize(`tdq_cpu[cpu].tdq_load, 0, 128, 1); } tick-1s { printa(@); trunc(@); }' where > 1 is saturation. If you're using the older BSD scheduler, profile runq_length[]. There are also the sched:::load-change and other sched probes.
- [2] For this metric, let's use time spent in TDS_RUNQ as a per-thread saturation (latency) metric. Here is an (unstable) fbt-based one-liner: dtrace -n 'fbt::tdq_runq_add:entry { ts[arg1] = timestamp; } fbt::choosethread:return /ts[arg1]/ { @[stringof(args[1]->td_name), "runq (ns)"] = quantize(timestamp - ts[arg1]); ts[arg1] = 0; }'. This would be more stable if rewritten to use the sched probes. It would also be great if there were simply high-resolution thread state times in struct rusage or rusage_ext, eg, cumulative times for each state in td_state and more, which would make reading this metric easier and lower overhead. See the Thread State Analysis Method from my Velocity talk for suggested states.
- [3] eg, for swapping: dtrace -n 'fbt::cpu_thread_swapin:entry, fbt::cpu_thread_swapout:entry { @[probefunc, stringof(args[0]->td_name)] = count(); }' (NOTE: I would trace vm_thread_swapin() and vm_thread_swapout(), but their probes don't exist). Tracing paging is trickier until the vminfo provider is added; you could try tracing from swap_pager_putpages() and swap_pager_getpages(), but I didn't see an easy way to walk back to a thread struct; another approach may be via vm_fault_hold(). Good luck. See thread states [2], which could make this much easier.
- [4] eg, sampling GEOM queue length at 99 Hertz: dtrace -n 'profile-99 { @["geom qlen"] = lquantize(`g_bio_run_down.bio_queue_length, 0, 256, 1); }'
- [5] This approach is different from storage device (disk) utilization. For controllers, percent busy has much less meaning, so we're calculating utilization based on throughput (bytes/sec) and IOPS instead. Controllers typically have limits for these based on their buses and processing capacity. If you don't know them, you can determine them experimentally.
- PMC == Performance Monitoring Counters, aka CPU Performance Counters (CPC), Performance Instrumentation Counters (PICs), and more. These are processor hardware counters that are read via programmable registers on each CPU. The availability of these counters is dependent on the processor type. See pmc(3) and pmcstat(8).
- pmcstat(8): the FreeBSD tool for instrumenting PMCs. You might need to run kldload hwpmc before use. Figuring out which PMCs to use, and how, usually takes serious time (days) with the processor vendor manuals; eg, the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B, Appendix A-E: Performance-Monitoring Events.
- DTT == DTraceToolkit scripts. These are in the FreeBSD source under cddl/contrib/dtracetoolkit, and dtruss is under /usr/sbin. As features are added to DTrace (see the freebsd-dtrace mailing list), more scripts can be ported.
- CPI == Cycles Per Instruction (others use IPC == Instructions Per Cycle).
- I/O interconnect: this includes the CPU to I/O controller buses, the I/O controller(s), and device buses (eg, PCIe).
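To make the CPI approach from the memory interconnect row concrete, here is a hedged sketch using pmcstat's system-wide counting mode. The "instructions" and "unhalted-cycles" event aliases may not exist on every processor, so run pmccontrol -L to see what yours supports:

```
# Count instructions and cycles system-wide, printing each second;
# CPI = unhalted-cycles / instructions. Event aliases are CPU-dependent:
# check pmccontrol -L for what your processor offers.
kldload hwpmc        # if not already loaded
pmcstat -s instructions -s unhalted-cycles -w 1
```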
Software Resources
component | type | metric |
---|---|---|
Kernel mutex | utilization | lockstat -H (held time); DTrace lockstat provider |
Kernel mutex | saturation | lockstat -C (contention); DTrace lockstat provider [6]; spinning shows up with dtrace -n 'profile-997 { @[stack()] = count(); }' |
Kernel mutex | errors | lockstat -E (errors); DTrace and fbt provider for return probes and error status |
User mutex | utilization | DTrace pid provider for hold times; eg, pthread_mutex_*lock() return to pthread_mutex_unlock() entry |
User mutex | saturation | DTrace pid provider for contention; eg, pthread_mutex_*lock() entry to return times |
User mutex | errors | DTrace pid provider for EINVAL, EDEADLK, ... see pthread_mutex_lock(3) etc. |
Process capacity | utilization | current/max using: ps -ax | wc -l vs sysctl kern.maxproc; top, "Processes:" also shows current |
Process capacity | saturation | not sure this makes sense |
Process capacity | errors | "can't fork()" messages |
File descriptors | utilization | system-wide: pstat -T, "files"; sysctl kern.openfiles / sysctl kern.maxfiles; per-process: can figure out using fstat -p PID and ulimit -n |
File descriptors | saturation | I don't think this one makes sense: if the kernel can't allocate or expand the fd array, it returns an error; see fdalloc() |
File descriptors | errors | truss, dtruss, or custom DTrace to look for errno == EMFILE on syscalls returning fds (eg, open(), accept(), ...) |
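For the user mutex saturation row, here is a minimal pid-provider sketch that measures pthread_mutex_lock() entry-to-return time for one process. PID is a placeholder, and it assumes the lock functions are visible to the pid provider (on FreeBSD they live in libthr):

```
# Distribution of time (ns) spent in pthread_mutex_lock(), per lock call;
# includes fast uncontended acquisitions, so look at the distribution tail.
# Replace PID with the process of interest.
dtrace -n '
    pid$target::pthread_mutex_lock:entry { self->ts = timestamp; }
    pid$target::pthread_mutex_lock:return /self->ts/ {
        @["lock wait (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
    }' -p PID
```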
- lockstat: you may need to run kldload ksyms before lockstat will work (otherwise: "lockstat: can't load kernel symbols: No such file or directory").
- [6] eg, showing adaptive lock block time totals (in nanoseconds) by calling function name: dtrace -n 'lockstat:::adaptive-block { @[caller] = sum(arg1); } END { printa("%40a%@16d ns\n", @); }'
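And for the file descriptor errors row, a custom DTrace sketch along the suggested lines: count fd-returning syscalls that fail with EMFILE (24 in sys/errno.h). The wildcarded probe names are an assumption, intended to match the open/openat and accept variants:

```
# Count syscalls failing with EMFILE (errno 24), by process name.
# Hedged: assumes the errno built-in reflects the just-completed syscall
# at the return probe (it is cleared to 0 on success).
dtrace -n 'syscall::open*:return, syscall::accept*:return
    /errno == 24/ { @[execname] = count(); }'
```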
Other Tools
I didn't include procstat, sockstat, gstat, or others, as here I'm beginning with questions (the methodology) and only including tools that answer them, rather than the other way around: listing all the tools and trying to find a use for each. Those other tools are useful for other methodologies, which can be applied after this one.
What's Next
See the USE Method for the follow-up methodologies after identifying a possible bottleneck. If you complete this checklist but still have a performance issue, move on to other methodologies: drill-down analysis and latency analysis.
Acknowledgements
Resources used:
- FreeBSD source code and man pages
- FreeBSD Wiki PmcTools
- FreeBSD Handbook
- The Rosetta Stone for Unix is always handy, and also gave me the idea of adding some color backgrounds. Like it?
Filling in this checklist has required a lot of research, testing, and experimentation. Please reference back to this post if it helps you develop related material.
It's quite possible I've missed something or included the wrong metric somewhere (sorry); I'll update the post to fix these as they are understood, and note the update date at the top.
Also see my USE Method performance checklists for Solaris, SmartOS, Linux, and Mac OS X.