USE Method: Mac OS X Performance Checklist
This is my example USE Method-based performance checklist for the Apple Mac OS X operating system, for identifying common bottlenecks and errors. This draws upon both command line and graphical tools for coverage, focusing where possible on those that are provided with the OS by default, or by Apple (eg, Instruments). Further notes about tools are provided after this table.
Some of the metrics are easy to find in various GUIs or from the command line (eg, using Terminal; if you've never used Terminal before, follow my instructions at the top of this post). Many others require some math, inference, or quite a bit of digging. This should get easier in the future, as tools add a USE method wizard or expose the required metrics directly.
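As a quick start, here is a Terminal sweep using some of the system-wide commands from the tables below (a sketch; exact column names vary a little between OS X versions):

```
# CPU utilization and saturation (system-wide)
uptime          # "load averages" > CPU count suggests saturation
iostat 1        # CPU utilization = "us" + "sy"

# Main memory
vm_stat 1       # free main memory = "free" + "inactive" pages; watch "pageout"

# Network and storage
netstat -i 1    # per-interval bytes in/out (includes localhost traffic)
df -h           # file system capacity
```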
Physical Resources, Standard
component | type | metric |
---|---|---|
CPU | utilization | system-wide: iostat 1, "us" + "sy"; per-cpu: DTrace [1]; Activity Monitor → CPU Usage or Floating CPU Window; per-process: top -o cpu, "%CPU"; Activity Monitor → Activity Monitor, "%CPU"; per-kernel-thread: DTrace profile stack() |
CPU | saturation | system-wide: uptime, "load averages" > CPU count; latency, "SCHEDULER" and "INTERRUPTS"; per-cpu: dispqlen.d (DTT), non-zero "value"; runocc.d (DTT), non-zero "%runocc"; per-process: Instruments → Thread States, "On run queue"; DTrace [2] |
CPU | errors | dmesg; /var/log/system.log; Instruments → Counters, for PMC and whatever error counters are supported (eg, thermal throttling) |
Memory capacity | utilization | system-wide: vm_stat 1, main memory free = "free" + "inactive", in units of pages; Activity Monitor → Activity Monitor → System Memory, "Free" for main memory; per-process: top -o rsize, "RSIZE" is resident main memory size, "VSIZE" is virtual memory size; ps -alx, "RSS" is resident set size, "SZ" is virtual memory size; ps aux similar (legacy format) |
Memory capacity | saturation | system-wide: vm_stat 1, "pageout"; per-process: anonpgpid.d (DTT), DTrace vminfo:::anonpgin [3] (frequent anonpgin == pain); Instruments → Memory Monitor, high rate of "Page Ins" and "Page Outs"; sysctl vm.memory_pressure [4] |
Memory capacity | errors | System Information → Hardware → Memory, "Status" for physical failures; DTrace failed malloc()s |
Network Interfaces | utilization | system-wide: netstat -i 1, assume one very busy interface and use input/output "bytes" / known max (note: includes localhost traffic); per-interface: netstat -I interface 1, input/output "bytes" / known max; Activity Monitor → Activity Monitor → Network, "Data received/sec" "Data sent/sec" / known max (note: includes localhost traffic); atMonitor, interface percent |
Network Interfaces | saturation | system-wide: netstat -s, for saturation related metrics, eg netstat -s | egrep 'retrans|overflow|full|out of space|no bufs'; per-interface: DTrace |
Network Interfaces | errors | system-wide: netstat -s | grep bad, for various metrics; per-interface: netstat -i, "Ierrs", "Oerrs" (eg, late collisions), "Colls" [5] |
Storage device I/O | utilization | system-wide: iostat 1, "KB/t" and "tps" are rough usage stats [6]; DTrace could be used to calculate a percent busy, using io provider probes; atMonitor, "disk0" is percent busy; per-process: iosnoop (DTT), shows usage; iotop (DTT), has -P for percent I/O |
Storage device I/O | saturation | system-wide: iopending (DTT) |
Storage device I/O | errors | DTrace io:::done probe when /args[0]->b_error != 0/
Storage capacity | utilization | file systems: df -h; swap: sysctl vm.swapusage, for swap file usage; Activity Monitor → Activity Monitor → System Memory, "Swap used" |
Storage capacity | saturation | not sure this one makes sense - once it's full, writes fail with ENOSPC
Storage capacity | errors | DTrace; /var/log/system.log file system full messages |
- [1] eg: dtrace -x aggsortkey -n 'profile-100 /!(curthread->state & 0x80)/ { @ = lquantize(cpu, 0, 1000, 1); } tick-1s { printa(@); clear(@); }'. Josh Clulow also wrote a simple C program to dig out per-CPU utilization: cpu_usage.c. This one-liner is expanded into a commented script after these notes.
- [2] Until there are sched:::enqueue/dequeue probes, I suspect this could be done using fbt tracing of thread_*(). I haven't tried yet. It might be worth seeing what Instruments uses for its "On run queue" thread state trace, and DTracing that.
- [3] eg: dtrace -n 'vminfo:::anonpgin { printf("%Y %s", walltimestamp, execname); }'.
- [4] the kernel source under bsd/vm/vm_unix.c describes this as "Memory pressure indicator", although I've yet to see this as non-zero.
- [5] the netstat(1) man page reads: "BUGS: The notion of errors is ill-defined."
- [6] it would be great if Mac OS X iostat added a -x option to include utilization, saturation, and error columns, like Solaris "iostat -xnze 1".
- atMonitor is a 3rd party tool that provides various statistics; I'm running version 2.7b, although it crashes if you leave the "Top Window" open for more than 2 seconds.
- Activity Monitor is a default Apple performance monitoring tool with a graphical interface.
- Instruments is an Apple performance analysis product with a graphical interface. It is comprehensive, consuming performance data from multiple frameworks, including DTrace. Instruments also includes functionality that was provided by previously separate performance analysis products, like CHUD and Shark, making it a one-stop shop. It'd be wonderful if it included latency heat maps as well :-).
- Temperature Monitor: 3rd party software that can read various temperature probes.
- PMC == Performance Monitor Counters, aka CPU Performance Counters (CPC), Performance Instrumentation Counters (PICs), and more. These are processor hardware counters that are read via programmable registers on each CPU.
- DTT == DTraceToolkit scripts, many of which were ported by the Apple engineers and shipped by default with Mac OS X. ie, you should be able to run these immediately, eg, sudo runocc.d.
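For convenience, here is the footnote [1] one-liner expanded into a commented script (same logic; the predicate skips idle threads):

```
#!/usr/sbin/dtrace -s
/* per-CPU utilization: sample at 100 Hz, skipping idle threads */
#pragma D option aggsortkey

profile-100
/!(curthread->state & 0x80)/	/* 0x80 == TH_IDLE in xnu */
{
	@ = lquantize(cpu, 0, 1000, 1);
}

tick-1s
{
	printa(@);
	clear(@);
}
```

Run it with sudo; each one-second histogram shows samples by CPU id, and a CPU with close to 100 samples was busy for nearly the entire interval.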
Physical Resources, Advanced
component | type | metric |
---|---|---|
GPU | utilization | directly: DTrace [7]; atMonitor, "gpu"; indirect: Temperature Monitor; atMonitor, "gput" |
GPU | saturation | DTrace [7]; Instruments → OpenGL Driver, "Client GLWait Time" (maybe) |
GPU | errors | DTrace [7] |
Storage controller | utilization | iostat 1, compare to known IOPS/tput limits per-card |
Storage controller | saturation | DTrace and look for kernel queueing |
Storage controller | errors | DTrace the driver |
Network controller | utilization | system-wide: netstat -i 1, assume one busy controller and examine input/output "bytes" / known max (note: includes localhost traffic) |
Network controller | saturation | see network interface saturation |
Network controller | errors | see network interface errors |
CPU interconnect | utilization | for multi-processor systems, try Instruments → Counters, and relevant PMCs for CPU interconnect port I/O, and measure throughput / max
CPU interconnect | saturation | Instruments → Counters, and relevant PMCs for stall cycles
CPU interconnect | errors | Instruments → Counters, and relevant PMCs for whatever is available
Memory interconnect | utilization | Instruments → Counters, and relevant PMCs for memory bus throughput / max, or measure CPI and treat, say, 5+ as high utilization (a worked example is in the notes below); Shark had "Processor bandwidth analysis" as a feature, which either was or included memory bus throughput, but I never used it
Memory interconnect | saturation | Instruments → Counters, and relevant PMCs for stall cycles
Memory interconnect | errors | Instruments → Counters, and relevant PMCs for whatever is available
I/O interconnect | utilization | Instruments → Counters, and relevant PMCs for throughput / max if available; inference via known throughput from iostat/...
I/O interconnect | saturation | Instruments → Counters, and relevant PMCs for stall cycles
I/O interconnect | errors | Instruments → Counters, and relevant PMCs for whatever is available
- [7] I haven't found a shipped tool to provide GPU statistics easily. I'd like a gpustat that behaved like mpstat, with at least the columns: utilization, saturation, errors. Until there is such a tool, you could trace GPU activity (at least the scheduling of activity) using DTrace on the graphics drivers. It won't be easy. I imagine Instruments will at some point add a GPU instrument set (other than the OpenGL instruments), otherwise, 3rd party tools can be used, like atMonitor.
- CPI == Cycles Per Instruction (others use IPC == Instructions Per Cycle).
- I/O interconnect: this includes the CPU to I/O controller busses, the I/O controller(s), and device busses (eg, PCIe).
- Using PMCs is typically a lot of work. This involves researching the processor manuals to see what counters are available and what they mean, and then collecting and interpreting them. I've used them on other OSes, but haven't used them all under Instruments → Counters, so I don't know if there's a hitch with anything there. Good luck.
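- As a worked example of the CPI approach mentioned above: CPI = cycles / instructions retired over the same interval. If Counters showed roughly 4,000M cycles but only 500M retired instructions during a one-second window, then CPI = 4000 / 500 = 8, well above the ~5 rule of thumb, suggesting the CPUs spend most of their cycles stalled (often waiting on memory). The exact cycle and instruction counter names are processor-specific, so check the processor's PMC documentation.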
Software Resources
component | type | metric |
---|---|---|
Kernel mutex | utilization | DTrace and lockstat provider for held times |
Kernel mutex | saturation | DTrace and lockstat provider for contention times [8] |
Kernel mutex | errors | DTrace and fbt provider for return probes and error status |
User mutex | utilization | plockstat -H (held time); DTrace plockstat provider |
User mutex | saturation | plockstat -C (contention); DTrace plockstat provider |
User mutex | errors | DTrace plockstat and pid providers, for EDEADLK, EINVAL, ... see pthread_mutex_lock(3)
Process capacity | utilization | current/max using: ps -e | wc -l / sysctl kern.maxproc; top, "Processes:" also shows current |
Process capacity | saturation | not sure this makes sense |
Process capacity | errors | "can't fork()" messages |
File descriptors | utilization | system-wide: sysctl kern.num_files / sysctl kern.maxfiles; per-process: can figure out using lsof and ulimit -n |
File descriptors | saturation | I don't think this one makes sense, as if it can't allocate or expand the array, it errors; see fdalloc() |
File descriptors | errors | dtruss or custom DTrace to look for errno == EMFILE on syscalls returning fds (eg, open(), accept(), ...) |
- [8] eg, showing adaptive lock block time totals (in nanoseconds) by calling function name: dtrace -n 'lockstat:::adaptive-block { @[caller] = sum(arg1); } END { printa("%40a%@16d ns\n", @); }'
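As a quick way to check the process capacity and file descriptor rows above from Terminal (a sketch; PID is a placeholder for a process you care about):

```
# Process capacity: current vs maximum
ps -e | wc -l                          # approximate current process count
sysctl kern.maxproc                    # the limit

# File descriptors, system-wide: current vs maximum
sysctl kern.num_files kern.maxfiles

# File descriptors, per-process: open files vs the per-process limit
sudo lsof -p PID | wc -l               # PID is hypothetical; pick a real one
ulimit -n                              # limit for the current shell

# Errors: count syscalls failing with EMFILE (too many open files)
sudo dtrace -n 'syscall::open*:return,syscall::accept:return
    /errno == EMFILE/ { @[execname] = count(); }'
```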
Other Tools
I didn't include fs_usage, sc_usage, sample, spindump, heap, vmmap, malloc_history, leaks, and other useful Mac OS X performance tools, as here I'm beginning with questions (the methodology) and only including tools that answer them. This is instead of the other way around: listing all the tools and trying to find a use for them. Those other tools are useful for other methodologies, which can be used after this one.
What's Next
See the USE Method for the follow-up methodologies after identifying a possible bottleneck. If you complete this checklist but still have a performance issue, move onto other methodologies: drill-down analysis and latency analysis.
For more performance analysis, also see my earlier post on Top 10 DTrace Scripts for Mac OS X.
Acknowledgements
Resources used:
- Instruments User Guide and Instruments User Reference
- Apple's Performance Tools summary
- man pages
- xnu source code (kernel)
- Mac OS X Internals, by Amit Singh, and his online list of performance tools
Filling in this checklist has required a lot of research, testing, and experimentation. Please reference back to this post if it helps you develop related material.
It's quite possible I've missed something or included the wrong metric somewhere (sorry); I'll update the post to fix these as they are understood, and note the update date at the top.
Also see my USE method performance checklists for Solaris, SmartOS, Linux, and FreeBSD.