USE Method: Solaris Performance Checklist
The USE Method provides a strategy for performing a complete check of system health, identifying common bottlenecks and errors. For each system resource, metrics for utilization, saturation and errors are identified and checked. Any issues discovered are then investigated using further strategies.
This is an example USE-based metric list for the Solaris family of operating systems. I'm writing this for later versions of Solaris 10, Oracle Solaris 11, and illumos-based systems such as SmartOS and OmniOS. It is primarily intended for system administrators of the physical systems (not tenants of cloud or zone instances; for those users, see my SmartOS performance checklist).
Physical Resources
component | type | metric |
---|---|---|
CPU | utilization | per-cpu: mpstat 1, "usr" + "sys"; system-wide: vmstat 1, "us" + "sy"; per-process: prstat -c 1 ("CPU" == recent), prstat -mLc 1 ("USR" + "SYS"); per-kernel-thread: lockstat -Ii rate, DTrace profile stack() |
CPU | saturation | system-wide: uptime, load averages; vmstat 1, "r"; DTrace dispqlen.d (DTT) for a better "vmstat r"; per-process: prstat -mLc 1, "LAT" |
CPU | errors | fmadm faulty; cpustat (CPC) for whatever error counters are supported (eg, thermal throttling) |
Memory capacity | utilization | system-wide: vmstat 1, "free" (main memory), "swap" (virtual memory); per-process: prstat -c, "RSS" (main memory), "SIZE" (virtual memory) |
Memory capacity | saturation | system-wide: vmstat 1, "sr" (bad now), "w" (was very bad); vmstat -p 1, "api" (anon page ins == pain), "apo"; per-process: prstat -mLc 1, "DFL"; DTrace anonpgpid.d (DTT), vminfo:::anonpgin on execname (one-liner below the table) |
Memory capacity | errors | fmadm faulty and prtdiag for physical failures; fmstat -s -m cpumem-retire (ECC events); DTrace failed malloc()s |
Network Interfaces | utilization | nicstat (use the latest version); kstat; dladm show-link -s -i 1 interface |
Network Interfaces | saturation | nicstat; kstat for whatever custom statistics are available (eg, "nocanputs", "defer", "norcvbuf", "noxmtbuf"); netstat -s, retransmits |
Network Interfaces | errors | netstat -i, error counters; dladm show-phys; kstat for extended errors, look in the interface and "link" statistics (there are often custom counters for the card); DTrace for driver internals |
Storage device I/O | utilization | system-wide: iostat -xnz 1, "%b"; per-process: DTrace iotop |
Storage device I/O | saturation | iostat -xnz 1, "wait"; DTrace iopending (DTT), sdqueue.d (DTB) |
Storage device I/O | errors | iostat -En; DTrace I/O subsystem, eg, ideerr.d (DTB), satareasons.d (DTB), scsireasons.d (DTB), sdretry.d (DTB) |
Storage capacity | utilization | swap: swap -s; file systems: df -h; plus other commands depending on FS type |
Storage capacity | saturation | not sure this one makes sense - once it's full, ENOSPC |
Storage capacity | errors | DTrace; /var/adm/messages file system full messages |
Storage controller | utilization | iostat -Cxnz 1, compare to known IOPS/tput limits per-card |
Storage controller | saturation | look for kernel queueing: sd (iostat "wait" again), ZFS zio pipeline |
Storage controller | errors | DTrace the driver, eg, mptevents.d (DTB); /var/adm/messages |
Network controller | utilization | infer from nicstat and known controller max tput |
Network controller | saturation | see network interface saturation |
Network controller | errors | kstat for whatever is there / DTrace |
CPU interconnect | utilization | cpustat (CPC) for CPU interconnect ports, tput / max (eg, see the amd64htcpu script) |
CPU interconnect | saturation | cpustat (CPC) for stall cycles |
CPU interconnect | errors | cpustat (CPC) for whatever is available |
Memory interconnect | utilization | cpustat (CPC) for memory busses, tput / max; or CPI greater than, say, 5; CPC may also have local vs remote counters |
Memory interconnect | saturation | cpustat (CPC) for stall cycles |
Memory interconnect | errors | cpustat (CPC) for whatever is available |
I/O interconnect | utilization | busstat (SPARC only); cpustat for tput / max if available; inference via known tput from iostat/nicstat/... |
I/O interconnect | saturation | cpustat (CPC) for stall cycles |
I/O interconnect | errors | cpustat (CPC) for whatever is available |
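For example, the memory capacity saturation check above (vminfo:::anonpgin) can be done as a DTrace one-liner. This is a minimal sketch: the probe fires once per anonymous page-in, so any process that shows up here is waiting on the (much slower) swap devices and is likely suffering memory pressure.

```
# count anonymous page-ins by process name; Ctrl-C prints the summary
dtrace -n 'vminfo:::anonpgin { @[execname] = count(); }'
```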
- CPU utilization: a single hot CPU can be caused by a single hot thread, or a mapped hardware interrupt. Relieving the bottleneck usually involves tuning to use more CPUs in parallel.
- lockstat and plockstat are DTrace-based since Solaris 10 FCS.
- vmstat "r": this is coarse as it is only updated once per second.
- CPC == CPU Performance Counters (aka "Performance Instrumentation Counters" (PICs), or "Performance Monitoring Events"), read via programmable registers on each CPU, by cpustat(1M) or the DTrace "cpc" provider. These have traditionally been hard to work with due to differences between CPUs, but are getting much easier with the PAPI standard. Still, expect to spend some quality time (days) with the processor vendor manuals (what "cpustat -h" tells you to read), and to post-process cpustat with awk or perl (see the sketch after this list). See my short talk (video) about CPC (2010). (Many years ago, I made a toolkit including CPC scripts - CacheKit - that was too much work to maintain.)
- Memory capacity utilization: interpreting vmstat's "free" has been tricky across different Solaris versions (we documented it in the Perf & Tools book), due to the different ways it was calculated, and tunables that affect when the system will kick off the page scanner. It'll also typically shrink as the kernel uses unused memory for caching (ZFS ARC).
- Be aware that kstat can report bad data (so can any tool); there isn't really a test suite for kstat data, and engineers can add new code paths and forget to add the counters.
- DTT == DTraceToolkit scripts, DTB == DTrace book scripts.
- CPI == Cycles Per Instruction (others use IPC == Instructions Per Cycle).
- I/O interconnect: this includes the CPU to I/O controller busses, the I/O controller(s), and device busses (eg, PCIe).
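Putting the CPC and CPI notes together, here is a hedged sketch of post-processing cpustat with awk to print CPI per CPU. The pic event names are only examples (they happen to match some UltraSPARC processors); run "cpustat -h" to see what your CPUs actually support, and check that the column positions match your output.

```
# compute CPI (cycles per instruction) per CPU from cpustat samples;
# the pic event names are processor-specific examples -- see "cpustat -h"
cpustat -c pic0=Cycle_cnt,pic1=Instr_cnt 1 5 | \
    awk '$3 == "tick" && $5 > 0 { printf "CPU %-3s CPI: %.2f\n", $2, $4 / $5 }'
```

A high CPI (say, over 5) suggests stall cycles, often waiting on memory I/O, rather than a compute-bound workload.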
Software Resources
component | type | metric |
---|---|---|
Kernel mutex | utilization | lockstat -H (held time); DTrace lockstat provider |
Kernel mutex | saturation | lockstat -C (contention); DTrace lockstat provider; spinning shows up with dtrace -n 'profile-997 { @[stack()] = count(); }' |
Kernel mutex | errors | lockstat -E, eg, recursive mutex enter (other errors can cause kernel lockup/panic; debug with mdb -k) |
User mutex | utilization | plockstat -H (held time); DTrace plockstat provider |
User mutex | saturation | plockstat -C (contention); prstat -mLc 1, "LCK"; DTrace plockstat provider |
User mutex | errors | DTrace plockstat and pid providers, for EAGAIN, EINVAL, EPERM, EDEADLK, ENOMEM, EOWNERDEAD, ... see pthread_mutex_lock(3C) |
Process capacity | utilization | sar -v, "proc-sz"; kstat, "unix:0:var:v_proc" for max, "unix:0:system_misc:nproc" for current; DTrace (`nproc vs `max_nprocs) |
Process capacity | saturation | not sure this makes sense; you might get queueing on pidlinklock in pid_allocate(), as it scans for available slots once the table gets full |
Process capacity | errors | "can't fork()" messages |
Thread capacity | utilization | user-level: kstat, "unix:0:lwp_cache:buf_inuse" for current, prctl -n zone.max-lwps -i zone ZONE for max; kernel: mdb -k or DTrace, "nthread" for current, limited by memory |
Thread capacity | saturation | threads blocking on memory allocation; at this point the page scanner should be running (vmstat "sr"), else examine using DTrace/mdb. |
Thread capacity | errors | user-level: pthread_create() failures with EAGAIN, EINVAL, ...; kernel: thread_create() blocks for memory but won't fail. |
File descriptors | utilization | system-wide (no limit other than RAM); per-process: pfiles vs ulimit or prctl -t basic -n process.max-file-descriptor PID; a quicker check than pfiles is ls /proc/PID/fd \| wc -l |
File descriptors | saturation | does this make sense? I don't think there is any queueing or blocking, other than on memory allocation. |
File descriptors | errors | truss or DTrace (better) to look for errno == EMFILE on syscalls returning fds (eg, open(), accept(), ...). |
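As a sketch of the file descriptor errors check from the last row: this one-liner counts EMFILE by process name and syscall. The probe names are assumptions for the fd-returning syscalls of interest; confirm what exists on your system first with dtrace -ln 'syscall::accept*:return' (and similar).

```
# count file descriptor exhaustion (EMFILE) by process name and syscall
dtrace -n 'syscall::open*:return,syscall::accept*:return
    /errno == EMFILE/ { @[execname, probefunc] = count(); }'
```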
- lockstat/plockstat often drop events due to load; to avoid this I often roll my own using the DTrace lockstat/plockstat providers (there are examples in the DTrace book, and a sketch after this list).
- File descriptor utilization: while other OSes have a system-wide limit, Solaris doesn't (at least at the moment, this could change; see my writeup about it).
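As an example of rolling your own (see the lockstat note above): a minimal sketch using the lockstat provider's adaptive-block probe, which fires when a thread blocks on an adaptive mutex, with arg1 as the sleep time in nanoseconds.

```
# total time blocked on kernel adaptive mutexes (ns), keyed by kernel stack;
# aggregating in-kernel avoids the dropped events lockstat -C can suffer
dtrace -n 'lockstat:::adaptive-block { @[stack()] = sum(arg1); }'
```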
What's Next
See the USE Method for the follow-up strategies after identifying a possible bottleneck. If you complete this checklist but still have a performance issue, move on to other strategies: drill-down analysis and latency analysis.