USE Method: Unix 7th Edition Performance Checklist
Out of curiosity, I've developed a USE Method-based performance checklist for Unix 7th Edition on a PDP-11/45, which I've been running via a PDP simulator. 7th Edition is from 1979, and was the first Unix with iostat(1M) and pstat(1M), enabling more serious performance analysis from shipped tools. Were I to write a checklist for earlier Unixes, it would contain many more "unknowns".
I've worked on various Unix derivatives over the years, and it's been interesting to study this earlier version and see so many familiar areas.
Example screenshots from various tools are shown at the end of this page.
Physical Resources
component | type | metric |
---|---|---|
CPU | utilization | system-wide: iostat 1, utilization is "user" + "nice" + "systm"; per-process: ps alx, "CPU" shows recent CPU usage (max 255), and "TIME" shows cumulative minutes:seconds of CPU time |
CPU | saturation | ps alx | awk '$2 == "R" { r++ } END { print r - 1 }', shows the number of runnable processes |
CPU | errors | console message if lucky, otherwise panic |
Memory capacity | utilization | system-wide: unknown [1]; per-type: unknown [2]; per-process: ps alx, "SZ" is the in-core (main memory) size in blocks (512 bytes); pstat -p, "SIZE" is the in-core size in units of core clicks (64 bytes), printed in octal! |
Memory capacity | saturation | system-wide: iostat 1, sustained "tpm" may be caused by swapping to disk; significant delays as processes wait for space to swap in |
Memory capacity | errors | malloc() returns 0; ENOMEM |
Disk I/O | utilization | system-wide: iostat -i 1, "IO active" plus "IO wait" percents; per-disk-controller: iostat -i 1, RF, RK, RP "active" percents, or a rough estimate using iostat 1, comparing "tpm" (transactions per minute) against the expected maximum; per-disk: listen to each rattle; unknown from Unix, unless only 1 disk per controller; per-process: unknown |
Disk I/O | saturation | unknown [3] |
Disk I/O | errors | might get a console message, eg, "err on dev", "ECC on dev" or "no space on dev", otherwise unknown [4] |
Tape I/O | utilization | look at tape drives and watch them spin [5] |
Tape I/O | saturation | not sure this one makes sense |
Tape I/O | errors | might get a console message, eg "err on dev" or "mt0 off line", otherwise unknown [4] |
Storage capacity | utilization | file systems: df, by default lists free space in blocks (512 bytes) for root and /usr [6]; per-user: quot, shows blocks by user; swap: unknown [7]; usage via pstat -p, "F" with the 010 flag ("swapped out") |
Storage capacity | saturation | once full, "no space on dev" messages to console |
Storage capacity | errors | file systems: once full, "no space on dev" messages to console; swap: "out of swap space" on console |
Unibus | utilization | set front panel data display to "BUS REGISTER" and watch the frequency of data display changes [8] |
Unibus | saturation | frequent flashing of the "PAUSE" front panel indicator light |
Unibus | errors | unknown |
- [1] Adding sizes in ps alx will provide a rough idea of memory utilization, compared to the max printed during boot by the "mem =" line (maxmem). A tool could be written in C to figure out free memory and utilization properly by opening /dev/kmem and walking coremap. It may be easier to modify the kernel to keep counters for this.
- [2] The PDP-11/45 supported both core memory and semiconductor memory, at the same time.
- [3] The disk queue length is known by the device driver, as b_active on the drive head of queue struct (dktab, hptab, ...). But this doesn't look exported to user-land in any way. It could be dug out using adb(1).
- [4] Storage controllers generally do have various error counters in their registers (eg, see struct device in usr/sys/dev/hp.c), and the kernel usually maintains an error count as b_errcnt in the drive head of queue struct. But these aren't exported to user-land, and would need to be read from /dev/kmem, eg, using adb(1), same as [3].
- [5] See the (modified) amusing photo from 1972 on the right, showing a TU56 tape drive in motion. Apart from visual cues, disk and tape devices may have given audible clues that they were utilized.
- [6] Via /dev/rp0 for root, and /dev/rp3 for /usr. These are hardcoded in /usr/src/cmd/df.c. You can also specify a list of devices as arguments to df.
- [7] It's in swapmap, so you could write a C /dev/kmem reader to print this out like for coremap, or add metrics to the kernel. The total size is in nswap blocks, which was defined by the nswap line in /usr/sys/conf/*conf when the kernel was built.
- [8] The PDP 11/40 has a "BUS" light, described in the processor handbook as "Lights when the UNIBUS is being used". Great, as utilization can be determined by how often it is illuminated. But this is the PDP 11/45, which doesn't have the BUS light. Instead, it has a "PAUSE" light to indicate when the CPU is blocked on the Unibus or the operator has HALTed execution, and a "MASTER" light to show when the CPU controls the Unibus. I'm not sure utilization can be determined from these. Instead, you could set the data display register select knob to "BUS REGISTER", and look for the frequency of changes in the data display, which is a series of lights on the front panel. I haven't heard of this being done, but I have read about programming that display as a light chaser in the kernel idle loop!
- The Unibus is the shared system bus, and connects all devices to the central processor. The PDP-11/45 has two of them due to the solid-state memory, although they are usually connected. I did not include other internal busses for the PDP-11. You can include them if you like.
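As a rough illustration of footnote [1], captured ps alx output can be post-processed on a modern host. This is only a sketch in Python (not something available on 7th Edition): the helper name memory_utilization is hypothetical, and it assumes that SZ is in 512-byte blocks and that the boot-time "mem =" figure is in bytes. It ignores the kernel's own memory and anything else not reflected in SZ, so treat the result as a ballpark figure at best.

```python
# Rough memory utilization from captured "ps alx" output (footnote [1]).
# Assumptions (not confirmed by the source): the boot "mem =" value is in
# bytes; memory_utilization() is a hypothetical helper name.

BLOCK_BYTES = 512  # SZ is reported in 512-byte blocks

# Sample taken from the "ps alx" example output on this page.
PS_ALX = """\
F S UID PID PPID CPU PRI NICE ADDR SZ WCHAN TTY TIME CMD
3 S 0 0 0 0 0 20 2253 2 4412 ? 186:14 swapper
1 S 0 1 0 0 30 20 2423 8 46520 ? 0:00 /etc/init
1 S 0 16 1 0 30 20 2273 11 46554 co 0:00 -sh
1 S 11 17 1 0 30 20 2777 11 46610 00 0:00 -sh
1 S 0 12 1 0 40 20 3127 5 140000 ? 0:00 /etc/update
1 S 1 15 1 0 40 20 3207 10 140000 ? 0:00 /etc/cron
1 R 11 384 17 235 74 30 2517 5 00 7:09 ./burncpu
1 S 11 431 17 0 40 20 3327 8 140000 00 0:00 sleep 600
1 S 11 432 17 0 28 20 3422 21 5154 00 0:00 ed tm.c
1 R 0 433 16 4 50 20 5266 20 co 0:00 ps alx
"""


def memory_utilization(ps_text, mem_bytes):
    """Return the summed SZ of all listed processes as a fraction of mem_bytes."""
    lines = ps_text.strip().splitlines()
    sz_col = lines[0].split().index("SZ")
    # SZ sits to the left of WCHAN, so the blank WCHAN of a running
    # process does not shift the column we read.
    blocks = sum(int(line.split()[sz_col]) for line in lines[1:])
    return blocks * BLOCK_BYTES / mem_bytes


if __name__ == "__main__":
    # 176448 is the "mem =" value from the boot example on this page.
    print("%.1f%%" % (100 * memory_utilization(PS_ALX, 176448)))
```

With the sample data this sums 101 blocks (51,712 bytes) against 176,448 bytes.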
Software Resources
component | type | metric |
---|---|---|
Process capacity | utilization | system-wide: pstat -ap, count active processes, or, pstat -ap | awk '$2 == "processes" { p = $1 } END { printf "%d/%d\n", p, NR - 1 }'; per-user: ps, compare process count (lines) to user limit MAXUPRC (usually 25) |
Process capacity | saturation | high process counts mean either memory pressure and swapping, out of process slots, or at user limit (see errors) |
Process capacity | errors | Bourne shell will enter a try/pause loop [9]; The Thompson shell (shipped as osh, old-shell) says "try again" |
File descriptors | utilization | system-wide: pstat -f, "open files" shows the number (or count lines), compare with file table size NFILE (usually 175) |
File descriptors | saturation | I don't think this one makes sense; once full, see errors |
File descriptors | errors | once full, "no file" error (ENFILE). See falloc() |
- [9] See the TFORK code in usr/src/cmd/sh/xec.c. Yes, this really is C code. See mac.h for Steve Bourne's ALGOL 68 #defines.
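The file descriptor utilization row can be turned into a percentage by reading the "open files" count from captured pstat -f output. A minimal sketch in Python, for use on a modern host: file_table_utilization is a hypothetical helper, and the default of 175 is only the "usually 175" value quoted above, since NFILE is a kernel build-time constant.

```python
import re

NFILE = 175  # "usually 175" per the checklist; check your kernel's param.h


def file_table_utilization(pstat_f_text, nfile=NFILE):
    """Return open files / NFILE, using the 'N open files' summary line."""
    match = re.search(r"(\d+) open files", pstat_f_text)
    if match is None:
        raise ValueError("no 'open files' line found")
    return int(match.group(1)) / nfile


if __name__ == "__main__":
    sample = "9 open files\nLOC FLG CNT INO OFFS\n06754 RW 8 011770 818\n"
    print("%.1f%%" % (100 * file_table_utilization(sample)))
```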
Other Tools
Also worth mentioning, Unix 7th Edition included time(1) to print real/user/sys times, and other options to pstat(1M) including pstat -u addr to print various proc details.
Tools like uptime and vmstat are conspicuously absent, since these didn't exist in 7th Edition. They were added by BSD.
Example Outputs
The following are various screenshots from the tools used in the previous checklist, and more.
Looking around:

    # who
    root     console Dec 31 23:57
    brendan  tty00   Jan  1 00:01
    # date
    Thu Jan  1 03:51:34 EST 1970
    # ls -l /etc/passwd
    -rw-r--r-- 1 root      170 Dec 31 19:41 /etc/passwd
    # cat /etc/passwd
    root:VwL97VCAx1Qhs:0:1::/:
    daemon:x:1:1::/:
    sys::2:2::/usr/sys:
    bin::3:3::/bin:
    uucp::4:4::/usr/lib/uucp:/usr/lib/uucico
    dmr::7:3::/usr/dmr:
    brendan::11:3::/usr/brendan:

I didn't run the uname and uptime commands, as they don't exist yet.
Disk free (blocks) using df(1M) and quot(1M):

    # df
    /dev/rp0 878
    /dev/rp3 51789
    # quot
    /dev/rrp3:
    14875   bin
     2234   brendan
     1208   sys
      381   root
      129   uucp
       21   #43
        4   dmr
        1   daemon

Process status with ps(1):

    # ps
       PID TTY TIME CMD
        16 co  0:00 -sh
       407 co  0:00 ps
    # ps alx
     F S UID   PID  PPID CPU PRI NICE  ADDR  SZ  WCHAN TTY   TIME CMD
     3 S   0     0     0   0   0   20  2253   2   4412 ?   186:14 swapper
     1 S   0     1     0   0  30   20  2423   8  46520 ?     0:00 /etc/init
     1 S   0    16     1   0  30   20  2273  11  46554 co    0:00 -sh
     1 S  11    17     1   0  30   20  2777  11  46610 00    0:00 -sh
     1 S   0    12     1   0  40   20  3127   5 140000 ?     0:00 /etc/update
     1 S   1    15     1   0  40   20  3207  10 140000 ?     0:00 /etc/cron
     1 R  11   384    17 235  74   30  2517   5        00    7:09 ./burncpu
     1 S  11   431    17   0  40   20  3327   8 140000 00    0:00 sleep 600
     1 S  11   432    17   0  28   20  3422  21   5154 00    0:00 ed tm.c
     1 R   0   433    16   4  50   20  5266  20        co    0:00 ps alx

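The CPU saturation one-liner from the checklist (count rows whose state is "R", then subtract one, presumably to discount ps itself) can also be applied to saved output on a modern host. A minimal sketch in Python; runnable_count is a hypothetical name, and the sample rows are a subset of the ps alx output above.

```python
# Count runnable processes in captured "ps alx" output, mirroring the
# awk one-liner in the checklist. runnable_count() is a hypothetical helper.

PS_SAMPLE = """\
F S UID PID PPID CPU PRI NICE ADDR SZ WCHAN TTY TIME CMD
1 S 11 17 1 0 30 20 2777 11 46610 00 0:00 -sh
1 R 11 384 17 235 74 30 2517 5 00 7:09 ./burncpu
1 S 11 431 17 0 40 20 3327 8 140000 00 0:00 sleep 600
1 R 0 433 16 4 50 20 5266 20 co 0:00 ps alx
"""


def runnable_count(ps_text, exclude_ps=True):
    """Return the number of rows in state R, optionally minus ps itself."""
    rows = ps_text.strip().splitlines()[1:]  # skip the header line
    running = sum(1 for row in rows if row.split()[1] == "R")
    if exclude_ps:
        running = max(running - 1, 0)
    return running


if __name__ == "__main__":
    print(runnable_count(PS_SAMPLE))
```

A sustained count above zero on this single-CPU machine suggests processes are queueing for the CPU.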
Using pstat(1M) to dump the entire process table:

    # pstat -ap
    10 processes
      LOC S F PRI SIGNAL UID TIM CPU NI PGRP PID PPID  ADDR SIZE  WCHAN  LINK TEXTP CLKT
    46464 1 3   0      0   0 127   0 20    0   0    0  2253   20   4412     0     0    0
    46520 1 1  30      0   0 127   0 20    0   1    0  2423   74  46520     0 56634    0
    46554 1 1  30      0   0 127   0 20   16  16    1  2273  130  46554     0 56650    0
    46610 1 1  30      0  11 127   0 20   17  17    1  2777  130  46610     0 56650    0
    46644 1 1  40      0   0 127   0 20    0  12    1  3127   47 140000 46770 56664   12
    46700 1 1  40      0   1 127   0 20    0  15    1  3207  120 140000 46644 56700    1
    46734 3 1  74      0  11 127 229 30   17 384   17  2517   47      0     0     0    0
    46770 1 1  40      0  11  69   0 20   17 431   17  3327   73 140000     0 57024  531
    47024 1 1  28      0  11  64   0 20   17 432   17  3422  247   5154     0 57040    0
    47060 3 1  50      0   0   0   1 20   16 435   16  5041  216      0 46734 57054    0
    47114 0 0  50      0  11   0   0 20    0   0    0   400    0      0     0     0    0
    47150 0 0  50      0   0   0   5 20    0   0    0131000    0      0     4     0    0
    47204 0 0  20      0   0   0   4 20    0   0    0131000    0      0     3     0    0
    47240 0 0  50      0   1   0   0 20    0   0    0     0    0      0     0     0    0
    [...zero lines truncated...]

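Since pstat prints SIZE in octal core clicks (64 bytes), it helps to convert it before comparing with the SZ column of ps (512-byte blocks). A small sketch in Python for a modern host; the function names are hypothetical, and rounding up to whole blocks is my assumption about how ps derives SZ, although it does agree with the sample values on this page.

```python
# Convert pstat's SIZE column (octal, 64-byte core clicks) to bytes and
# to 512-byte blocks. Function names are hypothetical.

CLICK_BYTES = 64
BLOCK_BYTES = 512


def clicks_to_bytes(size_octal):
    """size_octal is the SIZE field as printed by pstat, e.g. '247'."""
    return int(size_octal, 8) * CLICK_BYTES


def clicks_to_blocks(size_octal):
    """Round up to whole 512-byte blocks (assumed to match ps SZ)."""
    return -(-clicks_to_bytes(size_octal) // BLOCK_BYTES)


if __name__ == "__main__":
    # SIZE values for /etc/init, -sh, burncpu and ed from the output above.
    for size in ("74", "130", "47", "247"):
        print(size, clicks_to_bytes(size), clicks_to_blocks(size))
```

For example, ed's SIZE of 247 octal is 167 clicks, or 10,688 bytes, which rounds up to the 21 blocks that ps alx reports as SZ.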
... and the open file table:

    # pstat -f
    9 open files
      LOC FLG CNT    INO    OFFS
    06754 RW    8 011770     818
    06764 RW    7 012102  170244
    06774 R     1 012440       0
    07004 R     1 012552       0
    07014 R     1 013446       0
    07024 R     1 011544       0
    07034 R     1 011544       0
    07044 R     1 013446       0
    07054 R     1 014566    3564

... and various struct user details:

    # pstat -u 2517
    rsav 0 0
    segflg, error 0, 0
    uids 11,3,11,3
    procp 046734
    base, count, offset 0172 0 138
    cdir 013110
    dbuf burncpu
    dirp 0177444
    dent 2014 burncpu
    pdir 0
    dseg 020 0 0 0 0 0 0 0177647 0406 0 0 0 0 0 0 065416
    file 07014 06754 06754 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    args 0177432 0 0177706 0 0
    sizes 0 02 025
    sep 0
    qsav 02642 021722
    ssav 0130 046610
    sigs 0 0 01 01 0 0 0 0 0 0 0 0 0 0 0 0 0
    times 1801 0
    ctimes 0 0
    ar0 0141772
    intflg 0
    ttyp 05154
    ttydev 3,0
    comm burncpu

Disk and CPU statistics with iostat(1M):

    # iostat 1
          RF              RK              RP              PERCENT
       tpm msps mspt   tpm msps mspt   tpm msps mspt  user  nice systm  idle
      1057  0.0  0.0     0  0.0  0.0  1178 -0.6  0.7  0.16  4.99  0.34 94.51
         0  0.0  0.0     0  0.0  0.0     0  0.0  0.0  0.00 97.83  2.17  0.00
    106980  0.0  0.0     0  0.0  0.0130440 -0.6  0.6 15.00 35.00 50.00  0.00
     56580  0.0  0.0     0  0.0  0.0 67680 -0.6  0.6 10.00 63.33 26.67  0.00
     89374  0.0  0.0     0  0.0  0.0108209 -0.6  0.6 17.39  2.90 79.71  0.00
     60916  0.0  0.0     0  0.0  0.0 74179 -0.6  0.6 14.91 36.84 48.25  0.00
     15789  0.0  0.0     0  0.0  0.0 17937 -0.6  0.6 12.28 63.16 24.56  0.00
      1020  0.0  0.0     0  0.0  0.0  1020 -0.6  0.6  0.00100.00  0.00  0.00
         0  0.0  0.0     0  0.0  0.0     0  0.0  0.0  0.00100.00  0.00  0.00
         0  0.0  0.0     0  0.0  0.0     0  0.0  0.0  0.00100.00  0.00  0.00
    [...]

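The CPU utilization rule from the checklist ("user" + "nice" + "systm") is easy to apply to saved iostat lines, with one catch: adjacent columns can run together, as in "0.00100.00" above. The sketch below, in Python for a modern host, sidesteps this by picking out numbers with exactly two decimal places and keeping the last four. The function name is hypothetical, and the approach assumes the PERCENT columns are always printed with two decimals, which is inferred from this sample rather than from the iostat source.

```python
import re


def cpu_utilization(line):
    """Return user + nice + systm percent from one iostat data line."""
    # Percent fields have two decimals; msps/mspt have one, so they are
    # skipped. A fused mspt+tpm such as "0.0130440" can yield a spurious
    # match ("0.01"), which is why only the last four matches are kept.
    fields = re.findall(r"\d+\.\d\d", line)[-4:]
    if len(fields) != 4:
        raise ValueError("not an iostat data line: %r" % line)
    user, nice, systm, idle = (float(f) for f in fields)
    return user + nice + systm


ROWS = [
    "1057 0.0 0.0 0 0.0 0.0 1178 -0.6 0.7 0.16 4.99 0.34 94.51",
    "106980 0.0 0.0 0 0.0 0.0130440 -0.6 0.6 15.00 35.00 50.00 0.00",
    "1020 0.0 0.0 0 0.0 0.0 1020 -0.6 0.6 0.00100.00 0.00 0.00",
]

if __name__ == "__main__":
    for row in ROWS:
        print("%.2f" % cpu_utilization(row))
```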
... and time percentages:

    # iostat -i 1
    [...]
     0.00 idle
    15.32 user
    37.90 nice
    46.77 system
     0.00 IO wait
     1.61 IO active
     1.61 RF active
     0.00 RK active
     0.00 RP active
    [...]

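The system-wide disk utilization figure from the checklist ("IO active" plus "IO wait") can be pulled out of saved iostat -i output in the same way. A minimal sketch in Python for a modern host; parse_iostat_i is a hypothetical helper that assumes each data line is a number followed by its label, as in the sample above.

```python
def parse_iostat_i(text):
    """Map each label (e.g. 'IO active') to its percentage."""
    stats = {}
    for line in text.strip().splitlines():
        value, _, label = line.strip().partition(" ")
        try:
            stats[label.strip()] = float(value)
        except ValueError:
            continue  # skip "[...]" and any other non-data lines
    return stats


SAMPLE = """\
[...]
 0.00 idle
15.32 user
37.90 nice
46.77 system
 0.00 IO wait
 1.61 IO active
 1.61 RF active
 0.00 RK active
 0.00 RP active
[...]
"""

if __name__ == "__main__":
    stats = parse_iostat_i(SAMPLE)
    print("disk utilization: %.2f%%" % (stats["IO active"] + stats["IO wait"]))
```

The per-controller "active" entries are available from the same dictionary, for example stats["RF active"].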
What boot looks like:

    sim> run 0
    Boot
    : hp(0,0)unix
    mem = 176448
    #

What panics look like:

    ka6 = 2317
    aps = 141670
    pc = 10067 ps = 144124
    trap type 0
    panic: trap

Acknowledgements
Resources used:
- The Computer History Simulation Project, which provided the PDP-11/45 simulator, simh. To run multiple terminals (ttys), I used the DC11 driver, which involved recompiling the Unix kernel -- within the simulator -- to include it. I also had to edit the simh source, pdp11/pdp11_dc.c, and remove the DEV_DIS flag from the dci_dev and dco_dev declarations, so that they weren't always disabled. If there was a way to set this via the simh shell, I never found it.
- The 7th Edition UNIX Programmer's Manual, Volume 2B, especially the section on "Setting Up Unix" by Charles B. Haley and Dennis M. Ritchie.
- Enthusiasts' howtos for setting up PDP-11/Unix on simh, such as here and here.
- A post on Unix V7 and the DZ11 emulator, which I never got to work (panic after boot), but was good practice at editing and recompiling the Unix kernel.
- Lions' Commentary on UNIX 6th Edition, with Source Code, John Lions.
- PDP 11/40 Processor Handbook
- PDP 11/45 Processor Handbook
- Unix source code.
- PDP 11/70 front panel photo by ToastyKen.
Note that I've never used or even seen a PDP in real life (they were before my time). For me, this is an historical fascination explored through old manuals and source code. This checklist is based on these materials, with testing and experimentation in a simulator (including trying to debug kernel panics with adb(1), as well as configuration problems with simh). Please reference back to this post if you find it useful, and please leave a comment if you can fill in more details.