I originally posted this at http://blogs.sun.com/brendan/entry/iscsi_dtrace_provider.
I recently tried out the new iSCSI Target Server DTrace provider which Adam wrote (which should be included in Solaris sometime soon). It provides observability of iSCSI traffic from the the server's perspective. This is great news if you have an iSCSI Solaris server, or are planning to have one.
In this entry I'll post a few screenshots of the iSCSI provider in action, both one-liners and scripts.
One-liners
I'll start by listing the probes – which may convey a sense of what is possible to trace. Look down the "NAME" column:
# dtrace -ln 'iscsi*:::' ID PROVIDER MODULE FUNCTION NAME 11696 iscsi100653 iscsitgtd send_async_logout async-send 11697 iscsi100653 iscsitgtd send_async_scsi async-send 11698 iscsi100653 iscsitgtd handle_scsi_data data-receive 11699 iscsi100653 iscsitgtd iscsi_conn_data_rqst data-request 11700 iscsi100653 iscsitgtd send_datain_pdu data-send 11701 iscsi100653 iscsitgtd iscsi_handle_login_pkt login-command 11702 iscsi100653 iscsitgtd send_login_reject login-response 11703 iscsi100653 iscsitgtd iscsi_handle_login_pkt login-response 11704 iscsi100653 iscsitgtd handle_logout_msg logout-command 11705 iscsi100653 iscsitgtd conn_state logout-response 11706 iscsi100653 iscsitgtd handle_noop_cmd nop-receive 11707 iscsi100653 iscsitgtd iscsi_conn_pkt nop-send 11708 iscsi100653 iscsitgtd handle_scsi_cmd scsi-command 11709 iscsi100653 iscsitgtd send_scsi_rsp scsi-response 11710 iscsi100653 iscsitgtd handle_task_mgt task-command 11711 iscsi100653 iscsitgtd iscsi_conn_pkt task-response 11712 iscsi100653 iscsitgtd handle_text_msg text-command 11713 iscsi100653 iscsitgtd handle_text_msg text-response
The easiest one-liner would be to frequency count iSCSI operations:
# dtrace -n 'iscsi*::: { @[probename] = count(); }' dtrace: description 'iscsi*::: ' matched 18 probes ^C nop-receive 3 nop-send 3 data-receive 55 data-request 55 scsi-response 103 scsi-command 342 data-send 2865
The above output shows that the iSCSI server is doing more send operations (2865 data-sends), than receive (55 data-receives).
Now to see who is using this iSCSI server, by frequency counting remote IP addresses:
# dtrace -n 'iscsi*::: { @[args[0]->ci_remote] = count(); }' dtrace: description 'iscsi*::: ' matched 18 probes ^C 192.168.1.109 1338
The above shows that 192.168.1.108 triggered 1338 iSCSI operations, which may have been sends, receives, nops, etc. Breaking down further is just a little more DTrace:
# dtrace -qn 'iscsi*::: { @[args[0]->ci_remote, probename] = count(); } END { printa("%32s %18s %@16d\\n", @); }' ^C 192.168.1.109 nop-receive 2 192.168.1.109 nop-send 2 192.168.1.109 data-receive 16 192.168.1.109 data-request 16 192.168.1.109 scsi-response 43 192.168.1.109 scsi-command 253 192.168.1.109 data-send 2619
Great, while a simple one-liner, it quickly identifies who is connecting and what operations they are doing. Some performance wins are about locating and eliminating unnecessary work, which one-liners such as this help identify.
Now for printing payload bytes by iSCSI operation:
# dtrace -n 'iscsi*::: { @[probename] = sum(args[1]->ii_datalen); }' dtrace: description 'iscsi*::: ' matched 18 probes ^C data-request 0 scsi-response 0 data-receive 99328 scsi-command 140800 data-send 20672512
The above one-liner prints total bytes by operation, which does lose some data. What was the count by operation, average, min, max? DTrace can answer all of these seperately, or provide a summary as a distribution plot. Here I've used quantize for a power of 2 plot:
# dtrace -n 'iscsi*::: { @[probename] = quantize(args[2]->ii_datalen); }' dtrace: description 'iscsi*::: ' matched 18 probes ^C data-request value ------------- Distribution ------------- count -1 | 0 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12 1 | 0 scsi-response value ------------- Distribution ------------- count -1 | 0 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 55 1 | 0 data-receive value ------------- Distribution ------------- count 512 | 0 1024 |@@@@@@@ 2 2048 | 0 4096 |@@@@@@@@@@@@@ 4 8192 |@@@@@@@@@@@@@@@@@@@@ 6 16384 | 0 scsi-command value ------------- Distribution ------------- count -1 | 0 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 209 1 | 0 2 | 0 4 | 0 8 | 0 16 | 0 32 | 0 64 | 0 128 | 0 256 | 0 512 |@@@ 18 1024 |@@ 10 2048 |@@ 11 4096 |@ 4 8192 |@@ 12 16384 | 0 data-send value ------------- Distribution ------------- count 256 | 0 512 | 1 1024 | 16 2048 | 21 4096 | 28 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2559 16384 | 0
The plots above show that for the data-send operation, the payload size was usually in the 8 to 16 Kbyte bucket.
Scripts
Apart from one-lines, DTrace allows powerful scripts to be contstructed. I've put a few on the SolarisInternals DTrace Topics: iSCSI page.
Here I'll post my favourite so far, a script which measures read/write latency (or "response time") in nanoseconds:
# ./iscsirwlat.d Tracing... Hit Ctrl-C to end. ^C Read Latency (ns), 192.168.1.109 value ------------- Distribution ------------- count 32768 | 0 65536 |@@@ 27 131072 |@@ 17 262144 | 4 524288 |@@@@ 39 1048576 |@@@ 32 2097152 |@@@@@@@@@@@@@@@@@ 164 4194304 |@ 11 8388608 |@ 5 16777216 |@ 10 33554432 |@ 7 67108864 |@ 5 134217728 |@@@@@ 46 268435456 |@ 14 536870912 | 0 Write Latency (ns), 192.168.1.109 value ------------- Distribution ------------- count 32768 | 0 65536 |@@ 14 131072 |@@ 16 262144 |@@@ 20 524288 |@@@ 20 1048576 |@@@@@ 33 2097152 |@ 5 4194304 | 0 8388608 | 2 16777216 |@ 10 33554432 |@@@ 18 67108864 |@@@@@ 33 134217728 |@@@@@@@@@@@@@ 93 268435456 |@@ 12 536870912 | 0
The script is on the DTrace Topics page. It is a glimpse of things to come; how many of those were synchronous requests (which may hurt application performance) versus asynchronous flushing or read ahead (which may not hurt performance)? More DTrace is needed, which will probably include running DTrace on the client side as well...