Linux.conf.au: BPF, Tracing and more

Video: https://www.youtube.com/watch?v=JRFNIKUROPE

Talk for linux.conf.au 2017 (LCA2017) by Brendan Gregg, about Linux enhanced BPF (eBPF).

Description: "A world of new capabilities is emerging for the Linux 4.x series, thanks to enhancements that have been included in Linux for to Berkeley Packet Filter (BPF): an in-kernel virtual machine that can execute user space-defined programs. It is finding uses for security auditing and enforcement, enhancing networking (including eXpress Data Path), and performance observability and troubleshooting. Many new open source tools that have been written in the past 12 months for performance analysis that use BPF. Tracing superpowers have finally arrived for Linux!

For its use with tracing, BPF provides the programmable capabilities to the existing tracing frameworks: kprobes, uprobes, and tracepoints. In particular, BPF allows timestamps to be recorded and compared from custom events, allowing latency to be studied in many new places: kernel and application internals. It also allows data to be efficiently summarized in-kernel, including as histograms. This has allowed dozens of new observability tools to be developed so far, including measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more.

This talk will summarize BPF capabilities and use cases so far, and then focus on its use to enhance Linux tracing, especially with the open source bcc collection. bcc includes BPF versions of old classics, and many new tools, including execsnoop, opensnoop, funccount, ext4slower, and more (many of which I developed). Perhaps you'd like to develop new tools, or use the existing tools to find performance wins large and small, especially when instrumenting areas that previously had zero visibility. I'll also summarize how we intend to use these new capabilities to enhance systems analysis at Netflix."
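
To make the in-kernel summarization described above concrete, here is a minimal bcc sketch (my illustration, not code from the talk): it records a timestamp when a block I/O request is issued, computes the latency at completion, and aggregates it in the kernel as a power-of-2 histogram, in the style of bcc's biolatency. It assumes a 4.x-era kernel where blk_start_request() and blk_account_io_completion() are available as kprobe targets, and a working bcc install.

    #!/usr/bin/python
    # Sketch: in-kernel latency histogram for block I/O (assumes a 4.x kernel + bcc).
    from bcc import BPF
    from time import sleep

    b = BPF(text="""
    #include <uapi/linux/ptrace.h>
    #include <linux/blkdev.h>

    BPF_HASH(start, struct request *);    // request -> issue timestamp (ns)
    BPF_HISTOGRAM(dist);                  // log2 histogram of latency (usecs)

    int trace_start(struct pt_regs *ctx, struct request *req)
    {
        u64 ts = bpf_ktime_get_ns();
        start.update(&req, &ts);
        return 0;
    }

    int trace_done(struct pt_regs *ctx, struct request *req)
    {
        u64 *tsp = start.lookup(&req);
        if (tsp == 0)
            return 0;                     // missed the issue event
        u64 delta = bpf_ktime_get_ns() - *tsp;
        dist.increment(bpf_log2l(delta / 1000));
        start.delete(&req);
        return 0;
    }
    """)
    b.attach_kprobe(event="blk_start_request", fn_name="trace_start")
    b.attach_kprobe(event="blk_account_io_completion", fn_name="trace_done")

    print("Tracing block I/O latency... Hit Ctrl-C to end.")
    try:
        sleep(99999999)
    except KeyboardInterrupt:
        pass
    b["dist"].print_log2_hist("usecs")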


PDF: LCA2017_BPF_tracing_and_more.pdf

Keywords (from pdftotext):

slide 1:
    BPF: Tracing and More
    Brendan Gregg
    Senior Performance Architect
    
slide 2:
slide 3:
slide 4:
    Ye Olde BPF
    Berkeley Packet Filter
    # tcpdump host 127.0.0.1 and port 22 -d
    (000) ldh      [12]
    (001) jeq      #0x800           jt 2    jf 18
    (002) ld       [26]
    (003) jeq      #0x7f000001      jt 6    jf 4
    (004) ld       [30]
    (005) jeq      #0x7f000001      jt 6    jf 18
    (006) ldb      [23]
    (007) jeq      #0x84            jt 10   jf 8
    (008) jeq      #0x6             jt 10   jf 9
    (009) jeq      #0x11            jt 10   jf 18
    (010) ldh      [20]
    (011) jset     #0x1fff          jt 18   jf 12
    (012) ldxb     4*([14]&0xf)
    [...]
    Optimizes tcpdump filter performance
    An in-kernel sandboxed virtual machine
    
slide 5:
    Enhanced BPF
    also known as just "BPF"
    User-Defined BPF Programs
    Kernel
    SDN Configuration
    Runtime
    Event Targets
    DDoS Mitigation
    verifier
    sockets
    Intrusion Detection
    kprobes
    BPF
    Container Security
    Observability
    uprobes
    tracepoints
    BPF actions
    
slide 6:
    Demo
    
slide 7:
    XDP
    eXpress Data Path
    Application
    Kernel
    fast
    drop
    BPF
    program
    receive
    TCP/IP
    stack
    forward
    Linux 4.8+
    Network Device Drivers
    
slide 8:
    Intrusion Detection
    BPF Security Module
    24x7 Auditing Daemon
    event configuration
    BPF
    bytecode
    Kernel
    verifier
    low-frequency events
    new TCP sessions
    new UDP sessions
    BPF
    non-TCP/UDP events
    privilege escalation
    per-event
    log
    maps
    capability usage
    new processes
    
slide 9:
    Container Security
    Networking & security policy enforcement
    Container
    Kernel
    BPF
    Container
    BPF
    BPF
    Network Interface
    https://github.com/cilium/cilium
    
slide 10:
    Observability
    Performance Analysis & Debugging
    Observability Program
    instrumentation configuration
    BPF
    bytecode
    output
    Kernel
    verifier
    tracepoints
    BPF
    per-event
    data
    statistics
    static tracing
    dynamic tracing
    kprobes
    maps
    uprobes
    https://github.com/iovisor/bcc
    
slide 11:
    Wielding Superpowers
    WHAT DYNAMIC TRACING CAN DO
    
slide 12:
    Previously
    • Metrics were vendor chosen, closed source, and incomplete
    • The art of inference & making do
    # ps alx
    F S UID   PID  PPID CPU PRI NICE  ADDR  WCHAN TTY    TIME CMD
    3 S                  0   30        4412      ?     186:14 swapper
    1 S                  0   30       46520      ?       0:00 /etc/init
    1 S                               46554      co      0:00 –sh
    […]
    
slide 13:
    Crystal Ball Observability
    Dynamic Tracing
    
slide 14:
    Linux Event Sources
    
slide 15:
    Event Tracing Efficiency
    Eg, tracing TCP retransmits
    Kernel
    Old way: packet capture
    tcpdump
    Analyzer
    1. read
    2. dump
    buffer
    1. read
    2. process
    3. print
    file system
    send
    receive
    disks
    New way: dynamic tracing
    Tracer
    1. configure
    2. read
    tcp_retransmit_skb()
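
As a concrete example of the "new way" (my sketch, not the talk's tcpretrans tool; it assumes bcc is installed), a few lines of bcc are enough to dynamically trace tcp_retransmit_skb() and print a line per retransmit:

    #!/usr/bin/python
    # Sketch: print a line for each call to the kernel's tcp_retransmit_skb().
    from bcc import BPF

    b = BPF(text="""
    int kprobe__tcp_retransmit_skb(void *ctx)
    {
        bpf_trace_printk("TCP retransmit\\n");
        return 0;
    }
    """)
    print("Tracing tcp_retransmit_skb()... Hit Ctrl-C to end.")
    b.trace_print()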
    
slide 16:
    New CLI Tools
    # biolatency
    Tracing block device I/O... Hit Ctrl-C to end.
    usecs               : count     distribution
        4 -> 7          : 0
        8 -> 15         : 0
        16 -> 31        : 0
        32 -> 63        : 0
        64 -> 127       : 1
        128 -> 255      : 12        |********
        256 -> 511      : 15        |**********
        512 -> 1023     : 43        |*******************************
        1024 -> 2047    : 52        |**************************************|
        2048 -> 4095    : 47        |**********************************
        4096 -> 8191    : 52        |**************************************|
        8192 -> 16383   : 36        |**************************
        16384 -> 32767  : 15        |**********
        32768 -> 65535  : 2
        65536 -> 131071 : 2
    
slide 17:
    New Visualizations and GUIs
    
slide 18:
    Netflix Intended Usage
    Self-service UI:
    Flame Graphs
    Tracing Reports
    should be open sourced; you may also build/buy your own
    
slide 19:
    Conquer Performance
    Measure anything
    
slide 20:
    Introducing enhanced BPF
    BPF TRACING
    
slide 21:
    A Linux Tracing Timeline
    1990’s: Static tracers, prototype dynamic tracers
    2000: LTT + DProbes (dynamic tracing; not integrated)
    2004: kprobes (2.6.9)
    2005: DTrace (not Linux), SystemTap (out-of-tree)
    2008: ftrace (2.6.27)
    2009: perf (2.6.31)
    2009: tracepoints (2.6.32)
    2010-2016: ftrace & perf_events enhancements
    2014-2016: BPF patches
    also: LTTng, ktap, sysdig, ...
    
slide 22:
    BPF Enhancements by Linux Version
    3.18: bpf syscall
    3.19: sockets
    4.1: kprobes
    4.4: bpf_perf_event_output
    4.6: stack traces
    4.7: tracepoints
    4.9: profiling
    eg, Ubuntu:
    
slide 23:
    Enhanced
    BPF
    is in Linux
    
slide 24:
    BPF
    • aka eBPF == enhanced Berkeley Packet Filter
    – Lead developer: Alexei Starovoitov (Facebook)
    • Many uses
    – Virtual networking
    – Security
    – Programmatic tracing
    • Different front-ends
    – C, perf, bcc, ply, …
    BPF mascot
    
slide 25:
    BPF for Tracing
    User Program
    Kernel
    1. generate
    verifier
    kprobes
    BPF bytecode
    per-event
    data
    BPF
    2. load
    tracepoints
    3. perf_output
    statistics
    uprobes
    3. async
    read
    maps
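
For illustration, a small sketch of this flow (my own example modeled on bcc tools, not code from the slide; it assumes a recent bcc and a pre-4.17 kernel where the syscall handler is named sys_sync): Python generates and loads BPF bytecode from embedded C, the kernel program submits per-event records via perf output, and user space reads them asynchronously.

    #!/usr/bin/python
    # Sketch of the slide's flow: 1. generate, 2. load, 3. perf_output / async read.
    from bcc import BPF

    b = BPF(text="""
    #include <uapi/linux/ptrace.h>

    BPF_PERF_OUTPUT(events);              // per-event output channel

    struct data_t {
        u32 pid;
        u64 ts;
    };

    int do_trace(struct pt_regs *ctx)
    {
        struct data_t data = {};
        data.pid = bpf_get_current_pid_tgid() >> 32;
        data.ts = bpf_ktime_get_ns();
        events.perf_submit(ctx, &data, sizeof(data));
        return 0;
    }
    """)                                   # 1. generate + 2. load
    b.attach_kprobe(event="sys_sync", fn_name="do_trace")

    def handle(cpu, data, size):           # 3. async read in user space
        event = b["events"].event(data)
        print("sync() by PID %d at %d ns" % (event.pid, event.ts))

    b["events"].open_perf_buffer(handle)
    while True:
        b.perf_buffer_poll()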
    
slide 26:
    Raw BPF
    samples/bpf/sock_example.c
    87 lines truncated
    
slide 27:
    C/BPF
    samples/bpf/tracex1_kern.c
    58 lines truncated
    
slide 28:
    bcc
    • BPF Compiler Collection
    Tracing layers:
    – https://github.com/iovisor/bcc
    – Lead developer: Brenden Blanco
    bcc tool
    • Includes tracing tools
    • Front-ends:
    – Python
    – Lua
    – C++
    – C helper libraries
    – golang (gobpf)
    bcc tool
    bcc
    Python
    user
    kernel
    lua
    front-ends
    Kernel
    Events
    BPF
    
slide 29:
    bcc/BPF
    bcc examples/tracing/bitehist.py
    entire program
    
slide 30:
    ply/BPF
    https://github.com/wkz/ply/blob/master/README.md
    entire program
    
slide 31:
    The Tracing Landscape, Jan 2017
    (less brutal)
    (my opinion)
    dtrace4L.
    Ease of use
    sysdig
    (brutal)
    ktap
    perf
    LTTng
    ftrace
    (alpha)
    ply/BPF
    (mature)
    stap
    bcc/BPF
    C/BPF
    Stage of
    Development
    Raw BPF
    Scope & Capability
    
slide 32:
    State of BPF, Jan 2017
    Dynamic tracing, kernel-level (BPF support for kprobes)
    Dynamic tracing, user-level (BPF support for uprobes)
    Static tracing, kernel-level (BPF support for tracepoints)
    Timed sampling events (BPF with perf_event_open)
    PMC events (BPF with perf_event_open)
    Filtering (via BPF programs)
    Debug output (bpf_trace_printk())
    Per-event output (bpf_perf_event_output())
    Basic variables (global & per-thread variables, via BPF maps)
    Associative arrays (via BPF maps)
    Frequency counting (via BPF maps)
    Histograms (power-of-2, linear, and custom, via BPF maps)
    Timestamps and time deltas (bpf_ktime_get_ns() and BPF)
    Stack traces, kernel (BPF stackmap)
    Stack traces, user (BPF stackmap)
    Overwrite ring buffers
    String factory (stringmap)
    Optional: bounded loops, …
    Static tracing, user-level (USDT probes via uprobes)
    Static tracing, dynamic USDT (needs library support)
    Debug output (Python with BPF.trace_pipe() and
    BPF.trace_fields())
    Per-event output (BPF_PERF_OUTPUT macro and
    BPF.open_perf_buffer())
    Interval output (BPF.get_table() and table.clear())
    Histogram printing (table.print_log2_hist())
    C struct navigation, kernel-level (maps to bpf_probe_read())
    Symbol resolution, kernel-level (ksym(), ksymaddr())
    Symbol resolution, user-level (usymaddr())
    BPF tracepoint support (via TRACEPOINT_PROBE)
    BPF stack trace support (incl. walk method for stack frames)
    Examples (under /examples)
    Many tools (/tools)
    Tutorials (/docs/tutorial*.md)
    Reference guide (/docs/reference_guide.md)
    Open issues: (https://github.com/iovisor/bcc/issues)
    
slide 33:
    For end-users
    HOW TO USE BCC/BPF
    
slide 34:
    Installation
    https://github.com/iovisor/bcc/blob/master/INSTALL.md
    • eg, Ubuntu Xenial:
    # echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" | \
    sudo tee /etc/apt/sources.list.d/iovisor.list
    # sudo apt-get update
    # sudo apt-get install bcc-tools
    – puts tools in /usr/share/bcc/tools, and tools/old for older kernels
    – 16.04 is good, 16.10 better: more tools work
    – bcc should also arrive as an official Ubuntu snap
    
slide 35:
    Linux Perf Analysis in 60s
    1. uptime
    2. dmesg | tail
    3. vmstat 1
    4. mpstat -P ALL 1
    5. pidstat 1
    6. iostat -xz 1
    7. free -m
    8. sar -n DEV 1
    9. sar -n TCP,ETCP 1
    10. top
    http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html
    
slide 36:
    perf-tools (ftrace)
    
slide 37:
    bcc Tracing Tools
    
slide 38:
    bcc General Performance Checklist
    execsnoop
    opensnoop
    ext4slower (…)
    biolatency
    biosnoop
    cachestat
    tcpconnect
    tcpaccept
    tcpretrans
    gethostlatency
    runqlat
    profile
    
slide 39:
    1. execsnoop
    # execsnoop
    PCOMM            PID    RET ARGS
    bash                      0 /usr/bin/man ls
    preconv                   0 /usr/bin/preconv -e UTF-8
    man                       0 /usr/bin/tbl
    man                       0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8
    man                       0 /usr/bin/pager -s
    nroff                     0 /usr/bin/locale charmap
    nroff                     0 /usr/bin/groff -mtty-char -Tutf8 -mandoc -rLL=169n -rLT=169n
    groff                     0 /usr/bin/troff -mtty-char -mandoc -rLL=169n -rLT=169n -Tutf8
    groff                     0 /usr/bin/grotty
    […]
    
slide 40:
    2. opensnoop
    # opensnoop
    PID    COMM          FD ERR PATH
    27159  catalina.sh         0  /apps/tomcat8/bin/setclasspath.sh
           redis-server        0  /proc/4057/stat
           redis-server        0  /proc/2360/stat
    30668  sshd                0  /proc/sys/kernel/ngroups_max
    30668  sshd                0  /etc/group
    30668  sshd                0  /root/.ssh/authorized_keys
    30668  sshd                0  /root/.ssh/authorized_keys
    30668  sshd                2  /var/run/nologin
    30668  sshd                2  /etc/nologin
    30668  sshd                0  /etc/login.defs
    30668  sshd                0  /etc/passwd
    30668  sshd                0  /etc/shadow
    30668  sshd                0  /etc/localtime
           snmp-pass           0  /proc/cpuinfo
    […]
    
slide 41:
    3. ext4slower
    # ext4slower 1
    Tracing ext4 operations slower than 1 ms
    TIME     COMM   PID   T BYTES   OFF_KB   LAT(ms) FILENAME
    06:49:17 bash         R 128                 7.75 cksum
    06:49:17 cksum        R 39552               1.34 [
    06:49:17 cksum        R 96                  5.36 2to3-2.7
    06:49:17 cksum        R 96                 14.94 2to3-3.4
    06:49:17 cksum        R 10320               6.82 411toppm
    06:49:17 cksum        R 65536               4.01 a2p
    06:49:17 cksum        R 55400               8.77 ab
    06:49:17 cksum        R 36792              16.34 aclocal-1.14
    06:49:17 cksum        R 15008              19.31 acpi_listen
    06:49:17 cksum        R 6123               17.23 add-apt-repository
    06:49:17 cksum        R 6280               18.40 addpart
    06:49:17 cksum        R 27696               2.16 addr2line
    06:49:17 cksum        R 58080              10.11 ag
    […]
    also: btrfsslower, xfsslower, zfsslower
    
slide 42:
    4. biolatency
    # biolatency -mT 1
    Tracing block device I/O... Hit Ctrl-C to end.
    06:20:16
    msecs               : count     distribution
        0 -> 1          : 36        |**************************************|
        2 -> 3          : 1
        4 -> 7          : 3         |***
        8 -> 15         : 17        |*****************
        16 -> 31        : 33        |**********************************
        32 -> 63        : 7         |*******
        64 -> 127       : 6         |******
    […]
    
slide 43:
    5. biosnoop
    # biosnoop
    TIME(s)   COMM          PID   DISK    SECTOR   BYTES   LAT(ms)
    […]
              supervise           xvda1
              supervise           xvda1
              supervise           xvda1
              supervise           xvda1
              supervise           xvda1
              supervise           xvda1
              supervise           xvda1
              supervise           xvda1
              xfsaild/md0         xvdc
              xfsaild/md0         xvdb
              xfsaild/md0         xvdb
              xfsaild/md0         xvdb
              kworker/0:3         xvdb
              supervise           xvda1
    
slide 44:
    6. cachestat
    # cachestat
    HITS   MISSES   DIRTIES   READ_HIT%   WRITE_HIT%   BUFFERS_MB   CACHED_MB
    […]
                                  80.4%        19.6%
                                  96.2%         3.7%
                                  89.6%        10.4%
                                 100.0%         0.0%
                                 100.0%         0.0%
                                  55.2%         4.5%
                                 100.0%         0.0%
                                  99.9%         0.0%
                                 100.0%         0.0%
                                 100.0%         0.0%
    
slide 45:
    7. tcpconnect
    # tcpconnect
    PID    COMM          IP SADDR                     DADDR                     DPORT
    25333  recordProgra  4  127.0.0.1
    25338  curl          4  100.66.3.172
    25340  curl          4  100.66.3.172
    25342  curl          4  100.66.3.172
    25344  curl          4  100.66.3.172
    25365  recordProgra  4  127.0.0.1
    26119  ssh           6  ::1                       ::1
    25388  recordProgra  4  127.0.0.1
    25220  ssh           6  fe80::8a3:9dff:fed5:6b19  fe80::8a3:9dff:fed5:6b19  22
    […]
    
slide 46:
    8. tcpaccept
    # tcpaccept
    PID    COMM          IP RADDR                     LADDR                     LPORT
           sshd          4  11.16.213.254
           redis-server  4  127.0.0.1
           redis-server  4  127.0.0.1
           redis-server  4  127.0.0.1
           redis-server  4  127.0.0.1
           sshd          6  ::1                       ::1
           redis-server  4  127.0.0.1
           redis-server  4  127.0.0.1
           sshd          6  fe80::8a3:9dff:fed5:6b19  fe80::8a3:9dff:fed5:6b19  22
           redis-server  4  127.0.0.1
    […]
    
slide 47:
    9. tcpretrans
    # tcpretrans
    TIME      PID  IP LADDR:LPORT          T> RADDR:RPORT          STATE
    01:55:05  0    4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED
    01:55:05  0    4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED
    01:55:17  0    4  10.153.223.157:22    R> 69.53.245.40:22957   ESTABLISHED
    […]
    
slide 48:
    10. gethostlatency
    # gethostlatency
    TIME      PID    COMM   LATms HOST
    06:10:24  28011  wget   90.00 www.iovisor.org
    06:10:28  28127  wget    0.00 www.iovisor.org
    06:10:41  28404  wget    9.00 www.netflix.com
    06:10:48  28544  curl   35.00 www.netflix.com.au
    06:11:10  29054  curl   31.00 www.plumgrid.com
    06:11:16  29195  curl    3.00 www.facebook.com
    06:11:24  25313  wget    3.00 www.usenix.org
    06:11:25  29404  curl   72.00 foo
    06:11:28  29475  curl    1.00 foo
    […]
    
slide 49:
    11. runqlat
    # runqlat -m 5
    Tracing run queue latency... Hit Ctrl-C to end.
    msecs               : count     distribution
        0 -> 1          : 3818      |****************************************|
        2 -> 3          : 39
        4 -> 7          : 39
        8 -> 15         : 62
        16 -> 31        : 2214      |***********************
        32 -> 63        : 226       |**
    […]
    
slide 50:
    12. profile
    # profile
    Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
    […]
    ffffffff813d0af8 __clear_user
    ffffffff813d5277 iov_iter_zero
    ffffffff814ec5f2 read_iter_zero
    ffffffff8120be9d __vfs_read
    ffffffff8120c385 vfs_read
    ffffffff8120d786 sys_read
    ffffffff817cc076 entry_SYSCALL_64_fastpath
    00007fc5652ad9b0 read
    dd (25036)
    […]
    
slide 51:
    Other bcc Tracing Tools
    • Single-purpose
    – bitesize
    – capable
    – memleak
    – ext4dist (btrfs, …)
    • Multi tools
    – funccount
    – argdist
    – trace
    – stackcount
    https://github.com/iovisor/bcc#tools
    
slide 52:
    trace
    • Trace custom events. Ad hoc analysis:
    # trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
    TIME      PID    COMM   FUNC
    05:18:23  4490          sys_read   read 1048576 bytes
    05:18:23  4490          sys_read   read 1048576 bytes
    05:18:23  4490          sys_read   read 1048576 bytes
    05:18:23  4490          sys_read   read 1048576 bytes
    by Sasha Goldshtein
    
slide 53:
    trace One-Liners
    trace –K blk_account_io_start
    Trace this kernel function, and print info with a kernel stack trace
    trace 'do_sys_open "%s", arg2'
    Trace the open syscall and print the filename being opened
    trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
    Trace the read syscall and print a message for reads >20000 bytes
    trace r::do_sys_return
    Trace the return from the open syscall
    trace 'c:open (arg2 == 42) "%s %d", arg1, arg2'
    Trace the open() call from libc only if the flags (arg2) argument is 42
    trace 'p:c:write (arg1 == 1) "writing %d bytes to STDOUT", arg3'
    Trace the write() call from libc to monitor writes to STDOUT
    trace 'r:c:malloc (retval) "allocated = %p", retval'
    Trace returns from malloc and print non-NULL allocated buffers
    trace 't:block:block_rq_complete "sectors=%d", args->nr_sector'
    Trace the block_rq_complete kernel tracepoint and print # of tx sectors
    trace 'u:pthread:pthread_create (arg4 != 0)'
    Trace the USDT probe pthread_create when its 4th argument is non-zero
    from: trace -h
    
slide 54:
    argdist
    # argdist -H 'p::tcp_cleanup_rbuf(struct sock *sk, int copied):int:copied'
    [15:34:45]
    copied              : count     distribution
        0 -> 1          : 15088     |**********************************
        2 -> 3          : 0
        4 -> 7          : 0
        8 -> 15         : 0
        16 -> 31        : 0
        32 -> 63        : 0
        64 -> 127       : 4786      |***********
        128 -> 255      : 1
        256 -> 511      : 1
        512 -> 1023     : 4
        1024 -> 2047    : 11
        2048 -> 4095    : 5
        4096 -> 8191    : 27
        8192 -> 16383   : 105
        16384 -> 32767  : 0
        32768 -> 65535  : 10086     |***********************
        65536 -> 131071 : 60
        131072 -> 262143: 17285     |****************************************|
    [...]
    by Sasha Goldshtein
    
slide 55:
    argdist One-Liners
    argdist -H 'p::__kmalloc(u64 size):u64:size'
    Print a histogram of allocation sizes passed to kmalloc
    argdist -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
    Print a frequency count of how many times process 1005 called malloc for 16 bytes
    argdist -C 'r:c:gets():char*:$retval#snooped strings'
    Snoop on all strings returned by gets()
    argdist -H 'r::__kmalloc(size_t size):u64:$latency/$entry(size)#ns per byte'
    Print a histogram of nanoseconds per byte from kmalloc allocations
    argdist -C 'p::__kmalloc(size_t size, gfp_t flags):size_t:size:flags&GFP_ATOMIC'
    Print frequency count of kmalloc allocation sizes that have GFP_ATOMIC
    argdist -p 1005 -C 'p:c:write(int fd):int:fd' -T 5
    Print frequency counts of how many times writes were issued to a particular file descriptor
    number, in process 1005, but only show the top 5 busiest fds
    argdist -p 1005 -H 'r:c:read()'
    Print a histogram of error codes returned by read() in process 1005
    argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000'
    Print frequency of reads by process where the latency was >0.1ms
    from: argdist -h
    
slide 56:
    Coming to a GUI near you
    BCC/BPF VISUALIZATIONS
    
slide 57:
    Latency Heatmaps
    
slide 58:
    CPU + Off-CPU Flame Graphs
    • Can now be
    BPF optimized
    http://www.brendangregg.com/flamegraphs.html
    
slide 59:
    Off-Wake Flame
    Graphs
    • Shows blocking stack with
    waker stack
    – Better understand why blocked
    – Merged in-kernel using BPF
    – Include multiple waker stacks ==
    chain graphs
    • We couldn't do this before
    
slide 60:
    Overview for tool developers
    HOW TO PROGRAM BCC/BPF
    
slide 61:
    Linux Event Sources
    BPF output: Linux 4.4
    BPF stacks: Linux 4.6
    (version each feature arrived: Linux 4.1, 4.3, 4.7, 4.9, 4.9)
    
slide 62:
    Methodology
    • Find/draw a functional diagram
    – Eg, storage I/O subsystem:
    • Apply performance methods
    http://www.brendangregg.com/methodology.html
    1. Workload Characterization
    2. Latency Analysis
    3. USE Method
    • Start with the Q's,
    then find the A's
    
slide 63:
    bitehist.py Output
    # ./bitehist.py
    Tracing... Hit Ctrl-C to end.
    kbytes              : count     distribution
        0 -> 1          : 3
        2 -> 3          : 0
        4 -> 7          : 211       |**********
        8 -> 15         : 0
        16 -> 31        : 0
        32 -> 63        : 0
        64 -> 127       : 1
        128 -> 255      : 800       |**************************************|
    
slide 64:
    bitehist.py Code
    bcc examples/tracing/bitehist.py
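
For reference, a short sketch of what that program contains (paraphrased from memory of examples/tracing/bitehist.py; the file in the bcc repository is the authoritative version): a BPF_HISTOGRAM map, a kprobe handler that increments the log2 bucket for each block I/O size, and a Python side that prints the histogram on Ctrl-C.

    from bcc import BPF
    from time import sleep

    # Sketch of a bitehist-style tool: histogram block I/O sizes in-kernel.
    b = BPF(text="""
    #include <uapi/linux/ptrace.h>
    #include <linux/blkdev.h>

    BPF_HISTOGRAM(dist);

    int kprobe__blk_account_io_completion(struct pt_regs *ctx, struct request *req)
    {
        dist.increment(bpf_log2l(req->__data_len / 1024));
        return 0;
    }
    """)

    print("Tracing... Hit Ctrl-C to end.")
    try:
        sleep(99999999)
    except KeyboardInterrupt:
        print()

    b["dist"].print_log2_hist("kbytes")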
    
slide 65:
    bitehist.py Internals
    User-Level
    Kernel
    C BPF Program
    compile
    BPF Bytecode
    Event
    BPF.attach_kprobe()
    Verifier
    BPF Bytecode
    error
    Statistics
    Python Program
    print
    async read
    Map
    
slide 66:
    bitehist.py Annotated
    Map
    C BPF Program
    Python Program
    Event
    "kprobe__" is a shortcut for BPF.aUach_kprobe()
    Sta>gt;s>gt;cs
    bcc examples/tracing/bitehist.py
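
To spell out that shortcut (my sketch, not from the slide): naming the C function kprobe__<kernel_function> makes bcc attach it automatically, which is equivalent to attaching an arbitrarily named function with attach_kprobe().

    from bcc import BPF

    # 1. Implicit attach: the kprobe__ prefix names the kernel function to probe.
    b1 = BPF(text="""
    int kprobe__blk_account_io_completion(void *ctx) { return 0; }
    """)

    # 2. Explicit attach: any function name, attached with attach_kprobe().
    b2 = BPF(text="""
    int do_count(void *ctx) { return 0; }
    """)
    b2.attach_kprobe(event="blk_account_io_completion", fn_name="do_count")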
    
slide 67:
    Current Complications
    Initialize all variables
    Extra bpf_probe_read()s
    BPF_PERF_OUTPUT()
    Verifier errors
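
A sketch of what two of these complications look like in practice (my example, not from the slide): buffers and structs must be zero-initialized, and following kernel pointers can require explicit bpf_probe_read() calls before the verifier will accept the program.

    from bcc import BPF

    b = BPF(text="""
    #include <uapi/linux/ptrace.h>
    #include <linux/fs.h>
    #include <linux/dcache.h>

    int trace_unlink(struct pt_regs *ctx, struct inode *dir, struct dentry *dentry)
    {
        char name[32] = {};                   // initialize all variables
        // explicit bpf_probe_read() to copy the filename from kernel memory:
        bpf_probe_read(&name, sizeof(name), (void *)dentry->d_name.name);
        bpf_trace_printk("unlink %s\\n", name);
        return 0;
    }
    """)
    b.attach_kprobe(event="vfs_unlink", fn_name="trace_unlink")
    b.trace_print()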
    
slide 68:
    bcc Tutorials
    1. https://github.com/iovisor/bcc/blob/master/INSTALL.md
    2. …/docs/tutorial.md
    3. …/docs/tutorial_bcc_python_developer.md
    4. …/docs/reference_guide.md
    5. .../CONTRIBUTING-SCRIPTS.md
    
slide 69:
    bcc lua
    bcc examples/lua/strlen_count.lua
    
slide 70:
    Summary
    Future Work
    • More tooling
    • Bug fixes
    • Better errors
    • Visualizations
    • GUIs
    • High-level
    language
    BPF Tracing in Linux
    • 3.19: sockets
    • 3.19: maps
    • 4.1: kprobes
    • 4.3: uprobes
    • 4.4: BPF output
    • 4.6: stacks
    • 4.7: tracepoints
    • 4.9: profiling
    • 4.9: PMCs
    https://github.com/iovisor/bcc#tools
    
slide 71:
    Links & References
    iovisor bcc:
    • https://github.com/iovisor/bcc https://github.com/iovisor/bcc/tree/master/docs
    • http://www.brendangregg.com/blog/ (search for "bcc")
    • http://blogs.microsoft.co.il/sasha/2016/02/14/two-new-ebpf-tools-memleak-and-argdist/
    • I'll change your view of Linux tracing: https://www.youtube.com/watch?v=GsMs3n8CB6g
    • On designing tracing tools: https://www.youtube.com/watch?v=uibLwoVKjec
    BPF:
    • https://www.kernel.org/doc/Documentation/networking/filter.txt
    • https://github.com/iovisor/bpf-docs
    • https://suchakra.wordpress.com/tag/bpf/
    Flame Graphs:
    • http://www.brendangregg.com/flamegraphs.html
    • http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
    • http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
    Dynamic Instrumentation:
    • http://ftp.cs.wisc.edu/par-distr-sys/papers/Hollingsworth94Dynamic.pdf
    • https://en.wikipedia.org/wiki/DTrace
    • DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD, Brendan Gregg, Jim Mauro; Prentice Hall 2011
    Netflix Tech Blog on Vector:
    • http://techblog.netflix.com/2015/04/introducing-vector-netflixs-on-host.html
    Linux Performance: http://www.brendangregg.com/linuxperf.html
    
slide 72:
    Thanks
    Questions?
    iovisor bcc: https://github.com/iovisor/bcc
    http://www.brendangregg.com
    http://slideshare.net/brendangregg
    [email protected]
    @brendangregg
    Thanks to Alexei Starovoitov (Facebook), Brenden Blanco
    (PLUMgrid/VMware), Sasha Goldshtein (Sela), Daniel Borkmann
    (Cisco), Wang Nan (Huawei), and other BPF and bcc
    contributors!