I originally posted this at http://dtrace.org/blogs/brendan/2012/12/13/usenix-lisa-2012-performance-analysis-methodology.
At USENIX LISA 2012, I gave a talk titled Performance Analysis Methodology. This covered ten performance analysis anti-methodologies and methodologies, including the USE Method. I wrote about these in the ACMQ article Thinking Methodically about Performance, which is worth reading for more detail. I've also posted USE Method-derived checklists for Solaris- and Linux-based systems.
The video of the talk is on the LISA site, and the slides are below, also available as a PDF.
I've summarized the methodologies in the talk below.
Methodology Summaries
Blame-Someone-Else Anti-Method:
1. Find a system or environment component you are not responsible for
2. Hypothesize that the issue is with that component
3. Redirect the issue to the responsible team
4. When proven wrong, go to 1
Streetlight Anti-Method:
- Pick observability tools that are:
  - familiar
  - found on the Internet
  - found at random
- Run tools
- Look for obvious issues
Ad Hoc Checklist Method:
- 1..N. Run A, if B, do C
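The "run A, if B, do C" pattern can be sketched as data plus a small loop. This is a minimal illustration only: the `ChecklistItem` type, the item names, the thresholds, and the canned measurements are all hypothetical stand-ins for real tools and site-specific rules.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChecklistItem:
    name: str                           # label for this checklist step
    run: Callable[[], float]            # "Run A": collect a measurement
    triggered: Callable[[float], bool]  # "if B": test the measurement
    action: str                         # "do C": the follow-up to take


def walk_checklist(items: List[ChecklistItem]) -> List[str]:
    """Run every item in order and return the actions whose condition fired."""
    actions = []
    for item in items:
        value = item.run()
        if item.triggered(value):
            actions.append(f"{item.name}: {item.action}")
    return actions


# Hypothetical checklist; the lambdas return canned values instead of
# invoking real observability tools.
checklist = [
    ChecklistItem("load average", lambda: 12.0, lambda v: v > 8.0,
                  "check CPU run queue"),
    ChecklistItem("disk busy %", lambda: 35.0, lambda v: v > 60.0,
                  "inspect disk I/O"),
]

actions = walk_checklist(checklist)
print(actions)
```

Keeping the checklist as data makes it easy to append items as new issues are encountered, which is how such lists tend to grow in practice.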
Problem Statement Method:
- What makes you think there is a performance problem?
- Has this system ever performed well?
- What has changed recently? (Software? Hardware? Load?)
- Can the performance degradation be expressed in terms of latency or run time?
- Does the problem affect other people or applications (or is it just you)?
- What is the environment? What software and hardware is used? Versions? Configuration?
Scientific Method:
- Question
- Hypothesis
- Prediction
- Test
- Analysis
Workload Characterization Method:
- Who is causing the load? PID, UID, IP addr, ...
- Why is the load called? code path
- What is the load? IOPS, tput, type
- How is the load changing over time?
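As a rough sketch of the who, what, and over-time questions above, the following summarizes a small event log. The log format `(second, pid, op, bytes)` and its values are invented for illustration. The "why" question (code path) needs stack traces and is not covered here.

```python
from collections import Counter

# Hypothetical I/O event log: (second, pid, operation type, bytes)
events = [
    (0, 101, "read", 4096),
    (0, 101, "read", 4096),
    (0, 202, "write", 8192),
    (1, 101, "read", 4096),
    (1, 202, "write", 8192),
    (1, 202, "write", 8192),
    (1, 202, "write", 8192),
]


def characterize(events):
    """Summarize a workload by source, type, throughput, and time."""
    who = Counter()            # who: operations per PID
    what = Counter()           # what: operations per type
    bytes_by_type = Counter()  # what: bytes moved per type
    over_time = Counter()      # how it changes: operations per second
    for second, pid, op, nbytes in events:
        who[pid] += 1
        what[op] += 1
        bytes_by_type[op] += nbytes
        over_time[second] += 1
    return who, what, bytes_by_type, over_time


who, what, bytes_by_type, over_time = characterize(events)
print("who:", dict(who))
print("what:", dict(what))
print("bytes:", dict(bytes_by_type))
print("per second:", dict(over_time))
```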
Drill-Down Analysis Method:
1. Start at highest level
2. Examine next-level details
3. Pick most interesting breakdown
4. If problem unsolved, go to 2
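The loop above can be sketched as a walk down a tree of breakdowns. This sketch assumes that "most interesting" means "largest share of time", which is a simplification. The tree, its component names, and the millisecond values are hypothetical.

```python
def drill_down(node, path=()):
    """Follow the largest contributor at each level until reaching a leaf."""
    children = node.get("children")
    if not children:
        return path  # nothing further to break down
    # "Most interesting" is approximated here as the child with the most time.
    name, child = max(children.items(), key=lambda kv: kv[1]["ms"])
    return drill_down(child, path + (name,))


# Hypothetical breakdown of a 100 ms request
tree = {
    "ms": 100,
    "children": {
        "application": {"ms": 20},
        "database": {
            "ms": 80,
            "children": {
                "query parse": {"ms": 5},
                "file system": {
                    "ms": 75,
                    "children": {
                        "cache hit": {"ms": 5},
                        "disk read": {"ms": 70},
                    },
                },
            },
        },
    },
}

path = drill_down(tree)
print(" -> ".join(path))
```

In real use each level usually needs a different tool to produce the next breakdown, so the tree is discovered step by step rather than known up front.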
Latency Analysis Method:
- Measure operation time (latency)
- Divide into logical synchronous components
- Continue division until latency origin is identified
- Quantify: estimate speedup if problem fixed
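The final step, estimating the speedup, can be sketched with simple arithmetic. This assumes the components are synchronous, so their latencies add up to the total. The function name, the `improvement` parameter, and the example numbers are assumptions for illustration.

```python
def estimated_speedup(total_ms, component_ms, improvement=1.0):
    """Estimate whole-operation speedup if one synchronous component improves.

    improvement is the fraction of the component's latency removed:
    1.0 means the component is eliminated, 0.5 means it is halved.
    """
    if not 0.0 <= improvement <= 1.0:
        raise ValueError("improvement must be between 0 and 1")
    if not 0.0 <= component_ms <= total_ms:
        raise ValueError("component_ms must be between 0 and total_ms")
    remaining = total_ms - component_ms * improvement
    if remaining <= 0:
        raise ValueError("no latency would remain; speedup is unbounded")
    return total_ms / remaining


# Hypothetical case: a 100 ms operation that spends 70 ms in disk reads.
full_fix = estimated_speedup(100, 70)       # disk latency eliminated
half_fix = estimated_speedup(100, 70, 0.5)  # disk latency halved
print(round(full_fix, 2), round(half_fix, 2))
```

Quantifying this way helps decide whether a fix is worth pursuing: halving a component that accounts for 5% of latency yields far less than halving one that accounts for 70%.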
USE Method:
For every resource, check:
- Utilization
- Saturation
- Errors
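A minimal sketch of iterating over resources and checking all three metrics might look like this. The resource names, the metric values, and the 70% utilization threshold are assumptions chosen for illustration. In practice each metric comes from a different tool, depending on the operating system.

```python
# Hypothetical measurements: utilization in percent busy, saturation as
# queued work, errors as an event count.
metrics = {
    "cpu": {"utilization": 95.0, "saturation": 4, "errors": 0},
    "disk": {"utilization": 40.0, "saturation": 0, "errors": 2},
    "network": {"utilization": 10.0, "saturation": 0, "errors": 0},
}


def use_check(metrics, util_threshold=70.0):
    """Return (resource, metric) pairs that deserve a closer look."""
    findings = []
    for resource, m in metrics.items():
        # The threshold is an assumed cutoff, not a universal rule.
        if m["utilization"] >= util_threshold:
            findings.append((resource, "utilization"))
        if m["saturation"] > 0:
            findings.append((resource, "saturation"))
        if m["errors"] > 0:
            findings.append((resource, "errors"))
    return findings


findings = use_check(metrics)
for resource, metric in findings:
    print(f"{resource}: investigate {metric}")
```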
Stack Profile Method:
- Profile thread stack traces (on- and off-CPU)
- Coalesce
- Study stacks bottom-up
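The coalesce step can be sketched as counting identical stacks so that the most frequent ones stand out. The sampled stacks and function names below are hypothetical, and frames are assumed to be ordered root first. The semicolon-joined output line is just one convenient way to print a stack with its count.

```python
from collections import Counter

# Hypothetical stack samples, one tuple per sample, root frame first.
samples = [
    ("main", "handle_request", "read_file"),
    ("main", "handle_request", "read_file"),
    ("main", "handle_request", "parse"),
    ("main", "idle"),
    ("main", "handle_request", "read_file"),
]


def coalesce(samples):
    """Merge identical stacks, counting how often each one was sampled."""
    return Counter(samples)


def folded(counts):
    """Render each unique stack as 'frame;frame;frame count', most common first."""
    return [f"{';'.join(stack)} {n}" for stack, n in counts.most_common()]


counts = coalesce(samples)
lines = folded(counts)
for line in lines:
    print(line)
```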