USENIX LISA 2012: Performance Analysis Methodology

13 Dec 2012

I originally posted this at http://dtrace.org/blogs/brendan/2012/12/13/usenix-lisa-2012-performance-analysis-methodology.

At USENIX LISA 2012, I gave a talk titled Performance Analysis Methodology. This covered ten performance analysis anti-methodologies and methodologies, including the USE Method. I wrote about these in the ACMQ article Thinking Methodically about Performance, which is worth reading for more detail. I've also posted USE Method-derived checklists for Solaris- and Linux-based systems.

The video of the talk is on the LISA site, and the slides are below, also available as a PDF.

I've summarized the methodologies in the talk below.

Methodology Summaries

Blame-Someone-Else Anti-Method:

1. Find a system or environment component you are not responsible for
2. Hypothesize that the issue is with that component
3. Redirect the issue to the responsible team
4. When proven wrong, go to 1

Streetlight Anti-Method:

1. Pick observability tools that are:
   - familiar
   - found on the Internet
   - found at random
2. Run tools
3. Look for obvious issues

Ad Hoc Checklist Method:

1..N. Run A, if B, do C
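
As a rough illustration, a checklist item of the form "run A, if B, do C" can be pictured as a row in a small table of probes, conditions, and actions. This is only a sketch: the metric names, thresholds, and suggested actions are hypothetical, and the probes return canned values instead of running real tools.

```python
# Each checklist item: run A (a probe), if B (a condition), do C (an action).
# All probes, thresholds, and actions below are hypothetical placeholders.
checklist = [
    ("cpu_idle_pct", lambda: 3.0, lambda v: v < 10, "look for CPU-bound processes"),
    ("disk_busy_pct", lambda: 20.0, lambda v: v > 60, "check disk I/O latency"),
    ("swap_in_per_s", lambda: 45.0, lambda v: v > 0, "check memory usage by process"),
]

def run_checklist(items):
    """Run every item in order and collect the actions whose condition fired."""
    actions = []
    for name, probe, condition, action in items:
        value = probe()                             # run A
        if condition(value):                        # if B
            actions.append((name, value, action))   # do C
    return actions

findings = run_checklist(checklist)
for name, value, action in findings:
    print(f"{name}={value}: {action}")
```

A list like this only finds the issues someone thought to write down, which is the main limit of the method.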

Problem Statement Method:

1. What makes you think there is a performance problem?
2. Has this system ever performed well?
3. What has changed recently? (Software? Hardware? Load?)
4. Can the performance degradation be expressed in terms of latency or run time?
5. Does the problem affect other people or applications (or is it just you)?
6. What is the environment? What software and hardware is used? Versions? Configuration?

Scientific Method:

1. Question
2. Hypothesis
3. Prediction
4. Test
5. Analysis

Workload Characterization Method:

1. Who is causing the load? (PID, UID, IP address, ...)
2. Why is the load called? (code path)
3. What is the load? (IOPS, throughput, type)
4. How is the load changing over time?
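
The four questions above can be sketched as simple aggregations over a trace of events. The event records, field layout, and names here are made up for the example; a real trace would come from whatever tracing or logging tool is available.

```python
from collections import Counter, defaultdict

# Hypothetical I/O events: (timestamp_s, pid, code_path, io_type, nbytes)
events = [
    (0, 101, "db_flush", "write", 8192),
    (0, 101, "db_flush", "write", 8192),
    (1, 202, "log_append", "write", 512),
    (1, 101, "db_read", "read", 4096),
    (2, 101, "db_flush", "write", 8192),
]

who = Counter(pid for _, pid, _, _, _ in events)            # who is causing the load?
why = Counter(path for _, _, path, _, _ in events)          # why is the load called?
what = Counter(io_type for _, _, _, io_type, _ in events)   # what is the load?

over_time = defaultdict(int)                                # how is it changing over time?
for ts, _, _, _, nbytes in events:
    over_time[ts] += nbytes                                 # bytes per second

print("who: ", who.most_common())
print("why: ", why.most_common())
print("what:", what.most_common())
print("time:", sorted(over_time.items()))
```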

Drill-Down Analysis Method:

1. Start at highest level
2. Examine next-level details
3. Pick most interesting breakdown
4. If problem unsolved, go to 2
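
The loop above can be sketched over a nested breakdown of time. Two assumptions are baked in: the data is a hypothetical tree of millisecond totals, and "most interesting" is taken to mean "largest", which will not always be the right choice in practice.

```python
# Hypothetical breakdown of time (ms) as a nested tree: name -> (total, children)
breakdown = {
    "request": (100, {
        "app": (20, {}),
        "database": (70, {
            "query_parse": (5, {}),
            "disk_io": (60, {}),
            "locks": (5, {}),
        }),
        "network": (10, {}),
    }),
}

def drill_down(tree):
    """Follow the largest contributor at each level; return the path taken."""
    path = []
    level = tree                       # start at the highest level
    while level:
        # examine this level's details and pick the largest item
        name, (total, children) = max(level.items(), key=lambda kv: kv[1][0])
        path.append((name, total))
        level = children               # keep going while there is more detail
    return path

print(drill_down(breakdown))
```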

Latency Analysis Method:

1. Measure operation time (latency)
2. Divide into logical synchronous components
3. Continue division until latency origin is identified
4. Quantify: estimate speedup if problem fixed
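
The final "quantify" step is simple arithmetic once the latency has been divided up. The sketch below uses hypothetical component timings and assumes the components are synchronous and additive, so that the operation time is their sum.

```python
# Hypothetical synchronous components of one operation's latency, in ms.
components = {"app_logic": 4.0, "file_system": 30.0, "network": 6.0}

total_ms = sum(components.values())
# The latency origin is taken to be the largest component.
origin, origin_ms = max(components.items(), key=lambda kv: kv[1])

def estimated_speedup(total, part, part_speedup):
    """Overall speedup if `part` of `total` becomes `part_speedup` times faster."""
    new_total = (total - part) + part / part_speedup
    return total / new_total

# Suppose a fix would make the origin 3x faster (an assumed figure).
speedup = estimated_speedup(total_ms, origin_ms, 3.0)
print(f"origin={origin} ({origin_ms} of {total_ms} ms), estimated speedup {speedup:.2f}x")
```

With these numbers the operation drops from 40 ms to 20 ms, so a 3x improvement in the largest component gives only a 2x improvement overall.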

USE Method:

For every resource, check:

1. Utilization
2. Saturation
3. Errors
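
As a minimal sketch, the check can be written as a loop over resources with three tests each. The resource list, the metric values, and the 70% utilization threshold are all assumptions for the example; real values would come from system tools, and suitable thresholds depend on the resource.

```python
# Hypothetical per-resource metrics. Utilization is a fraction of capacity,
# saturation is a queue length, and errors is an error count.
resources = {
    "cpu":    {"utilization": 0.95, "saturation": 12, "errors": 0},
    "memory": {"utilization": 0.60, "saturation": 0,  "errors": 0},
    "disk":   {"utilization": 0.30, "saturation": 0,  "errors": 3},
}

def use_check(metrics, util_threshold=0.70):
    """For every resource, flag high utilization, any saturation, any errors."""
    findings = []
    for name, m in metrics.items():
        if m["utilization"] > util_threshold:
            findings.append((name, "utilization"))
        if m["saturation"] > 0:
            findings.append((name, "saturation"))
        if m["errors"] > 0:
            findings.append((name, "errors"))
    return findings

for resource, issue in use_check(resources):
    print(f"{resource}: check {issue}")
```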

Stack Profile Method:

1. Profile thread stack traces (on- and off-CPU)
2. Coalesce
3. Study stacks bottom-up
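
The coalescing step can be sketched by counting identical stacks. The sampled stacks below are made up, and "bottom-up" is interpreted here as starting from the root frame and working toward the leaf, which is an assumption about how the stacks are printed.

```python
from collections import Counter

# Hypothetical sampled stacks, listed root frame first and leaf frame last.
samples = [
    ("main", "handle_request", "read_file", "sys_read"),
    ("main", "handle_request", "read_file", "sys_read"),
    ("main", "handle_request", "parse"),
    ("main", "gc"),
]

# Coalesce: identical stacks collapse into one entry with a sample count.
coalesced = Counter(samples)

# Study from the root: count how many samples pass through each stack prefix.
prefix_counts = Counter()
for stack, n in coalesced.items():
    for depth in range(1, len(stack) + 1):
        prefix_counts[stack[:depth]] += n

for prefix, n in sorted(prefix_counts.items()):
    print(f"{n:3d}  {' -> '.join(prefix)}")
```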

Brendan Gregg's Blog
