Plotting your load test with JMeter

If you've ever used JMeter, you know it's an awesome load testing tool. It also comes with a built-in graph listener, which allows you to watch JMeter do, well... something.

JMeter graph

While this gives a basic view of response time and throughput, it doesn't show failures, nor how the server responds as load increases. And let's face it, it's just plain ugly.

Enter Matplotlib, a beautiful (though complex) plotting tool written in Python.

Box plots for response time are shown in green, throughput is in blue, and failed requests (labeled as 50x errors in the legend) are plotted as red X's. The script assumes a few things:

  • You have a series of CSV files sampled with different thread counts.
  • The input files are named N-blah-blah.csv, where N is the number of threads. The file names are taken as command-line arguments.
  • Your CSV report contains the following fields at a minimum: label, elapsed, timeStamp, and success. The results are grouped by label (a name you assign to each JMeter sampler), so each sampler produces a separate plot.
  • And of course, that you have Python and Matplotlib. If you are on OS X, the easiest way to install them is via MacPorts.
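
The file-naming convention above can be sketched as follows. This is a minimal illustration written for Python 3, not part of the original script: the helper name thread_count_from_filename is hypothetical, and it drops any directory prefix before reading the number, which is an assumption about how you might want paths handled.

```python
import os


def thread_count_from_filename(path):
    """Return the leading thread count N from a name like N-blah-blah.csv.

    Hypothetical helper: the directory part is dropped first, so
    'Drupal6/1-overall-summary.csv' yields 1 instead of failing on 'Drupal6/1'.
    """
    name = os.path.basename(path)
    prefix = name.split('-')[0]
    if not prefix.isdigit():
        raise ValueError("expected a file name like N-name.csv, got %r" % name)
    return int(prefix)


print(thread_count_from_filename('16-overall-summary.csv'))
print(thread_count_from_filename('Drupal6/1-overall-summary.csv'))
```

A name without the hyphen, such as 10users.csv, raises a ValueError with a message that names the expected pattern.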

Stay tuned for the next article on the JMX file.

Sample plots

Source code

#!/opt/local/bin/python2.6
 
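from pylab import *
import numpy as na
import matplotlib.font_manager
import csv
import os
import sys
 
elapsed = {}
timestamps = {}
starttimes = {}
errors = {}
 
# Parse the CSV files
for file in sys.argv[1:]:
  # Read the thread count from the file name only, so that paths such as
  # Drupal6/1-overall-summary.csv are accepted
  threads = int(os.path.basename(file).split('-')[0])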
  for row in csv.DictReader(open(file)):
    if (not row['label'] in elapsed):
      elapsed[row['label']] = {}
      timestamps[row['label']] = {}
      starttimes[row['label']] = {}
      errors[row['label']] = {}
    if (not threads in elapsed[row['label']]):
      elapsed[row['label']][threads] = []
      timestamps[row['label']][threads] = []
      starttimes[row['label']][threads] = []
      errors[row['label']][threads] = []
    elapsed[row['label']][threads].append(int(row['elapsed']))
    timestamps[row['label']][threads].append(int(row['timeStamp']))
    starttimes[row['label']][threads].append(int(row['timeStamp']) - int(row['elapsed']))
    if (row['success'] != 'true'):
      errors[row['label']][threads].append(int(row['elapsed']))
 
# Draw a separate figure for each label found in the results.
for label in elapsed:
  # Transform the lists for plotting
  plot_data = []
  throughput_data = [None]
  error_x = []
  error_y = []
  plot_labels = []
  column = 1
  for thread_count in sorted(elapsed[label].keys()):
    plot_data.append(elapsed[label][thread_count])
    plot_labels.append(thread_count)
    test_start = min(starttimes[label][thread_count])
    test_end = max(timestamps[label][thread_count])
    test_length = (test_end - test_start) / 1000.0  # seconds, without truncating to whole seconds
    num_requests = len(timestamps[label][thread_count]) - len(errors[label][thread_count])
    if (test_length > 0):
      throughput_data.append(num_requests / float(test_length))
    else:
      throughput_data.append(0)
    for error in errors[label][thread_count]:
      error_x.append(column)
      error_y.append(error)
    column += 1
 
 
  # Start a new figure
  fig = figure(figsize=(9, 6))
 
  # Pick some colors
  palegreen = matplotlib.colors.colorConverter.to_rgb('#8CFF6F')
  paleblue = matplotlib.colors.colorConverter.to_rgb('#708DFF')
 
  # Plot response time
  ax1 = fig.add_subplot(111)
  ax1.set_yscale('log')
  bp = boxplot(plot_data, notch=0, sym='+', vert=1, whis=1.5)
 
  # Tweak colors on the boxplot
  plt.setp(bp['boxes'], color='g')
  plt.setp(bp['whiskers'], color='g')
  plt.setp(bp['medians'], color='black')
  plt.setp(bp['fliers'], color=palegreen, marker='+')
 
  # Now fill the boxes with desired colors
  numBoxes = len(plot_data)
  medians = range(numBoxes)
  for i in range(numBoxes):
    box = bp['boxes'][i]
    boxX = []
    boxY = []
    for j in range(5):
      boxX.append(box.get_xdata()[j])
      boxY.append(box.get_ydata()[j])
    boxCoords = list(zip(boxX, boxY))
    boxPolygon = Polygon(boxCoords, facecolor=palegreen)
    ax1.add_patch(boxPolygon)
 
  # Plot the errors
  if (len(error_x) > 0):
    ax1.scatter(error_x, error_y, color='r', marker='x', zorder=3)
 
  # Plot throughput
  ax2 = ax1.twinx()
  ax2.plot(throughput_data, 'o-', color=paleblue, linewidth=2, markersize=8)
 
  # Label the axes
  ax1.set_title(label)
  ax1.set_xlabel('Number of concurrent requests')
  ax2.set_ylabel('Requests per second')
  ax1.set_ylabel('Milliseconds')
  ax1.set_xticks(range(1, len(plot_labels) + 1, 2))
  ax1.set_xticklabels(plot_labels[0::2])
  fig.subplots_adjust(top=0.9, bottom=0.15, right=0.85, left=0.15)
 
  # Turn off scientific notation for Y axis
  ax1.yaxis.set_major_formatter(ScalarFormatter(False))
 
  # Set the lower y limit to match the first column
  ax1.set_ylim(ymin=bp['boxes'][0].get_ydata()[0])
 
  # Draw some tick lines
  ax1.yaxis.grid(True, linestyle='-', which='major', color='grey')
  ax1.yaxis.grid(True, linestyle='-', which='minor', color='lightgrey')
  # Hide the grid lines behind plot objects
  ax1.set_axisbelow(True)
 
  # Add a legend
  line1 = Line2D([], [], marker='s', color=palegreen, markersize=10, linewidth=0)
  line2 = Line2D([], [], marker='o', color=paleblue, markersize=8, linewidth=2)
  line3 = Line2D([], [], marker='x', color='r', linewidth=0, markeredgewidth=2)
  prop = matplotlib.font_manager.FontProperties(size='small')
  figlegend((line1, line2, line3), ('Response Time', 'Throughput', 'Failures (50x)'),
    'lower center', prop=prop, ncol=3)
 
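  # Write the PNG file. Labels such as "/index.html" contain path separators,
  # so replace them and add the extension explicitly.
  savefig(label.replace('/', '.') + '.png')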


I have used gnuplot extensively in the past - but switched about three years ago when I discovered matplotlib. I found gnuplot's output very 1980s-ish by comparison; perhaps it's improved since then.

I personally find Python a joy to work with, so that's no obstacle. I also have some familiarity with MATLAB, so that has helped with the learning curve.

There are some more details about the test plan here:
http://www.metaltoad.com/blog/jmeter-test-plan-drupal
or just the JMX file:
http://www.metaltoad.com/sites/default/files/DrupalStress.jmx_.gz

The test plan is parameterized, and so can be run in a loop via an external script.

Funny thing about writing with matplotlib, though - the API contains both an object-oriented and a procedural syntax. Things can get really confusing when you start mixing them. In general the OO interface seems to be preferred, but there are still a lot of examples using the MATLAB-style code.


Can you give me some advice on how to make those graphs? Your Drupal test plan gives a CSV file like this:

1286967155126,13,Home page - anon,200,OK,Anonymous Browsing 1-1,text,true,4
1286967155140,9,Home page - anon,200,OK,Anonymous Browsing 1-1,text,true,2
1286967155150,11,Home page - anon,200,OK,Anonymous Browsing 1-1,text,true,3
...

then if I save this file to 1-overall-summary.csv and try to run it with your script like this:

python yourscript.py 1-overall-summary.csv

it gives the following error:

File "yourscript.py", line 18, in <module>
if (not row['label'] in elapsed):
KeyError: 'label'


Your CSV file should start with a line that looks something like this:

timeStamp,elapsed,label,responseCode,responseMessage,threadName,dataType,success,Latency

On the Summary Report listener, click the "Configure" button and make sure that "Save Field Names (CSV)" is checked.
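
A quick way to check a results file before plotting is to look at the field names the csv module finds. The following is a minimal sketch for Python 3; the helper name missing_fields and the REQUIRED_FIELDS tuple are assumptions chosen for illustration, based on the columns the script reads.

```python
import csv
import io

# Columns the plotting script reads from each row.
REQUIRED_FIELDS = ('timeStamp', 'elapsed', 'label', 'success')


def missing_fields(csv_file):
    """Return the required column names absent from the first row of csv_file."""
    reader = csv.DictReader(csv_file)
    present = reader.fieldnames or []
    return [name for name in REQUIRED_FIELDS if name not in present]


# Saved without field names: the first data row is mistaken for the header.
no_header = io.StringIO(
    "1286967155126,13,Home page - anon,200,OK,Anonymous Browsing 1-1,text,true,4\n")
print(missing_fields(no_header))

with_header = io.StringIO(
    "timeStamp,elapsed,label,responseCode,responseMessage,"
    "threadName,dataType,success,Latency\n"
    "1286967155126,13,Home page - anon,200,OK,Anonymous Browsing 1-1,text,true,4\n")
print(missing_fields(with_header))
```

An empty list means the file has the columns the script needs; anything else points to a missing header line.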


Thanks for this, I was running a quick search before starting my own gnuplot script!

One thing, can your script be modified to use the 'allThreads' (Active thread count) instead of having multiple files? Or am I missing something?

Thanks again


I ended up using multiple individual test runs, because I didn't know how to determine the number of active threads.

If "allThreads" reports this, then yes I imagine you could use a ramp time in your test plan, and group the samples into bins for plotting.
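
A minimal sketch of that binning idea, for Python 3. It assumes the results file contains an allThreads column holding the active thread count for each sample; the helper name group_elapsed_by_threads and the bin_size parameter are hypothetical, and this has not been wired into the plotting script.

```python
import csv
import io
from collections import defaultdict


def group_elapsed_by_threads(csv_file, bin_size=10):
    """Group elapsed times (ms) by bins of the active thread count.

    Assumes an 'allThreads' column. Returns a dict that maps the lower
    edge of each bin to the list of elapsed times that fell into it.
    """
    bins = defaultdict(list)
    for row in csv.DictReader(csv_file):
        active = int(row['allThreads'])
        lower_edge = (active // bin_size) * bin_size
        bins[lower_edge].append(int(row['elapsed']))
    return dict(bins)


sample = io.StringIO(
    "timeStamp,elapsed,label,success,allThreads\n"
    "1286967155126,13,Home page,true,3\n"
    "1286967155140,9,Home page,true,8\n"
    "1286967155150,40,Home page,true,12\n")
for lower_edge, times in sorted(group_elapsed_by_threads(sample).items()):
    print(lower_edge, times)
```

Each bin's list could then take the place of one per-file list of elapsed times when drawing the box plots.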


Hi, I got the following error when I tried to run the source code after installing the Python 2.7 MSI and Matplotlib.
Is the numpy module missing?

Traceback (most recent call last):
File "C:/Python27/PlotRSRGraph.py", line 3, in <module>
from pylab import *
File "C:\Python27\lib\site-packages\pylab.py", line 1, in <module>
from matplotlib.pylab import *
File "C:\Python27\lib\site-packages\matplotlib\__init__.py", line 135, in <module>
from matplotlib.rcsetup import (defaultParams,
File "C:\Python27\lib\site-packages\matplotlib\rcsetup.py", line 19, in <module>
from matplotlib.colors import is_color_like
File "C:\Python27\lib\site-packages\matplotlib\colors.py", line 52, in <module>
import numpy as np
ImportError: No module named numpy
>>>


Hi, Do you know how those green boxes are drawn? Are the most common response times inside the box and the rest above it? And what is that line above boxes? Is it indicating some percentile of all values?


The green boxes are a standard box plot: the box spans the 25th to 75th percentile. Each "whisker" extends to the most extreme data point within 1.5 times the inter-quartile range (IQR) of the box, and the marks beyond are outliers. For a normal distribution, the 1.5*IQR rule for the whiskers will contain about 99.3% of the distribution.
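
To make the box plot arithmetic concrete, here is a minimal sketch that uses only the Python 3 standard library (Python 3.8 or later for statistics.quantiles). The helper name box_plot_stats and the sample values are made up for illustration; Matplotlib computes its own statistics internally, so treat this as an approximation of the idea and not as its implementation.

```python
import statistics


def box_plot_stats(values, whis=1.5):
    """Return quartiles, whisker ends, and outliers for a list of numbers.

    Needs at least two values. Whiskers end at the most extreme data
    points that lie within whis * IQR of the box.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method='inclusive')
    iqr = q3 - q1
    low_limit = q1 - whis * iqr
    high_limit = q3 + whis * iqr
    inside = [v for v in data if low_limit <= v <= high_limit]
    outliers = [v for v in data if v < low_limit or v > high_limit]
    return {
        'q1': q1, 'median': median, 'q3': q3,
        'whisker_low': inside[0], 'whisker_high': inside[-1],
        'outliers': outliers,
    }


# Made-up response times in milliseconds, with one slow request.
stats = box_plot_stats([9, 11, 13, 14, 15, 17, 20, 250])
print(stats)
```

For this sample the box runs from 12.5 to 17.75 ms, the whiskers end at 9 and 20 ms, and the 250 ms request is reported as an outlier.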


Thanks Dylan. Yes it worked now after installing numpy module with Python26 :)


Hi Dylan,

Thanks for your help, I think I am getting somewhere although it seems like so near and yet so far :) I got the following error after installing Numpy module

C:\Python26>python jmetergraph.py 5-jmetergraph.csv
5-jmetergraph.csv
threads = 5
Traceback (most recent call last):
File "jmetergraph.py", line 20, in <module>
if (not row['label'] in elapsed):
KeyError: 'label'

My CSV file looks like the following.

timeStamp|elapsed|label|responseCode|responseMessage|threadName|dataType|success|Latency
1294992313318|3001|/|200|OK|Thread Group 1-1|text|true||12922|1912
1294992313837|2914|/|200|OK|Thread Group 1-2|text|true||12922|1790
1294992316757|743|/styles/style_0.css|200|OK|Thread Group 1-2|text|true||1755|743
1294992314850|2984|/|200|OK|Thread Group 1-4|text|true||12922|1783
1294992316357|1484|/|200|OK|Thread Group 1-7|text|true||12922|792
1294992316367|1479|/styles/style_0.css|200|OK|Thread Group 1-1|text|true||1755|1479
1294992317503|628|/scripts/function.js|200|OK|Thread Group 1-2|text|true||1064|628
1294992315351|2917|/|200|OK|Thread Group 1-5|text|true||12922|1885
1294992317840|588|/styles/style_0.css|200|OK|Thread Group 1-4|text|true||1755|588

Do you know what could be the problem here ?


Hi Dylan,

I think I managed to fix the earlier error of "if (not row['label'] in elapsed): KeyError: 'label'" by checking "Save Field Names (CSV)", as you rightly pointed out :)
However, I encountered the following problem after that.

threadName': 'OK', 'label': '/Logout.aspx', 'responseMessage': '200', 'elapsed': '468'}
row = {'': '185', 'Latency': 'TRUE', 'success': 'text', 'dataType': 'Thread Group 1-6', 'timeStamp': '1295590000000', '
threadName': 'OK', 'label': '/Login.aspx', 'responseMessage': '200', 'elapsed': '199'}
Traceback (most recent call last):
File "jmetergraph.py", line 133, in <module>
savefig(label)
File "C:\Python26\Lib\site-packages\matplotlib\pyplot.py", line 363, in savefig
return fig.savefig(*args, **kwargs)
File "C:\Python26\Lib\site-packages\matplotlib\figure.py", line 1084, in savefig
self.canvas.print_figure(*args, **kwargs)
File "C:\Python26\Lib\site-packages\matplotlib\backend_bases.py", line 1923, in print_figure
**kwargs)
File "C:\Python26\Lib\site-packages\matplotlib\backends\backend_agg.py", line 443, in print_png
filename_or_obj = file(filename_or_obj, 'wb')
IOError: [Errno 2] No such file or directory: '/images/btn_submitrequest.png'

Do I need to create the above file/directory?


Hi Dylan,

I think I am good now, I managed to find the problem and make some simple changes to the scripts.

# Write the PNG file
#print "label =", label

label = label.replace("/",".")
label = label + ".png"
print "label =", label

savefig(label)

It's working now and I have to really thank you for your contribution, it's a really nice graph :)

Cheers
Peter


Glad you got it working! I'm not sure why your output files are delimited by "|" - the default for CSV is of course a comma. From searching around it seems it can be controlled by the parameter jmeter.save.saveservice.default_delimiter in your jmeter.properties.
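
If you would rather keep the "|" delimiter, the csv module can be told about it on the reading side. Below is a minimal sketch for Python 3; the helper name open_results and its candidate list are hypothetical, the guess is a simple heuristic (the candidate that occurs most often in the header line), and the sample data row is simplified to match the header.

```python
import csv
import io


def open_results(csv_file, candidates=',|;\t'):
    """Return a DictReader for csv_file, guessing the delimiter from the header.

    Picks the candidate character that occurs most often in the first line.
    The file object must support seek(), as regular files do.
    """
    header = csv_file.readline()
    csv_file.seek(0)
    delimiter = max(candidates, key=header.count)
    return csv.DictReader(csv_file, delimiter=delimiter)


sample = io.StringIO(
    "timeStamp|elapsed|label|responseCode|responseMessage|"
    "threadName|dataType|success|Latency\n"
    "1294992313318|3001|/|200|OK|Thread Group 1-1|text|true|1912\n")
for row in open_results(sample):
    print(row['label'], row['elapsed'])
```

In the script, this would stand in for the plain csv.DictReader(open(file)) call.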


Yes, the delimiter can be set through jmeter.properties.
Another thing that I found out is that I need to check Save As XML to save the data in a CSV file using the Simple Data Writer listener.

Otherwise each row ends up in a single cell, as below.

timeStamp|elapsed|label|responseCode|responseMessage|threadName|dataType|success|Latency
1294992313318|3001|/|200|OK|Thread Group 1-1|text|true||12922|1912


Hi Dylan,

I think I messed up my configuration previously, so my previous post about checking Save As XML when writing to a CSV file using Simple Data Writer is not true.

My apologies for the wrong info :)

Cheers
Peter


Could you explain how you read the throughput from this chart? Which axis does it correspond to ... ? For example, in the first chart, at 16 concurrent requests you have a throughput close to 10 seconds or 150 requests/sec.

Great post, thanks!
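
For reference, the script plots throughput against the right-hand axis ("Requests per second") and derives each point as successful requests divided by the length of the test run. A minimal sketch of that arithmetic follows, for Python 3; the function name throughput and the sample values are made up for illustration.

```python
def throughput(samples):
    """Requests per second for a list of (timeStamp, elapsed, success) tuples.

    Mirrors the script's arithmetic: timeStamp (ms) is treated as the end
    of each sample, so a sample starts at timeStamp - elapsed. Only
    successful samples are counted.
    """
    end_times = [ts for ts, _, _ in samples]
    start_times = [ts - elapsed for ts, elapsed, _ in samples]
    test_length = (max(end_times) - min(start_times)) / 1000.0  # seconds
    successes = sum(1 for _, _, ok in samples if ok)
    return successes / test_length if test_length > 0 else 0.0


# Three made-up samples: two successes and one failure over two seconds.
samples = [
    (1000, 500, True),
    (2000, 400, True),
    (2500, 300, False),
]
print(throughput(samples))
```

Here the run spans 500 ms to 2500 ms, so two successful requests over two seconds give 1.0 request per second.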


Hi, actually I am new to this and I need help. Can you give me simple steps to start with, starting from JMeter?


Hi all, actually I need your help. It is my first time using JMeter and I have been asked to get the output as a box plot graph. Can you guide me on what exactly I have to do? I am a Windows user and a Java developer, and I have no idea about Python. Thanks in advance.


Hi, now I have installed Python, NumPy and Matplotlib, and when I tried to run the file I got the error below. Please help, it is urgent :(
C:\Python27>python.exe test.py jsf.csv
Traceback (most recent call last):
File "test.py", line 14, in <module>
threads = int(file.split('10')[0])
ValueError: invalid literal for int() with base 10: 'jsf.csv'


Hello, please ignore my previous comment, it is working now. But I get one image for every HTTP request in the test plan. Is that normal? I mean, I have 4 HTTP requests for 4 pages, and at the end I got 4 images!


Thanks Dylan,
so how can I make just one label in my test plan,
so that I can get the results of all 4 HTTP requests in one image?


You mean rename the 4 samplers (HTTP requests) with the same name?


Any tips on how to generate a (similar) plot (same axis & plot labels of response time, throughput, and # threads) from a summary CSV file? I'm talking about the file generated by doing a "Save Table Data" with "Save Table Header" option in Summary Report and Aggregate Graph.

It has CSV columns of

Label,# Samples,Average,Median,90% Line,Min,Max,Error %,Throughput,KB/sec

We can use either Average, Median, or 90% Line as response time and we already have the throughput value, don't need to calculate. And maybe can make use of "Error %" for errors.


Hmm, I'm running into an issue with the script. Not sure what is going on, not a Python person :/

steves-mac-mini:output user$ /opt/local/bin/python2.7 graph.py Drupal6/1-overall-summary.csv 
Traceback (most recent call last):
  File "graph.py", line 16, in <module>
    threads = int(file.split('-')[0])
ValueError: invalid literal for int() with base 10: 'Drupal6/1'
steves-mac-mini:output user$ ls
Drupal6		graph.py	jmetergraph.pl
steves-mac-mini:output user$ cd Drupal6/
steves-mac-mini:Drupal6 user$ ls
1-overall-summary.csv


I was able to output a graph, but I don't see all the concurrent request counts like 2, 4, 256, 512 in the same image. How do you get it to create one for all the tests? See my screen grab, it is way different from yours: http://grab.by/bIBu

I ran the shell script and the command was:

#!/bin/bash
 
# The host under test.
#HOST=localhost
 
# Ramp up by factors of sqrt(2).
for thread_count in 2 3 4 6 8 11 16 23 32 45 64 91 128 181 256 362 512
do
  /Users/user/Desktop/Jmeter/bin/jmeter -n -t ph.jmx -Jthreads=$thread_count 
done

Note I have the host, user, and log hardcoded in my JMX.


Hi, I am new to JMeter. Can you tell me where to run this script from, Python or Matplotlib? If in Python, where do I input the CSV file where the results are stored?


When I run the script:

C:\Python27>python.exe script.py 10users.csv
Traceback (most recent call last):
File "test.py", line 16, in <module>
threads = int(file.split('10')[0])
ValueError: invalid literal for int() with base 10: '10users.csv'

How do I get rid of this error and run the script? Please help, it's urgent.


Thanks Dylan, but now I am getting this error:

D:\Python27>python.exe script.py 100-users.csv
File "script.py", line 134
savefig(label)
^
IndentationError: unexpected indent

What does this mean?


Hi, I have another issue. When I run the script I get a graph for only one of the requests. My CSV has 8 requests, but a graph is generated for only one. Why is that, and where can we set a default path to save the graph?


It is a great way to visualize data. I use 'R'. Do you think I can somehow get your dataset ? Quite interested in coding 'R' which is probably more suitable for statistical analysis. I can give you the 'R' code as an incentive :-)

Thanks.


Hi Dylan,

I went through jmeter_results.zip data and realized that the value for row['success'] is TRUE.
However the code seems to be testing on the value 'true' instead of 'TRUE'.
Is this correct ?

        if (row['success'] != 'true'):
            errors[row['label']][threads].append(int(row['elapsed']))

errors list gets appended
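
The comparison quoted above is case-sensitive. A minimal sketch of a more tolerant check follows, for Python 3; the helper name is_success is hypothetical and not part of the original script.

```python
def is_success(value):
    """Return True when a 'success' cell reads 'true' in any letter case."""
    return value.strip().lower() == 'true'


for cell in ('true', 'TRUE', 'false'):
    print(cell, is_success(cell))
```

In the script, the test would then read "if not is_success(row['success']):".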


You are right! Excel is the culprit here, capitalizing the true value in a General-format column.
It also changes the timeStamp value to the scientific format 1.347E+12.
I used Notepad++ to edit the values below.

timeStamp,elapsed,label,responseCode,responseMessage,threadName,dataType,success,Latency
1347000000000,36,Login Form,200,OK,Authenticated Browsing 2-1,text,true,34

However I seem to be getting the following AssertionError now.
Do you happen to know what causes this error?

  File "C:\Python27\lib\site-packages\matplotlib\path.py", line 140, in __init__
    assert codes[0] == self.MOVETO
AssertionError


OK I extracted the files in a folder and got the following lines now using Notepad++

timeStamp,elapsed,label,responseCode,responseMessage,threadName,dataType,success,Latency
1346999371217,36,Login Form,200,OK,Authenticated Browsing 2-1,text,true,34
1346999371217,35,Login form,200,OK,Node save 3-1,text,true,34
1346999371217,35,Login Form,200,OK,Perform Login/View Account 5-1,text,true,35
1346999371227,51,Home page - anon,200,OK,Anonymous Browsing 1-2,text,true,51
1346999371229,23,Home page - anon,200,OK,Anonymous Browsing 1-4,text,true,22
1346999371226,28,Home page - anon,200,OK,Anonymous Browsing 1-1,text,true,27
1346999371226,30,Search,200,OK,Search 4-1,text,true,29

However when I run the above Python scripts with these data, I still get the following AssertionError.
  File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    draw(artist, renderer, *args, **kwargs)
  File "C:\Python27\lib\site-packages\matplotlib\patches.py", line 421, in draw
    tpath = transform.transform_path_non_affine(path)
  File "C:\Python27\lib\site-packages\matplotlib\transforms.py", line 2227, in transform_path_non_affine
    return self._a.transform_path_non_affine(path)
  File "C:\Python27\lib\site-packages\matplotlib\transforms.py", line 1368, in transform_path_non_affine
    path._interpolation_steps)
  File "C:\Python27\lib\site-packages\matplotlib\path.py", line 140, in __init__
    assert codes[0] == self.MOVETO
AssertionError

About the Author

Dylan Tack, Director of Technology

Dylan is a software engineer with more than a decade of experience working with a wide variety of clients including the Linux Foundation, PBS, Habitat for Humanity, TV.com and the Emmys. His background includes training as an electrical engineer, but he became passionate about open source through his work with a university genetics lab.

Dylan is a proud member of the Drupal community, a member of the Drupal security team, and has extensive experience with Perl and Java. His other interests include computer security, embedded design, climbing, and brewing.

His latest talk at the Pacific Northwest Summit was titled: "Drupal Security for People Who Don't Care".