Faster Database Backups via Drush! Plus Capistrano Integration

When working with Drupal sites, Drush is your go-to tool. This post is going to focus on the drush sql-dump command. This allows you to export your database to a sql file, so you can restore it later. This can be particularly useful when you are working in a development environment and need to deploy a site to production for the first time. Or when you start work on a new clients existing site, you need to export their live database and download it to your local environment.

For the most part this is usually a fairly quick task (10-15min). However, for some rather large sites the databases can fairly large and slow this down. One thing drush does to help limit the amount of unnecessary data being downloaded and export, is it ignores certain tables and only exports their structure. For example the cache, sessions and watchdog tables are excluded by default. The problem is that many modules add their own cache tables and that can add up to some huge differences in the sizes of your backups.

drushrc.php to the Rescue

Thankfully drush provides a way for us to extend this. If you look at the example.drushrc.php provided by drush explains the different places you can add a drushrc.php file to add to the list of tables to be ignored; such as in your users ~/.drush/ folder or in each project sites/all/drush folder. We decided to make a drushrc file for each project because then it can be used on every environment you have setup with the project, and there is currently an issue when you include tables in your drushrc file that do not exist on one project or another, so adding it to your .drush folder may cause problems on some of your projects.

Here is an example file that we use as a starting point.

// Add tables to structure-only list
// Set default if it doesn't exist. Copied from example.drushrc.php
if (!isset($options['structure-tables']['common'])) {
  $options['structure-tables']['common'] = array('cache', 'cache_filter', 'cache_menu', 'cache_page', 'history', 'sessions', 'watchdog');
}
 
/**
 * These are the defaults for the majority of projects at Metal Toad. Add or
 * remove tables as needed.
 */
$options['structure-tables']['common'] = array_merge($options['structure-tables']['common'], array(
  'cache_admin_menu',
  'cache_block',
  'cache_field',
  'cache_form',
  'cache_path',
  'cache_token',
  'cache_update',
  'cache_views',
  'cache_views_data',
  'ctools_css_cache',
  'ctools_object_cache',
));
 
/**
 * If you use Drupal core search, you can uncomment this section to exclude
 * the search index tables.
 */
/*
// Add option to download search tables if you really want
$options['structure-tables']['search'] = $options['structure-tables']['common'];
 
// By default don't download the search index tables though
$options['structure-tables']['common'] = array_merge($options['structure-tables']['common'], array(
  'search_dataset',
  'search_index',
  'search_node_links',
  'search_total',
));
*/

There are some cache tables that are for very specific modules, that you may or may not use, but we use on at least 90% of our Drupal projects. Admin Menu, Pathauto Token, Views and Ctools. Also if you can see that the core search module, you can save time by excluding the search index tables, by uncommenting the bottom section. I also added an option to include them, just in case you need them to do some search debugging instead of running cron locally a bunch of times to re-index the site. To download the search tables as well, just run: drush sql-dump --structure-tables-key=search > db_backup.sql

Integrating the Different Export Options with Capistrano

Now that you have sql-dumps configured, if you followed our Capistrano + Drupal posts, you'll want to add support for these new options in your db:down/db:pull tasks. First I added set :tables, "common" to config/deploy.rb and in the Capfile I updated the db:down task to look like this:

  task :down, :roles => :db, :only => { :primary => true } do
    run_locally "mkdir -p db"
    domains.each do |domain|
      if (tables == "common")
        filename = "#{domain}_#{stage}.sql"
      else
        filename = "#{domain}_#{stage}_#{tables}.sql"
      end
 
      temp = "/tmp/#{release_name}_#{application}_#{filename}"
      run "touch #{temp} && chmod 600 #{temp}"
      run "#{drush} --uri=#{domain} sql-dump --structure-tables-key=#{tables} > #{temp}"
      download("#{temp}", "db/#{filename}", :via=> :scp)
      run "rm #{temp}"
    end
  end

Now when you want to download the production database with the search index data as well, you just run cap prod db:down -s tables=search

What's the Difference?

So what did we gain by doing this? Here are some rough stats.

Download Times

$ time cap prod db:down

Before

real 23m13.736s
user 7m17.669s
sys 1m26.844s

After

real 7m39.958s
user 2m31.454s
sys 0m30.187s

Filesize

Before

3.3G

After

1.3G

Import Time

$ time drush sql-cli < ../db/default_prod.sql

Before

real 334m15.027s
user 1m13.609s
sys 0m5.235s

After

real 16m20.778s
user 0m29.509s
sys 0m2.176s

Comments

There's very little reason not to use memcache these days, which would mean that all the cache tables (apart from the form cache, which needs persistent storage) would be empty.

For large/high-performance sites, it's worth using the syslog module instead of dblog (or the contrib GELF module, to log directly to Graylog), so the watchdog table would also be empty.

Two quick and easy ways to speed up all db backups :-)

Interesting post!! We recently experienced a similar issue with one of our sites using Drupal 7. Our search for quick database backup and restore ended when we found the tool mydumper and myloader http://www.mydumper.org/ which is relatively faster than traditional mysqldump and it's using SQL format together with the advantage of threads.

Another tool worth considering is percona http://www.percona.com/

Do I still need to add --structure-tables-key param if i used drush cca before using drush sql-dump

That's a good point, as far as I can tell from the drush source the --structure-tables-key param does not have a default value, so you will still need to pass that. In my examples the capistrano task always passes it.

Hi,

I would like to know if there is a way to limit the rows using sql-dump like in mysqldump

mysqldump --where="1 LIMIT 50000" would limit the rows to 50000.

We have few very large tables like node, taxonomy which will not fit in dev server and it is really old data we like to avoid.

Thank you.

We have the same issue on a few projects, and have not yet hit on a good solution. It's especially complicated since often the entities we need to prune are spread across many database tables, so even a LIMIT statement wouldn't produce a consistent data set.

Related, I use this query to identify the largest tables, sometimes the result is surprising (such as finding 200MB in userpoints_txn)!

select table_name, format(data_length / 1E6, 2),
  format(index_length / 1E6, 2), table_rows
  from information_schema.TABLES
  where table_schema = 'my_database'
  order by data_length + index_length;

Add new comment

Restricted HTML

  • Allowed HTML tags: <a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul type> <ol start type> <li> <dl> <dt> <dd> <h2 id> <h3 id> <h4 id> <h5 id> <h6 id>
  • You can enable syntax highlighting of source code with the following tags: <code>, <blockcode>, <cpp>, <java>, <php>. The supported tag styles are: <foo>, [foo].
  • Web page addresses and email addresses turn into links automatically.
  • Lines and paragraphs break automatically.

Ready to get started?