
Faster Database Backups via Drush! Plus Capistrano Integration
When working with Drupal sites, Drush is your go-to tool. This post focuses on the drush sql-dump command, which exports your database to a SQL file so you can restore it later. This is particularly useful when you are working in a development environment and need to deploy a site to production for the first time, or when you start work on a new client's existing site and need to export their live database and download it to your local environment.
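If you just need a quick snapshot, the basic round trip looks something like this (db_backup.sql is just a placeholder filename):

$ drush sql-dump > db_backup.sql
$ drush sql-cli < db_backup.sql   # restore it into another environment later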
For the most part this is a fairly quick task (10-15 minutes). However, on some rather large sites the databases can be big enough to slow it down considerably. One thing drush does to limit the amount of unnecessary data being exported and downloaded is to ignore certain tables and export only their structure; for example, the cache, sessions and watchdog tables are excluded by default. The problem is that many modules add their own cache tables, and those can add up to huge differences in the size of your backups.
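To make "structure only" concrete, here is roughly the same effect done by hand with plain mysqldump for a single cache table, using mydb as a placeholder database name (drush manages the full excluded-table list for you):

$ mysqldump --ignore-table=mydb.cache mydb > dump.sql   # schema + data for everything else
$ mysqldump --no-data mydb cache >> dump.sql            # schema only for the cache table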
drushrc.php to the Rescue
Thankfully drush provides a way to extend this. The example.drushrc.php shipped with drush explains the different places you can add a drushrc.php file to extend the list of tables to be ignored, such as your user's ~/.drush/ folder or each project's sites/all/drush folder. We decided to make a drushrc file for each project, because it can then be used on every environment you have set up for that project. There is also currently an issue where listing tables in your drushrc file that do not exist in a given database causes problems, so adding the list to your global ~/.drush folder may break some of your projects.
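For the per-project approach, the file just needs to live inside the codebase at the location mentioned above; something like:

$ mkdir -p sites/all/drush
$ $EDITOR sites/all/drush/drushrc.php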
Here is an example file that we use as a starting point.
<?php
// Add tables to structure-only list
// Set default if it doesn't exist. Copied from example.drushrc.php
if (!isset($options['structure-tables']['common'])) {
  $options['structure-tables']['common'] = array('cache', 'cache_filter', 'cache_menu', 'cache_page', 'history', 'sessions', 'watchdog');
}

/**
 * These are the defaults for the majority of projects at Metal Toad. Add or
 * remove tables as needed.
 */
$options['structure-tables']['common'] = array_merge($options['structure-tables']['common'], array(
  'cache_admin_menu',
  'cache_block',
  'cache_field',
  'cache_form',
  'cache_path',
  'cache_token',
  'cache_update',
  'cache_views',
  'cache_views_data',
  'ctools_css_cache',
  'ctools_object_cache',
));

/**
 * If you use Drupal core search, you can uncomment this section to exclude
 * the search index tables.
 */
/*
// Add option to download search tables if you really want
$options['structure-tables']['search'] = $options['structure-tables']['common'];
// By default don't download the search index tables though
$options['structure-tables']['common'] = array_merge($options['structure-tables']['common'], array(
  'search_dataset',
  'search_index',
  'search_node_links',
  'search_total',
));
*/
There are some cache tables that belong to very specific modules that you may or may not use, but which we use on at least 90% of our Drupal projects: Admin Menu, Pathauto, Token, Views and Ctools. Also, if you use the core Search module, you can save time by excluding the search index tables by uncommenting the bottom section. I also added an option to include them, in case you need them for some search debugging, instead of running cron locally a bunch of times to re-index the site. To download the search tables as well, just run:
drush sql-dump --structure-tables-key=search > db_backup.sql
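If you want to double-check that the exclusions are working, a quick grep against the dump from the command above should find no INSERT statements for the cache tables:

$ grep -c 'INSERT INTO `cache' db_backup.sql   # prints 0 when the cache tables are structure-only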
Integrating the Different Export Options with Capistrano
Now that you have sql-dump configured, and if you followed our Capistrano + Drupal posts, you'll want to add support for these new options in your db:down/db:pull tasks. First I added set :tables, "common" to config/deploy.rb, and then in the Capfile I updated the db:down task to look like this:
task :down, :roles => :db, :only => { :primary => true } do
  run_locally "mkdir -p db"
  domains.each do |domain|
    if (tables == "common")
      filename = "#{domain}_#{stage}.sql"
    else
      filename = "#{domain}_#{stage}_#{tables}.sql"
    end
    temp = "/tmp/#{release_name}_#{application}_#{filename}"
    run "touch #{temp} && chmod 600 #{temp}"
    run "#{drush} --uri=#{domain} sql-dump --structure-tables-key=#{tables} > #{temp}"
    download("#{temp}", "db/#{filename}", :via => :scp)
    run "rm #{temp}"
  end
end
Now when you want to download the production database with the search index data as well, you just run:
cap prod db:down -s tables=search
What's the Difference?
So what did we gain by doing this? Here are some rough stats.
Download Times
$ time cap prod db:down
Before
real 23m13.736s
user 7m17.669s
sys 1m26.844s
After
real 7m39.958s
user 2m31.454s
sys 0m30.187s
Filesize
Before
3.3G
After
1.3G
Import Time
$ time drush sql-cli < ../db/default_prod.sql
Before
real 334m15.027s
user 1m13.609s
sys 0m5.235s
After
real 16m20.778s
user 0m29.509s
sys 0m2.176s
Comments
There's very little reason not to use memcache these days, which would mean that all the cache tables (apart from the form cache, which needs persistent storage) would be empty.
For large/high-performance sites, it's worth using the syslog module instead of dblog (or the contrib GELF module, to log directly to Graylog), so the watchdog table would also be empty.
Two quick and easy ways to speed up all db backups :-)
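For the dblog half of that suggestion, the module swap is a one-liner from the command line (the memcache side needs extra configuration in settings.php, so it is not shown here):

$ drush en syslog -y && drush dis dblog -y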
Fri, 10/12/2012 - 17:19
If you want to be able to use wildcards (to be able to e.g. simply say cache_* instead of listing every single cache table, which may also be different for every single site), please help drive this Drush patch home: http://drupal.org/node/698264.
Fri, 10/12/2012 - 17:24
Wow, good idea, wildcards would be great! Another thing Jonathan and I were talking about the other day: it would be really useful if non-existing tables in the structure-tables setting didn't cause sql-dump to fail.
Fri, 10/12/2012 - 18:21
Exactly! sql-dump fails for silly reasons. It shouldn't :) Let's fix that? I pushed it to about 85%, could you maybe push that patch over the finish line? :)
Tue, 10/16/2012 - 04:50
Do I still need to add the --structure-tables-key param if I used drush cca before using drush sql-dump?
Tue, 10/16/2012 - 22:12
That's a good point. As far as I can tell from the Drush source, the --structure-tables-key param does not have a default value, so you will still need to pass it. In my examples the Capistrano task always passes it.
Fri, 05/31/2013 - 14:24
Hi,
I would like to know if there is a way to limit the rows using sql-dump, like in mysqldump, where mysqldump --where="1 LIMIT 50000" would limit the rows to 50000.
We have a few very large tables, like node and taxonomy, which will not fit on the dev server, and it is really old data we would like to avoid.
Thank you.
Fri, 05/31/2013 - 17:10
We have the same issue on a few projects, and have not yet hit on a good solution. It's especially complicated since often the entities we need to prune are spread across many database tables, so even a LIMIT statement wouldn't produce a consistent data set.
Related, I use this query to identify the largest tables; sometimes the result is surprising (such as finding 200MB in userpoints_txn)!
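A rough version of that kind of size report, run through drush sql-query so it uses the site's own database credentials, would be:

$ drush sql-query "SELECT table_name, ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb FROM information_schema.tables WHERE table_schema = DATABASE() ORDER BY size_mb DESC LIMIT 20"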
Wed, 10/10/2012 - 18:05