Using the Drupal Batch API

Recently I was working on a site for a library that had a lot of data that needed to be imported into Drupal as nodes. Each book title, e-book, DVD, etc. needed to be a node in their Drupal 7 website. On top of that, the database that held this data would add new records and occasionally update or remove existing ones. This meant about 300,000 to 400,000 nodes that had to be created and kept synchronized with their internal database. In this post I'll outline how I used Drush and the Batch API to import the dataset into Drupal from a terminal.

Bringing the data in

My first challenge was to import the dataset into Drupal. With this much data to work with, I had to use the Batch API. The Batch API lets you run one or more functions over a large set of data without worrying about PHP timeouts, and it can report progress as the operation runs. I had created a module to handle importing and updating the library data. To create the batch queue, you build an array and pass it to batch_set():

function mymodule_setup_batch($start=1, $stop=100000) {
  //  ...
  //  Populate $lots_of_data from record $start to record $stop.
  //  ...
 
  //Break up all of our data so each process does not time out.
  $chunks = array_chunk($lots_of_data, 20);
  $operations = array();
  $count_chunks = count($chunks);
 
  //For every chunk, queue the functions to run on that chunk of data.
  $i = 0;
  foreach ($chunks as $chunk) {
    $i++;
    //Arguments are passed positionally, so the argument array is left unkeyed.
    $operations[] = array('mymodule_method_to_work_on_a_small_part', array($chunk, t('(Importing chunk @chunk of @count)', array('@chunk' => $i, '@count' => $count_chunks))));
    $operations[] = array('mymodule_another_method', array($chunk));
  }
 
  //put all that information into our batch array
  $batch = array(
    'operations' => $operations,
    'title' => t('Import batch'),
    'init_message' => t('Initializing'),
    'error_message' => t('An error occurred'),
    'finished' => 'mymodule_finished_method'
  );
 
  //Get the batch process all ready!
  batch_set($batch);
  $batch =& batch_get();
 
  //Because we are doing this on the back-end, we set progressive to false.
  $batch['progressive'] = FALSE;
 
  //Start processing the batch operations.
  drush_backend_batch_process();
}
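
To make the chunking step concrete, here is a minimal standalone sketch that runs outside Drupal. The 95 fake record IDs are an assumption for illustration, and sprintf() stands in for t(); only the shape of the $operations array mirrors mymodule_setup_batch().

```php
<?php
// Stand-in data: 95 fake record IDs (hypothetical; real data would come from the library database).
$lots_of_data = range(1, 95);

// Split into chunks of at most 20 records, as in mymodule_setup_batch().
$chunks = array_chunk($lots_of_data, 20);
$count_chunks = count($chunks);

$operations = array();
foreach ($chunks as $index => $chunk) {
  // sprintf() is a stand-in for t() so the sketch runs without Drupal.
  $label = sprintf('(Importing chunk %d of %d)', $index + 1, $count_chunks);
  // Each operation is array(callback name, array of positional arguments).
  $operations[] = array('mymodule_method_to_work_on_a_small_part', array($chunk, $label));
  $operations[] = array('mymodule_another_method', array($chunk));
}

printf("%d chunks, %d operations, last chunk has %d records\n",
  $count_chunks, count($operations), count(end($chunks)));
// prints "5 chunks, 10 operations, last chunk has 15 records"
```

Because two operations are queued per chunk, the queue is always twice as long as the number of chunks, and the final chunk simply holds whatever records are left over.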

You'll also have to write the operation functions themselves. Each one is called with the arguments we set up earlier, plus a $context array passed by reference as the last argument. In this case both functions work on the same chunk, one right after the other.

function mymodule_method_to_work_on_a_small_part($chunk, $operation_details, &$context) {
  //Do something to $chunk, maybe create a node?
  $context['message'] = $operation_details; //Will show what chunk we're on.
}
function mymodule_another_method($chunk, &$context) {
  //Do some more work.
  $context['message'] = t('We have done a second thing to a chunk of data');
}

We also need to write the function that is called when the batch finishes:

function mymodule_finished_method($success, $results, $operations) {
  //Let the user know how the batch ended.
  print $success ? t('Finished importing!') : t('The import finished with errors.');
}
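
The way these callbacks fit together can be sketched without Drupal. In the example below, demo_run_batch() is a deliberately simplified stand-in for the batch runner and not Drupal's actual implementation; all of the demo_* names are hypothetical. It only illustrates that each operation receives $context by reference as its last argument, that $context['results'] carries over from one operation to the next, and that the finished callback receives those results.

```php
<?php
// Hypothetical operation callback: tallies imported records in $context['results'].
function demo_import_chunk(array $chunk, &$context) {
  if (!isset($context['results']['imported'])) {
    $context['results']['imported'] = 0;
  }
  $context['results']['imported'] += count($chunk);
  $context['message'] = 'Imported ' . $context['results']['imported'] . ' records so far';
}

// Hypothetical finished callback, same signature as mymodule_finished_method().
function demo_finished($success, $results, $operations) {
  return $success
    ? 'Finished importing ' . $results['imported'] . ' records.'
    : 'Import failed with ' . count($operations) . ' operations left.';
}

// Simplified stand-in for the batch runner (an assumption, NOT Drupal's code).
function demo_run_batch(array $operations, $finished) {
  $context = array('results' => array(), 'message' => '');
  foreach ($operations as $operation) {
    list($callback, $args) = $operation;
    // Append $context by reference as the final argument.
    $args[] = &$context;
    call_user_func_array($callback, $args);
  }
  // No operations are left unprocessed, so pass an empty array.
  return call_user_func($finished, TRUE, $context['results'], array());
}

// Queue one operation per chunk of 20 from 45 fake records.
$operations = array();
foreach (array_chunk(range(1, 45), 20) as $chunk) {
  $operations[] = array('demo_import_chunk', array($chunk));
}
print demo_run_batch($operations, 'demo_finished') . "\n";
// prints "Finished importing 45 records."
```

Keeping running totals in $context['results'] rather than in a global means the finished callback can report exactly what was processed.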

Drushing data

I have always enjoyed using Drush, but I had never created my own Drush commands. It turns out to be a very easy process. I decided to make an import command so I could kick off the batch process and import a section of the dataset from the terminal. I placed the above code in a file named mymodule.drush.inc and added the following functions:

function mymodule_drush_command() {
  $items  = array();
  $items['myimport'] = array(
    'callback'    => 'mymodule_setup_batch',
    'description' => dt('Import records from the internal database.'),
    'arguments'   => array(
      'start'     => dt('First record to import.'),
      'stop'      => dt('Last record to import.'),
    ),
  );
  return $items;
}
 
function mymodule_drush_help($section) {
  switch ($section) {
    case 'drush:myimport':
      return dt("Import items from the internal database. Usage: drush myimport [start record] [end record].");
  }
}

It was that simple to create a new Drush command! Now I could open a terminal, type drush myimport 100 2000, and watch Drupal import a bunch of records. The Batch API comes in very handy when dealing with large amounts of data that may take an unpredictable amount of time to process, such as a massive import or upgrade.
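
Arguments typed after a command such as drush myimport 100 2000 arrive in the callback as strings, so it can be worth checking them before building a batch. The helper below is a hypothetical sketch of one way to do that; mymodule_normalize_range() is not part of Drush or Drupal, and the defaults simply mirror those of mymodule_setup_batch().

```php
<?php
// Hypothetical helper (not a Drush or Drupal function): validates a record range.
// Returns array($start, $stop) as integers, or FALSE if the range is unusable.
function mymodule_normalize_range($start = 1, $stop = 100000) {
  // Accept only whole, non-negative numbers, whether given as strings or integers.
  if (!ctype_digit((string) $start) || !ctype_digit((string) $stop)) {
    return FALSE;
  }
  $start = (int) $start;
  $stop = (int) $stop;
  // Records are counted from 1, and the range must not be reversed.
  if ($start < 1 || $stop < $start) {
    return FALSE;
  }
  return array($start, $stop);
}

var_export(mymodule_normalize_range('100', '2000')); // a valid range
echo "\n";
var_export(mymodule_normalize_range('2000', '100')); // reversed range
echo "\n";
var_export(mymodule_normalize_range('abc', '10'));   // non-numeric input
echo "\n";
```

A command callback could call a helper like this first and stop with an error message when it returns FALSE, rather than queuing an empty or nonsensical batch.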

Good luck with your batch processes and happy coding!
