Handling long-running background tasks in Drupal 7

In my previous post, I discussed how to import a large dataset into Drupal via Drush's batch API. In this post, I'll cover how to create background tasks in Drupal 7 that take a long time to finish.

Why would you ever want that?

If you have a task that must run regularly but takes a long time to complete, the cron queue might be a good solution. For instance, if you have a lot of nodes that need to stay synchronized with a remote dataset, you might want to synchronize a large portion of them during each cron run while still letting your other cron tasks complete in a timely manner.

Drupal provides an easy interface for adding such long-running background tasks via hook_cron_queue_info() and hook_cron().

Implement hook_cron()

hook_cron() is run every time the Drupal cron job is run. However, it is not well suited to longer-running tasks, since cron invokes each module's implementation sequentially and one slow implementation delays all the others. To avoid holding up the other cron tasks, we'll create items in a DrupalQueue instead of doing the work directly.

  function mymodule_cron() {
    // ...
    // get dataset to work on
    // ...
 
    $queue = DrupalQueue::get("resync");
    foreach ($dataset as $data) {
      $queue->createItem($data);
    }
  }

Now on every cron run, we'll insert a batch of items into our queue to process. Next, let's tell Drupal what to do with each queue item.

  function mymodule_cron_queue_info() {
    $queues['resync'] = array(
      'worker callback' => 'mymodule_resync_item',
      // Optional: maximum time, in seconds, to spend on this queue per cron run.
      'time' => 180,
    );
    // Drupal only registers the queues that this hook returns.
    return $queues;
  }
 
  function mymodule_resync_item($data) {
    // ...
    // Code to resync data here
    // ...
  }
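
For context, the loop below is a simplified sketch of roughly what Drupal 7's cron run does with the information returned by hook_cron_queue_info(). It is not the core code verbatim, and the function name mymodule_process_resync_queue() is made up for illustration; you would not normally need to write this yourself.

```php
/**
 * Simplified sketch of how cron consumes the 'resync' queue.
 *
 * Hypothetical helper for illustration only; Drupal's cron run does
 * something similar for every queue declared in hook_cron_queue_info().
 */
function mymodule_process_resync_queue() {
  // Stop claiming new items once the 180 second budget is spent.
  $end = time() + 180;
  $queue = DrupalQueue::get('resync');

  while (time() < $end && ($item = $queue->claimItem())) {
    try {
      // The worker receives exactly what was passed to createItem().
      mymodule_resync_item($item->data);
      // Only items processed without an exception are removed.
      $queue->deleteItem($item);
    }
    catch (Exception $e) {
      // Log the failure; the item is not deleted, so it can be retried later.
      watchdog_exception('mymodule', $e);
    }
  }
}
```

Note that the worker is called once per queue item, so mymodule_resync_item() receives a single $data value rather than the whole dataset.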

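With just those three functions, you have created background tasks that won't hold up your other hook_cron() implementations! Drupal processes the queue at the end of the cron run, after every module's hook_cron() has finished, and stops claiming new items once the queue's 'time' budget is used up; any remaining items wait for the next run. It is worth noting that you don't have to create these queue items in hook_cron(); you could add them at some other time. You might create the queue items on node creation or deletion, for instance.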
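
As a minimal sketch of that last idea, the following implements hook_node_insert() to queue a newly created node for synchronization. It assumes the 'resync' queue from above, a worker that accepts a node ID as its $data, and a hypothetical content type named 'event'.

```php
/**
 * Implements hook_node_insert().
 */
function mymodule_node_insert($node) {
  // 'event' is a placeholder content type; adjust it to your site.
  if ($node->type == 'event') {
    // Queue only the node ID; the worker can load the node when it runs.
    DrupalQueue::get('resync')->createItem($node->nid);
  }
}
```

Queuing just the ID keeps the stored items small and means the worker always operates on the current version of the node.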


Have a look at the Ultimate Cron module (which uses Progress and Background Process). It allows quite fine-grained control of cron, which helps with site performance in general.


The real fun starts when you want those same processes to be run manually (so through the Batch API) and automatically (so through cron). This forum post explains how to do this:

http://drupal.org/node/988192#comment-3847954

Although it's heavily D6-oriented, it should be possible to immediately take this to D7 as well. The result is a technique with which you can make some pretty awesome stuff.


Even when using the queue, the execution time of cron.php is unchanged. Seems like the queue is executed during the request to cron.php... Which defeats the whole purpose.


Hello, I use the same structure as in this sample, but it's not working and I don't understand why. Here is the code:

function MODULENAME_cron() {
  $nodes = expired_nodes('type'); // a function that fetches the array of node IDs I want
  $queue = DrupalQueue::get('update_node');
  foreach ($nodes as $row) {
    $queue->createItem($row);
  }
  drupal_flush_all_caches();
}

function MODULENAME_cron_queue_info() {
  $queues['update_node'] = array(
    'worker callback' => 'MODULENAME_callback',
    'time' => 30, // time in seconds for each worker
  );
  return $queues;
}

function MODULENAME_callback($data) {
  foreach ($data as $row) {
    db_insert('field_data_field_SOMENAME')
      ->fields(array(
        'entity_type' => 'node',
        'bundle' => 'event',
        'entity_id' => $row,
        'revision_id' => $row,
        'language' => 'und',
        'delta' => 0,
        'field_other_tid' => 196,
      ))
      ->execute();
    db_insert('field_revision_field_SOMENAME')
      ->fields(array(
        'entity_type' => 'node',
        'bundle' => 'event',
        'entity_id' => $row,
        'revision_id' => $row,
        'language' => 'und',
        'delta' => 0,
        'field_other_tid' => 196,
      ))
      ->execute();
  }
}


I used the code and I believe that flushing the caches also clears the queue. Perhaps flush the caches before adding to the queue? I was running into the same problem but it went away after I removed that particular line.


Thanks for the great blog. I did find one small error in your code. mymodule_cron_queue_info must return $queues.

  function mymodule_cron_queue_info() {
    $queues['resync'] = array (
      'worker callback' => 'mymodule_resync_item',
      'time' => 180,    // Time, in seconds, to let this process run [Optional]
    );
    return $queues;
  }

About the Author

Chris Svajlenka, Senior Developer

Chris is a web developer with many years of experience developing web applications. PHP is his go-to language, with experience with others such as Ruby and Python. He has provided systems administration to large organizations and enjoys hunting down bugs in code.

Chris enjoys a love of documentaries with an emphasis on nature, the cosmos and sustainable business practices. He also has a love for plants, gardening and a fine IPA.
