A Drupal version of chicken vs. egg: external API content fetching and cron failures
Recently we had an interesting issue with one of our multi-site Drupal hosting deployments: cron was not running regularly. Sifting through the logs showed that cron runs were taking extraordinarily long. After cron had been running for over an hour, Drupal would kill the 'cron_semaphore' variable, clear the cache and attempt to re-run cron. Cron would hang again, and the process would repeat itself into futile perpetuity.
What was really occurring was that the cron job was failing due to an excessive response time from one of the external fetches the site was executing (e.g. feed aggregator, Brightcove hosted content, etc.).
When a request to an external API takes too long and exceeds the limits set in your php.ini file (such as max_execution_time, max_input_time and memory_limit), Drupal loses its connection to the database.
At this point, Drupal begins to try to write error messages using watchdog(). However, in one of Drupal's more quirky side effects, it can't write the watchdog entries to the database about not being able to write to the database, because, well, it's not connected to the database. Useful!
This causes the cron run to fail midstream, and Drupal doesn't release the cron_semaphore variable because the process didn't complete correctly. Drupal therefore thinks that cron is still running, and any attempts to run cron tasks during the next hour are ignored. After an hour the stale semaphore is killed, but it usually gets orphaned again as soon as the next cron run starts and fails midstream in the same way.
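The cycle described above can be illustrated with a small, self-contained sketch. This is not Drupal's actual code: the function name sketch_cron_run(), the plain array standing in for the variable table, and the exception used to simulate a run dying midstream are all assumptions made for illustration. Only the 'cron_semaphore' name and the one-hour threshold come from the behaviour described here.

```php
<?php
// Simplified stand-in for Drupal's persistent variables (hypothetical).
$variables = array();

/**
 * Hypothetical sketch of a cron run guarded by a timestamp semaphore.
 * Returns a short status string so each outcome is easy to inspect.
 */
function sketch_cron_run(array &$variables, callable $tasks, $now) {
  $stale_after = 3600; // One hour, as described in the article.

  if (isset($variables['cron_semaphore'])) {
    if ($now - $variables['cron_semaphore'] > $stale_after) {
      // The semaphore is older than an hour: assume the run got stuck.
      unset($variables['cron_semaphore']);
      return 'stale semaphore cleared';
    }
    // A run appears to be in progress, so this attempt is ignored.
    return 'skipped: cron already running';
  }

  $variables['cron_semaphore'] = $now;
  try {
    $tasks();
  }
  catch (Exception $e) {
    // Simulates the run dying midstream: the semaphore is never released.
    return 'failed: semaphore orphaned';
  }
  unset($variables['cron_semaphore']);
  return 'completed';
}

// A task that always fails, standing in for a hung external fetch.
$hang = function () {
  throw new Exception('external fetch timed out');
};

echo sketch_cron_run($variables, $hang, 0), "\n";    // failed: semaphore orphaned
echo sketch_cron_run($variables, $hang, 600), "\n";  // skipped: cron already running
echo sketch_cron_run($variables, $hang, 3601), "\n"; // stale semaphore cleared
echo sketch_cron_run($variables, $hang, 3660), "\n"; // failed: semaphore orphaned
```

As long as the underlying task keeps failing, clearing the stale semaphore only resets the clock on the same failure, which is why the fix has to address the failed run itself.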
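We first tried using ini_set() to turn on mysqli.reconnect in each site's settings.php file. That didn't work, because mysqli.reconnect is a system-level directive that cannot be changed at runtime with ini_set().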
In our working solution, to allow Drupal to reconnect to its database, we turned on the MySQLi reconnect setting in our php.ini file:
mysqli.reconnect = on
This configuration may not be ideal or work for all setups, and there is further discussion over at Drupal, but this was the most effective fix for us.
Now cron runs smoothly, even if an external API call hangs for too long. Also, if you are fetching over a cURL connection, using curl_setopt_array() to set a timeout ceiling, among other values, can be quite helpful.
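As a rough illustration of the cURL suggestion, here is a minimal sketch of a fetch with hard ceilings on both connection time and total request time. The helper name fetch_with_timeout() and the default values of 5 and 15 seconds are assumptions for the example, not settings we are recommending for any particular site. The cURL options themselves are standard PHP constants.

```php
<?php
/**
 * Hypothetical helper: fetch a URL with ceilings on connect and total time.
 *
 * Returns the response body as a string, or FALSE on any cURL error,
 * including a timeout.
 */
function fetch_with_timeout($url, $connect_timeout = 5, $total_timeout = 15) {
  $ch = curl_init();
  curl_setopt_array($ch, array(
    CURLOPT_URL            => $url,
    CURLOPT_RETURNTRANSFER => TRUE,             // Return the body instead of printing it.
    CURLOPT_CONNECTTIMEOUT => $connect_timeout, // Max seconds to establish the connection.
    CURLOPT_TIMEOUT        => $total_timeout,   // Max seconds for the whole request.
    CURLOPT_FOLLOWLOCATION => TRUE,
    CURLOPT_MAXREDIRS      => 3,                // Avoid following redirect loops.
  ));

  $body = curl_exec($ch);
  if ($body === FALSE) {
    // Log and give up rather than letting the cron run hang on this fetch.
    error_log('External fetch failed: ' . curl_error($ch));
  }
  // The handle is released when $ch goes out of scope.
  return $body;
}
```

A caller inside a cron task can then check for FALSE and skip that feed for the current run, so a single slow API no longer stalls everything else in the queue.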