Using Amazon CloudFront with Drupal

We like to use our own site to experiment with different technologies. CDNs are nothing new, and Metal Toad has projects running on competing systems including Akamai and Level 3. Still, I think Amazon CloudFront is an interesting offering and I wanted to give it a spin. Here's my review of the service after setting it up with Drupal:

Pros:

  • Easy setup, low-cost, only pay for what you use
  • Support for CNAMEs and domain sharding (e.g. static[1234].metaltoad.com)
  • Supports byte-range requests via the Range and Accept-Ranges headers (important for media files)
  • Supports gzip compression (assuming your origin server does the compression)
  • Supports HTTPS
  • Honors cache-control headers
  • Custom HTTP response headers set by the origin are passed through on edge responses (including CORS headers)

Cons:

  • No global or directory "purge" command - each file must be invalidated individually
  • No custom SSL certificates, so you can't use CNAMEs with HTTPS
  • Fussy about SSL ciphers on the origin server (CloudFront wasn't compatible with the ciphers configured on our load balancer, so I had to use "http-only" instead of "match-viewer")
  • Only minimal logging, no reports or graphs

Domain sharding

Multiple CNAMEs add a little overhead in the form of extra DNS lookups, but they increase the number of parallel downloads (browsers impose a per-hostname connection limit). To minimize the upstream bandwidth needed, these domains should be cookie-free. Since I chose subdomains of the site itself, I needed to adjust some cookie settings in Drupal:

  • In settings.php, set $cookie_domain = 'www.metaltoad.com';
  • In the Google Analytics module, set the tracking to "One domain with multiple subdomains"
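As a rough sketch of the sharding idea: hashing the file path (rather than picking a hostname at random or round-robin) means a given file always maps to the same hostname, so browser and edge caches are not split across shards. The function name example_shard_host() and its parameters are hypothetical and only for illustration.

```php
<?php
/**
 * Hypothetical helper: picks one of $count shard hostnames for a file path.
 */
function example_shard_host($path, $count = 4, $base = 'metaltoad.com') {
  // crc32() is deterministic, so the same path always yields the same shard.
  // abs() covers 32-bit PHP builds, where crc32() can return a negative
  // integer and the remainder would otherwise be negative too.
  $shard = abs(crc32($path) % $count) + 1;
  return "static$shard.$base";
}

// The same path resolves to the same hostname on every call.
$first = example_shard_host('misc/drupal.js');
$second = example_shard_host('misc/drupal.js');
```

With the defaults, the result is always one of static1 through static4.metaltoad.com, matching the static[1234] pattern mentioned earlier.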

Thanks to some new alter hooks in Drupal 7, all you need to implement a static file CDN is hook_file_url_alter(). There's an actively maintained CDN module, but in the spirit of inquiry I decided to implement the hook directly.

Module file

/**
 * Implements hook_file_url_alter().
 */
function mymodule_file_url_alter(&$uri) {
  // Route static files to Amazon CloudFront.
  if ($_SERVER['HTTP_HOST'] == 'www.metaltoad.com') {
    if ($GLOBALS['is_https']) {
      // CloudFront doesn't support custom SSL certs, so we need to use Amazon's.
      $cdn = 'https://abcdef12345.cloudfront.net';
    }
    else {
      // Multiple hostnames to parallelize downloads.
      $shard = crc32($uri) % 4 + 1;
      $cdn = "http://static$shard.metaltoad.com";
    }
    $scheme = file_uri_scheme($uri);
    if ($scheme == 'public') {
      $wrapper = file_stream_wrapper_get_instance_by_scheme('public');
      $path = $wrapper->getDirectoryPath() . '/' . file_uri_target($uri);
      $uri = "$cdn/$path";
    }
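    // Relative paths (e.g. theme and module assets) go to the CDN as well;
    // protocol-relative URLs starting with // are left untouched.
    elseif (!$scheme && strpos($uri, '//') !== 0) {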
      $uri = "$cdn/$uri";
    }
  }
}
 
/**
 * Implements hook_boot().
 */
function mymodule_boot() {
  // Make sure Amazon CloudFront doesn't serve dynamic content.
  if (!empty($_SERVER['HTTP_X_AMZ_CF_ID']) && !strstr($_GET['q'], 'files/styles')) {
    header("HTTP/1.0 404 Not Found");
    print '404 Not Found';
    exit();
  }
}
 
/**
 * Implements hook_css_alter().
 */
function mymodule_css_alter(&$css) {
  // Mangle the paths slightly so that drupal_build_css_cache() will generate
  // different keys on HTTPS.  Necessary because CDN URL varies by protocol.
  if ($GLOBALS['is_https']) {
    foreach ($css as $key => $style) {
      if ($style['preprocess'] && $style['type'] == 'file') {
        $css[$key]['data'] = './' . $style['data'];
      }
    }
  }
}
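To see what the URL rewriting in the hook above does to different kinds of URIs without bootstrapping Drupal, here is a standalone approximation. It is a sketch only: example_cdn_url() is a hypothetical name, the scheme and target parsing imitates what file_uri_scheme() and file_uri_target() do, and the public files directory is assumed to be Drupal's default of sites/default/files.

```php
<?php
/**
 * Hypothetical standalone version of the rewriting logic, for illustration.
 */
function example_cdn_url($uri, $cdn, $public_dir = 'sites/default/files') {
  // Extract the stream wrapper scheme, if any (e.g. "public" or "http").
  $pos = strpos($uri, '://');
  $scheme = $pos ? substr($uri, 0, $pos) : FALSE;

  if ($scheme === 'public') {
    // public://foo.png lives under the public files directory.
    $target = trim(substr($uri, $pos + 3), '\/');
    return "$cdn/$public_dir/$target";
  }
  if (!$scheme && strpos($uri, '//') !== 0) {
    // Relative path such as misc/drupal.js.
    return "$cdn/$uri";
  }
  // Absolute and protocol-relative URLs are returned unchanged.
  return $uri;
}

$cdn = 'http://static1.metaltoad.com';
$a = example_cdn_url('public://logo.png', $cdn);
// $a is 'http://static1.metaltoad.com/sites/default/files/logo.png'
$b = example_cdn_url('misc/drupal.js', $cdn);
// $b is 'http://static1.metaltoad.com/misc/drupal.js'
$c = example_cdn_url('//example.com/lib.js', $cdn);
// $c is '//example.com/lib.js' (unchanged)
```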

.htaccess

# Set CORS header on static assets for CDN.
<FilesMatch "\.(ttf|otf|eot|woff|css|css\.gz|js|js\.gz)$">
  <IfModule mod_headers.c>
    Header set Access-Control-Allow-Origin "*"
  </IfModule>
</FilesMatch>

The HTTPS and sharding support adds a little complexity, but overall the integration is straightforward. I'd recommend CloudFront to anyone who wants an easy and cost-effective scalability win.


Hi Wim, Thanks for the feedback! We've left Drupal's default 2 week expiration in place. What do you think of opening a core issue to raise this value? In terms of perceived freshness 2 weeks is essentially infinite (no visitor or site operator would wait this long for new content), yet this value is shorter than what's commonly recommended (I've seen everything from 30 days to 10 years).

Also, do you have any suggestions on handling updates to image style presets? From what I've seen, Drupal automatically purges the directory when the form at admin/config/media/image-styles/edit/%style is submitted, but the URLs aren't versioned so there's no way to update downstream caches.

I agree the global purge is mostly unnecessary; I mention it mainly as a difference from other services.


If core's going to use Far Future expiration, it should also ensure URLs change whenever files change. I don't believe it's core's responsibility to do this — yet. Especially because it can present a scalability issue for certain sites (Drupal may hit the FS for many files for each generated page, depending on how the unique file identifier generation is configured, the number of files and whether you have page caching enabled).

Your point about image styles is a great one. It's one that's actually addressed through the CDN module's Far Future expiration functionality as well. A simple solution could be to include a hash of the image style's configuration in the URL — whenever the image style configuration changes, the URLs would also change.

In general: when you have any feedback on the CDN module, let me know in the issue queue — I'd be more than happy to work with you! :) I'm always trying to make it better.
Likewise, if you're working on WPO issues in Drupal core — let me know :)
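The image style suggestion above could look roughly like the following. This is only a sketch under stated assumptions: example_style_url() and the v query parameter are hypothetical and not part of Drupal core or the CDN module, the configuration array is made up for illustration, and the approach assumes the CDN treats the query string as part of its cache key.

```php
<?php
/**
 * Hypothetical helper: appends a token derived from an image style's
 * configuration, so the URL changes whenever the configuration changes.
 */
function example_style_url($url, array $style_config) {
  // Sort top-level keys so the same settings in a different order
  // produce the same token.
  ksort($style_config);
  $token = substr(hash('sha256', serialize($style_config)), 0, 8);
  // Preserve any query string that is already present.
  $separator = strpos($url, '?') === FALSE ? '?' : '&';
  return $url . $separator . 'v=' . $token;
}

// Made-up configuration for illustration.
$style = array(
  'name' => 'thumbnail',
  'effects' => array(
    array('name' => 'image_scale', 'data' => array('width' => 100, 'height' => 100)),
  ),
);
$versioned = example_style_url('/sites/default/files/styles/thumbnail/public/logo.png', $style);
```

Editing the style (say, changing the width) yields a different token, so downstream caches see a new URL and fetch the regenerated image.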


Interesting. In practice I've observed that Drupal often does take responsibility for changing the filename when new versions are uploaded (FILE_EXISTS_RENAME is the default for file_copy() and related functions, and its effects are seen when e.g. uploading a new theme logo or filefield). I would agree we're not very intentional about following this practice in core.

Anyway, I'm aware of a few WPO issues that might interest you:

BTW Metal Toad is using the CDN module on fearnet.com; many thanks for your contributions!

About the Author

Dylan Tack, Director of Technology

Dylan is a software engineer with more than a decade of experience working with a wide variety of clients including the Linux Foundation, PBS, Habitat for Humanity, TV.com and the Emmys. His background includes training as an electrical engineer, but he became passionate about open source through his work with a university genetics lab.

Dylan is a proud member of the Drupal community, a member of the Drupal security team, and has extensive experience with Perl and Java. His other interests include computer security, embedded design, climbing, and brewing.

His latest talk at the Pacific Northwest Summit was titled: "Drupal Security for People Who Don't Care".