URL Shorteners Must Die

URL shorteners (such as bit.ly and tinyurl) have been called the "herpes of the web". Beyond just link rot, a public shortening service is per se an open redirect vulnerability. Their ubiquity makes them an easy vector for spammers, phishers, and cross-site request forgery attacks.

Joshua Schachter writes:

With a shortening service, you're adding something that acts like a third DNS resolver, except one that is assembled out of unvetted PHP and MySQL, without the benevolent oversight of luminaries like Dan Kaminsky and St. Postel.

Luckily, you don't have to contribute to this scourge.

Drupal 7 has adopted the shortlink microformat, which adds an element to the page's <head> like so:

<link rel="shortlink" href="http://www.example.com/node/1" />

When we rebuilt our site in D7, we decided to ditch bit.ly in favor of these built-in shortlinks. However, I also felt the /node/ piece of the path was superfluous, and even strange-looking to visitors outside the Drupal community.

So, we decided to shorten them even further, removing both the "www" and /node/ from the URL. This required only a few minor changes:

.htaccess

Care was taken not to add the "www" prefix for these shortlinks, because doing so would result in multiple redirects (which still works, but is inefficient).

# Redirect all paths to the "www" prefix, except for /NNNN
RewriteCond %{HTTP_HOST} ^metaltoad\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/\d+$
RewriteCond %{REQUEST_URI} !^/index.php
RewriteCond %{REQUEST_URI} !^/node/\d+
RewriteRule ^(.*)$ http://www.metaltoad.com/$1 [L,R=301]
 
# Allow short URLs of the form metaltoad.com/123
RewriteCond %{REQUEST_URI} ^/\d+$
RewriteRule ^(.*)$ index.php?q=node/$1 [L,QSA]
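
The interplay of the two rules can be hard to see at a glance. As a rough sketch only, the following plain-PHP function mimics how they would classify a request. The function name classify_request and its return format are made up for illustration; Apache does this work itself, and nothing like this runs on the site.

```php
<?php
// Hypothetical helper (PHP 7.1+): mirrors the two rewrite rules for illustration.
// Returns [kind, target], where kind is 'external' (301 to the "www" host),
// 'internal' (rewrite to index.php), or 'none' (request left untouched).
function classify_request(string $host, string $uri): array {
  $bare_host = preg_match('/^metaltoad\.com$/i', $host) === 1; // [NC] flag
  $is_short  = preg_match('#^/\d+$#', $uri) === 1;
  $is_index  = preg_match('#^/index.php#', $uri) === 1;
  $is_node   = preg_match('#^/node/\d+#', $uri) === 1;

  // Rule 1: bare domain, and none of the three exceptions apply.
  if ($bare_host && !$is_short && !$is_index && !$is_node) {
    return ['external', 'http://www.metaltoad.com' . $uri];
  }
  // Rule 2: purely numeric path, on any host.
  if ($is_short) {
    return ['internal', 'index.php?q=node/' . substr($uri, 1)];
  }
  return ['none', $uri];
}
```

The /index.php exception in the first rule appears to matter here: once a short URL has been rewritten to index.php, that exception should keep it from being bounced to the "www" host on the next pass through the rules.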

Redirect node/NNN to the alias

The next piece was to redirect to the actual path alias. At the time this site was built, neither the Global Redirect module nor its successor, Redirect, was deemed production-ready, so we rolled our own interim solution:

/**
 * Implements hook_init().
 */
function mymodule_init() {
  // Redirect node/NNN to the path alias, if available.
  // The globalredirect module isn't currently available for D7.
  if (preg_match('/^node\/\d+$/', request_path())) {
    $alias = drupal_get_path_alias();
    if ($alias != request_path()) {
      // Setting the redirect headers manually allows them to be
      // cached, which drupal_goto does not.
      drupal_add_http_header('Location', url($alias,
        array('absolute' => TRUE)));
      drupal_add_http_header('Status', '301 Moved Permanently');
      print '301 Moved Permanently';
      drupal_page_footer();
      exit();
    }
  }
}
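
Stripped of the Drupal calls, the decision in this hook is small. Here is a minimal, framework-free sketch of it, assuming (as drupal_get_path_alias() does) that a path with no alias comes back unchanged. The function name alias_redirect_target and the example alias in the comments are hypothetical.

```php
<?php
// Hypothetical helper (PHP 7.1+): returns the absolute URL to redirect to,
// or null when no redirect is needed.
function alias_redirect_target(string $requested, string $alias, string $base_url): ?string {
  // Only plain node/NNN paths are candidates for a redirect.
  if (preg_match('#^node/\d+$#', $requested) !== 1) {
    return null;
  }
  // No alias exists: the lookup returned the path we already have.
  if ($alias === $requested) {
    return null;
  }
  // Build the absolute Location value from the configured base URL.
  return rtrim($base_url, '/') . '/' . $alias;
}

// Example call with a made-up alias:
// alias_redirect_target('node/318', 'blog/some-post', 'http://www.metaltoad.com');
```

Keeping the check in a pure function like this would also make it easy to unit test without bootstrapping Drupal.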

Altering the <head> in template.php

This code was added to template.php to alter the contents of <head>. Since the head is a renderable array in D7, it's easy!

/**
 * Implements hook_html_head_alter().
 * Generates shorter shortlinks of the form metaltoad.com/NNN.
 */
function metaltoad_html_head_alter(&$head_elements) {
  foreach ($head_elements as $key => $element) {
    if (isset($element['#attributes']['rel']) &&
      $element['#attributes']['rel'] == 'shortlink') {
      $href =& $head_elements[$key]['#attributes']['href'];
      if (preg_match('/^\/node\/\d+$/', $href)) {
        $href = str_replace('/node/', '', $href);
        $href = "http://metaltoad.com/$href";
      }
    }
  }
}
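
The string handling at the heart of that hook can be tried on its own. The sketch below is a standalone approximation, not the theme code itself: the function name shorten_shortlink and the $short_base parameter are invented for the example. It uses a capture group in place of str_replace(), which should give the same result for hrefs of the form /node/NNN.

```php
<?php
// Hypothetical helper (PHP 7+): turns a relative /node/NNN href into a short
// absolute URL and leaves every other href untouched.
function shorten_shortlink(string $href, string $short_base = 'http://metaltoad.com'): string {
  if (preg_match('#^/node/(\d+)$#', $href, $matches) === 1) {
    return $short_base . '/' . $matches[1];
  }
  return $href;
}
```

Note that, like the pattern in the hook, this only matches a root-relative href. An absolute href, or one prefixed with a base path because Drupal lives in a subdirectory, would pass through unchanged.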

$base_url

Lastly, we made sure to explicitly set $base_url in settings.php. This ensures that when the Location header is set, it uses the correct domain including the "www", again avoiding inefficient multiple redirects.

$base_url = 'http://www.metaltoad.com';

Now, we have our own share-friendly links that are only a few characters longer than bit.ly!
Before: http://bit.ly/c9Vk1R
After: http://metaltoad.com/318

Certainly, Google has a more sustainable business model than failed services such as http://tr.im/, so it's likely goo.gl links will last longer.

I'm unconvinced they can solve the security risks, however. They will certainly filter some obvious phishing and malware sites, but there's really no way to detect all possible attacks.

For example, goo.gl created redirects for the following XSRF attack links without complaint:

But the XSRF issues that you describe have little to do with shorteners. How often do you check the destination of a link by looking down at the status bar before you click it? A shortener does little to improve the success rate of an attack.

Easier still, an attacker can just get you to go to a page that has an img tag where the src is the XSRF URL.


shorturl looks excellent as well. As a matter of taste, I thought the /NNN numeric paths looked less strange than the encoded output of shorturl (which uses letters and digits). But shorturl definitely has some advantages, especially if you have a larger number of nodes.

shorturl also allows you to create redirects for any URL, even external sites - not just nodes. This isn't a feature we really needed but I can see the utility of it.

With this method your pages have 2 URLs:
A "canonical" URL, which is the big SEO friendly pathauto version
A "shortlink", which redirects to the longer canonical URL.

Also, last I checked, the [nid] token didn't really work in pathauto because the alias is generated prior to saving the node, so the nid doesn't exist yet.


Yes, this is really just a special case of what we have done with the "www" prefix. Just make sure the short domain redirects to your main site and everything should work fine.


<sigh>

I guess I need to walk the walk. I just like the immediate gratification and easy view of how many clicks came through bit.ly...


I had coded the 'redirect to canonical URL feature' in redirect.module but at one point it didn't work, so I had commented it out. I went back today after reading this, tested it, and confirmed it's working again. I've also filed a patch for redirect.module to support the nid short-link redirect. http://drupal.org/node/933888


Is there any way to have an input format filter that automatically shortens Drupal URLs, like the link filter that automatically detects links?

When using notifications and messaging with Drupal, a long URL like http://mysite.com/drupal/sites/default/files often gets broken when it arrives in an email message.

About the Author

Dylan Tack, Director of Technology

Dylan is a software engineer with more than a decade of experience working with a wide variety of clients including the Linux Foundation, PBS, Habitat for Humanity, TV.com and the Emmys. His background includes training as an electrical engineer, but he became passionate about open source through his work with a university genetics lab.

Dylan is a proud member of the Drupal community, a member of the Drupal security team, and has extensive experience with Perl and Java. His other interests include computer security, embedded design, climbing, and brewing.

His latest talk at the Pacific Northwest Summit was titled: "Drupal Security for People Who Don't Care".
