URL Shorteners Must Die

URL shorteners (such as bit.ly and TinyURL) have been called the "herpes of the web". Beyond link rot, a public shortening service is per se an open redirect vulnerability. Their ubiquity makes them an easy vector for spammers, phishers, and cross-site request forgery (XSRF) attacks.

Joshua Schachter writes:

With a shortening service, you're adding something that acts like a third DNS resolver, except one that is assembled out of unvetted PHP and MySQL, without the benevolent oversight of luminaries like Dan Kaminsky and St. Postel.

Luckily, you don't have to contribute to this scourge.

Drupal 7 has adopted the shortlink microformat, which adds a <link> element to the page's <head> like so:

<link rel="shortlink" href="http://www.example.com/node/1" />

When we rebuilt our site in D7, we decided to ditch bit.ly in favor of these built-in shortlinks. However, I also felt the /node/ piece of the path was superfluous, and even strange-looking to visitors outside the Drupal community.

So, we decided to shorten them even further, removing both the "www" and /node/ from the URL. This required only a few minor changes:

.htaccess

Care was taken not to add the "www" prefix for these shortlinks, because doing so would result in multiple redirects (which still works, but is inefficient).

# Redirect all paths to the "www" prefix, except for /NNNN
RewriteCond %{HTTP_HOST} ^metaltoad\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/\d+$
RewriteCond %{REQUEST_URI} !^/index\.php
RewriteCond %{REQUEST_URI} !^/node/\d+
RewriteRule ^(.*)$ http://www.metaltoad.com/$1 [L,R=301]
 
# Allow short URLs of the form metaltoad.com/123
RewriteCond %{REQUEST_URI} ^/\d+$
RewriteRule ^(.*)$ index.php?q=node/$1 [L,QSA]
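
To sanity-check which requests each rule set catches, here is a rough Python model of the conditions above. It is only a sketch: it looks at a single pass over the rules, ignores the query string that QSA would carry along, and the function name route and its return labels are made up for illustration rather than anything Apache provides.

```python
import re


def route(host, uri):
    """Approximate the first-pass decision made by the two rule sets above.

    host is the Host header, uri is the request path with its leading slash.
    Returns a (label, target) tuple; the labels are illustrative only.
    """
    bare_host = re.fullmatch(r"metaltoad\.com", host, re.IGNORECASE) is not None
    is_short = re.fullmatch(r"/[0-9]+", uri) is not None
    is_index = re.match(r"/index\.php", uri) is not None
    is_node = re.match(r"/node/[0-9]+", uri) is not None

    # Rule set 1: bare domain, and none of the three exceptions apply.
    if bare_host and not (is_short or is_index or is_node):
        return ("redirect", "http://www.metaltoad.com" + uri)

    # Rule set 2: /NNN is rewritten internally to the node path.
    if is_short:
        return ("rewrite", "index.php?q=node/" + uri.lstrip("/"))

    # Everything else falls through to the rest of the .htaccess file.
    return ("pass", uri)


print(route("metaltoad.com", "/318"))
print(route("metaltoad.com", "/blog/some-post"))
print(route("metaltoad.com", "/node/318"))
```

In this model the short path is rewritten without ever touching the "www" redirect, an ordinary path on the bare domain gets the 301, and /node/318 is passed through untouched so that Drupal can issue a single redirect of its own.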

Redirect node/NNN to the alias

The next piece was to redirect to the actual path alias. At the time this site was built, neither the Global Redirect module nor its successor, Redirect, was deemed production-ready, so we rolled our own interim solution:

/**
 * Implements hook_init().
 */
function mymodule_init() {
  // Redirect node/NNN to the path alias, if available.
  // The globalredirect module isn't currently available for D7.
  if (preg_match('/^node\/\d+$/', request_path())) {
    $alias = drupal_get_path_alias();
    if ($alias != request_path()) {
      // Setting the redirect headers manually allows them to be
      // cached, which drupal_goto does not.
      drupal_add_http_header('Location', url($alias,
        array('absolute' => TRUE)));
      drupal_add_http_header('Status', '301 Moved Permanently');
      print '301 Moved Permanently';
      drupal_page_footer();
      exit();
    }
  }
}
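
The decision this hook makes can be sketched outside Drupal as well. The Python below is an illustration under stated assumptions, not the module's code: the aliases dictionary is a hypothetical stand-in for the drupal_get_path_alias() lookup, and redirect_target and its base_url parameter are names invented for the example.

```python
import re


def redirect_target(request_path, aliases, base_url="http://www.metaltoad.com"):
    """Return the absolute URL to 301 to, or None if no redirect is needed.

    aliases maps internal paths such as "node/318" to their path alias;
    it is a hypothetical stand-in for Drupal's alias lookup.
    """
    # Only bare node/NNN requests are candidates for a redirect.
    if re.fullmatch(r"node/[0-9]+", request_path) is None:
        return None

    # Like the Drupal lookup, fall back to the path itself when no alias exists.
    alias = aliases.get(request_path, request_path)
    if alias == request_path:
        return None

    return f"{base_url}/{alias}"


aliases = {"node/318": "blog/url-shorteners-must-die"}  # made-up alias
print(redirect_target("node/318", aliases))
print(redirect_target("node/999", aliases))
```

A node with an alias yields a single absolute URL on the "www" domain, while a node without one (or any path that is not node/NNN) yields None, so the page is served as usual.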

Altering the <head> in template.php

This code was added to template.php to alter the shortlink element in the <head>. Since the <head> is a renderable array in D7, it's easy!

/**
 * Implements hook_html_head_alter().
 * Generates shorter shortlinks of the form metaltoad.com/NNN.
 */
function metaltoad_html_head_alter(&$head_elements) {
  foreach ($head_elements as $key => $element) {
    if (isset($element['#attributes']['rel']) &&
      $element['#attributes']['rel'] == 'shortlink') {
      $href =& $head_elements[$key]['#attributes']['href'];
      if (preg_match('/^\/node\/\d+$/', $href)) {
        $href = str_replace('/node/', '', $href);
        $href = "http://metaltoad.com/$href";
      }
    }
  }
}
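
For readers who want to try the string handling without a Drupal install, the href rewrite boils down to something like the following Python sketch. The names SHORT_DOMAIN and shorten_href are invented for the example; only the /node/NNN pattern and the bare domain come from the hook above.

```python
import re

SHORT_DOMAIN = "http://metaltoad.com"  # same bare domain as in the hook above


def shorten_href(href):
    """Return the short form of a /node/NNN href, or the href unchanged."""
    match = re.fullmatch(r"/node/([0-9]+)", href)
    if match is None:
        # Anything that is not exactly /node/NNN is left alone.
        return href
    return f"{SHORT_DOMAIN}/{match.group(1)}"


print(shorten_href("/node/318"))
print(shorten_href("/node/318/edit"))
```

As in the PHP, the pattern is anchored at the leading slash, so an absolute href or a longer path such as /node/318/edit passes through unchanged.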

$base_url

Lastly, we made sure to explicitly set $base_url in settings.php. This ensures that when the Location header is set, it uses the correct domain, including the "www", again avoiding inefficient multiple redirects.

$base_url = 'http://www.metaltoad.com';

Now, we have our own share-friendly links that are only a few characters longer than bit.ly!
Before: http://bit.ly/c9Vk1R
After: http://metaltoad.com/318

Comments

dylan's picture

Certainly, Google has a more sustainable business model than failed services such as http://tr.im/, so it's likely goo.gl links will last longer.

I'm unconvinced they can solve the security risks, however. Certainly they will filter some obvious phishing and malware sites, but there's really no way to detect all possible attacks.

For example, goo.gl created redirects for the following XSRF attack links without complaint:

But the XSRF issues that you describe have little to do with shorteners. How often do you check the destination of a link by looking down at the status bar before you click it? A shortener does little to improve the success rate of an attack.

Easier still, an attacker can just get you to go to a page that has an img tag where the src is the XSRF URL.

dylan's picture

shorturl

shorturl looks excellent as well. As a matter of taste, I thought the /NNN numeric paths looked less strange than the encoded output of shorturl (which uses letters and digits). But shorturl definitely has some advantages, especially if you have a larger number of nodes.

shorturl also allows you to create redirects for any URL, even external sites - not just nodes. This isn't a feature we really needed but I can see the utility of it.

Nice work, D! Quick question though, what's the difference between this approach and simply setting pathauto to generate an alias with only the node ID?

dylan's picture

You get to have both!

With this method your pages have 2 URLs:
A "canonical" URL, which is the big SEO friendly pathauto version
A "shortlink", which redirects to the longer canonical URLs.

Also last I checked, the [nid] token didn't really work in pathauto because the alias is generated prior to saving the node, so the nid doesn't exist yet.

dylan's picture

definitely

Yes, this is really just a special case of what we have done with the "www" prefix. Just make sure the short domain redirects to your main site and everything should work fine.

joaquin's picture

Toe the line

<sigh>

I guess I need to walk the walk. I just like the immediate gratification and easy view of how many clicks came through bit.ly...

scott's picture

Can't you get the same thing from analytics?

Seems like you should be able to get the same information from our Google Analytics - that'll tell you what percentage of your traffic came through Twitter, Facebook, etc. Or are you looking for some other bit of information?

joaquin's picture

Why I Love Bit.ly

I love bit.ly because of the easy to read graphic display of how many people have clicked through on a particular link - which generally corresponds to what I send out via Twitter:

Screenshot of bit.ly after being logged in
In this way I can get a quick sense of what my followers care about, and what they don't. Can we reproduce that within Drupal?

Redirect module

I had coded the 'redirect to canonical URL feature' in redirect.module but at one point it didn't work, so I had commented it out. I went back today after reading this, tested it, and confirmed it's working again. I've also filed a patch for redirect.module to support the nid short-link redirect. http://drupal.org/node/933888

short url filter

Is there any way to have an input format filter that automatically shortens Drupal URLs, like the link filter that automatically detects links?

When using notifications and messaging with drupal, most often a long url link like http://mysite.com/drupal/sites/default/files gets broken when it arrives as an email message.

dylan's picture

why not?

I don't see why not; take a look at hook_filter. You can also protect long links in e-mail by using <angle brackets> around the link.