Exclude crawlers server-wide with X-Robots-Tag

For a staging site, it's important to exclude crawlers: you wouldn't want your content to get indexed at the wrong URL! The conventional wisdom is to use HTTP Basic authentication. That approach has some disadvantages, however, and I've found I prefer using the X-Robots-Tag HTTP header instead. Note that this assumes your only objective is to prevent indexing by well-behaved crawlers; if you do need to keep secrets, this method is obviously unsuitable.
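
In case it's unfamiliar, X-Robots-Tag is just a response header that carries the same directives as the robots <meta> tag. With the header set, a response looks something like this (the directives shown here match the Apache example below):

HTTP/1.1 200 OK
Content-Type: text/html
X-Robots-Tag: noindex, noarchive, nosnippet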

Disadvantages of HTTP Basic

  • It confuses users: you can't log out, and you can't even tell whether a request is authenticated.
  • It prevents testing third-party apps that request resources from your server (sure, you could whitelist them, but see the next point).
  • It's just plain annoying.

Advantages of X-Robots-Tag

  • You don't have to modify or redirect robots.txt (useful if your application controls robots.txt, and you want to retain the ability to test on staging).
  • Unlike <meta name="robots" ...>, it works for all file types, not just HTML.
  • It's easy to add the header in the global Apache httpd.conf (with mod_headers enabled), for example:
<Directory />
  # Globally disallow robots from the development server
  Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</Directory>
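
To confirm the header is actually being served, a quick check with curl does the trick (staging.example.com is a stand-in for your own staging hostname):

curl -sI http://staging.example.com/ | grep -i x-robots-tag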

Comments

Informative; however, I do not want to modify my httpd.conf file. I suppose it works the same in .htaccess as well?
