Prevent CDN from caching webpages to avoid duplicate content

Integrating a CDN to serve your images and static files is a great way to take load off your webserver and improve overall performance of your website. Unfortunately, it can also cause SEO issues such as duplicate content. Your webpages can be cached by the CDN and then possibly get included in Google's search index.

One way to avoid this is to use the .htaccess file so that the CDN hostname serves only PHP scripts (needed for dynamic image styles) and static files such as images or text files. The following example is for Drupal 7 websites using clean URLs.

  1. Create an alias for your website such as c.yourdomain.com that points to your website's primary hostname using DNS. Set this alias as the origin alias (the URL where it will retrieve files) in your CDN's configuration.
  2. Add the following lines to your .htaccess file somewhere after the "RewriteEngine on" line:
    # Rewrite rules for the CDN so that it doesn't cache any HTML pages. Only
    # allow static files and the .php extension (for imagecache support).
    # The dots in the hostname are escaped so they match literally.
    RewriteCond %{HTTP_HOST} ^c\.yourdomain\.com$
    RewriteCond %{REQUEST_URI} !\.(?i:php|txt|css|pdf|zip|gz|jpe?g|gif|png|bmp|ico|js)$
    RewriteRule .* http://yourdomain.com%{REQUEST_URI} [L,R=301]

If a request to the CDN hostname is neither a direct .php request (for imagecache) nor one of the static file types listed, it is redirected with a 301 to the same path on your website's primary URL, which prevents the CDN from caching the page.
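
To sanity-check which paths the rules would redirect before deploying them, the decision can be mirrored in a few lines of Python. This is only a rough sketch of the logic, not a replacement for testing against the real server: the function name `redirect_target` is hypothetical, the hostnames are the placeholders from the example above, and it assumes the Host header carries no port.

```python
import re

# Placeholder hostnames from the .htaccess example; replace with your own.
CDN_HOST = "c.yourdomain.com"
PRIMARY_URL = "http://yourdomain.com"

# Same extension whitelist as the second RewriteCond, case-insensitive.
ALLOWED_EXTENSIONS = re.compile(
    r"\.(php|txt|css|pdf|zip|gz|jpe?g|gif|png|bmp|ico|js)$",
    re.IGNORECASE,
)


def redirect_target(host, path):
    """Return the 301 target the rules would produce, or None if the
    request would be served normally.

    `path` is the URL path without the query string, which is what
    %{REQUEST_URI} holds in mod_rewrite.
    """
    if host != CDN_HOST:
        return None  # first RewriteCond fails: not the CDN alias
    if ALLOWED_EXTENSIONS.search(path):
        return None  # second RewriteCond fails: whitelisted file type
    return PRIMARY_URL + path


if __name__ == "__main__":
    for sample in ("/node/1", "/sites/default/files/logo.png", "/index.php"):
        print(sample, "->", redirect_target(CDN_HOST, sample))
```

A clean URL such as /node/1 comes back with a redirect target, while image, stylesheet, and .php paths return None, meaning the CDN would receive the file itself.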