Full-Site Delivery with Cloudfront

If you have a highly trafficked site or want the best site speed for global traffic, using a CDN such as Amazon's Cloudfront might be worth your time!

Personally, I wanted to investigate whether I could save money by moving to a cheaper hosting tier and letting Cloudfront handle most of my site's traffic.

After investigating, however, I discovered some serious caveats when attempting to do more than serve static assets from Cloudfront.

Cloudfront's History

Cloudfront is a CDN designed for global content delivery. However, recent improvements allow it to act not only as a CDN, but also as a cache similar(ish) to Varnish.

Originally, the only way to use Cloudfront was to upload your media to Amazon S3; Cloudfront then served those static assets from the S3 bucket. This meant that, within an application, developers needed to upload assets to S3 and then point URLs within their HTML (and perhaps within user-generated content) to a Cloudfront URL.

Eventually, people had the idea to serve their static HTML files from S3 via Cloudfront as well. This gave them an easy "hosting" option which loaded quickly all over the world. In this scenario, no origin server was required - everything was pulled from S3.

Note that the origin server is simply the server which has the original assets. Our web server example.com in this article is the origin server.

Now, due to fairly recent features being added to Cloudfront, we can use our own origin servers (other than S3) for assets! We can specify, for example, that all /js/*.js files from our website example.com are served from Amazon's CDN. This is known as origin pull.

This is similar to Varnish, where the headers returned from your origin server can determine how the files served are cached.

In this scenario, we're able to use the Cloudfront CDN URL for our assets within our HTML, but the assets still ultimately come from (and exist within) our origin server. There's no need to upload to an S3 bucket.

For instance, if you have a file example.com/css/styles.css, your Cloudfront URL abcxyz.cloudfront.net/css/styles.css will grab example.com/css/styles.css and then serve a cached version of that stylesheet to future visitors.

Within your own HTML, of course, you set the URL of styles.css to be your Cloudfront URL abcxyz.cloudfront.net/css/styles.css.
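
In your own markup, that might look something like the following (abcxyz.cloudfront.net is just the example distribution hostname used throughout this article):

<!-- The file still lives at example.com/css/styles.css on the origin server, -->
<!-- but the markup points at the Cloudfront distribution instead. -->
<link rel="stylesheet" href="http://abcxyz.cloudfront.net/css/styles.css">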

Whole-Site Delivery

Nowadays, Cloudfront allows you to deliver your whole site! Essentially, the Cloudfront CDN will suck in assets from your origin server as described above (origin pull).

While setting your entire site to be cached seems easier (you don't need to change the URL of assets in your HTML to a CDN URL!), there are a few very important considerations to make.

Cache Expiration in Cloudfront

Before you use Cloudfront for CDN delivery and caching, it's very important to know how HTTP caching works. While Cloudfront is technically a CDN, it also effectively acts as a cache: it follows the HTTP spec on expiration caching.

When it comes to cache settings, Cloudfront gives you two options:

  1. You can specify how files are cached within Cloudfront's own configuration
  2. You can have Cloudfront use the cache settings sent by your origin server

With the first option, you can specify, for example, that files matching /js/*.js are cached for 1 week and files matching /images/*.jpg for 1 hour.

Alternatively, you can choose to have your origin server's cache settings be used by Cloudfront. This means the expiration headers returned from your origin server determine how long the file is cached before being refreshed.

You can also do a mix of the two approaches to maintain granular control over your cache settings.

For this article, I'm going to use the origin server settings to control my cache, as this is both good practice for server setup and gives me more fine-grained control over cache settings and expiration times. (I can output whatever cache expiration headers I need from my web server!)

Server Cache Settings

The first thing you need to do is decide how you want to set the caches on your origin server.

Assets that don't change much, such as CSS, JavaScript & layout images, can have a long TTL (expiration time). This tells browsers (and Cloudfront) not to hit the origin server for a fresh copy of the resource too often.

Assets that may change often, such as the HTML of your site's home page, can have a shorter TTL. This means the cache won't hold onto a stale copy for too long, and your home page content will be refreshed more often.
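
As an illustration, the difference comes down to the max-age (in seconds) your server sends back; the values below are arbitrary examples, not recommendations:

# A long-lived static asset, such as /css/styles.css (one year)
Cache-Control: public, max-age=31536000

# A frequently-changing page, such as the home page HTML (one hour)
Cache-Control: public, max-age=3600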

Caching Static Files

When setting cache expiration times for static files on Apache or Nginx, I go directly to H5BP's server configuration repositories.

For Apache, you can set the expiration headers per file MIME type in your .htaccess or site configuration file.

For Nginx, you can include h5bp.conf, which in turn includes expires.conf. Similar to the Apache settings, the expires.conf file sets long expiration times for many file types.
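
If you don't want to pull in the full H5BP config, a simplified Nginx sketch of the same idea might look like this (the file patterns and durations here are just examples, not H5BP's actual values):

# Long expiration for assets that rarely change
location ~* \.(?:css|js)$ {
    expires 1y;
    add_header Cache-Control "public";
}

# Shorter expiration for images
location ~* \.(?:jpe?g|gif|png)$ {
    expires 1M;
    add_header Cache-Control "public";
}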

Dynamic Content

For dynamic content, your server code can output expiration headers. For instance, on any article within fideloper.com (made with Laravel), I set the cache expiration age (TTL) to one day:

// Within a Controller method:

$response = Response::make($this->layout, 200);
$response->setCache([
    'last_modified' => DateTime::createFromFormat('Y-m-d G:i:s', $article->updated_at),
    'max_age' => 86400, // One day
    'public' => true,   // Allow public caches to cache
]);

return $response;
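
With those settings, the origin's response carries headers roughly like these (the Last-Modified date below is just a placeholder for the article's updated_at timestamp), and Cloudfront honors them when deciding how long to cache the page:

Cache-Control: max-age=86400, public
Last-Modified: Tue, 15 Apr 2014 19:00:00 GMT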

Between the server configuration settings and your code outputting cache headers, you have full control over the cache settings Cloudfront uses.

Setting Up Cloudfront

Now that our origin server's cache settings are in order, let's see how we might set up Cloudfront to cache our site.

Let's assume we want to serve example.com and www.example.com via Cloudfront. We want Cloudfront to act as a site-wide cache, similar to how you might use Varnish.

Create a Distribution

The first thing to do is to create a distribution. You can select whether it's a "Download" or "Streaming" distribution. We want "Download", as we're not streaming video or similar media for the purposes of this article.

Some of the more important decisions you're asked to make when creating a distribution include:

  • Origin Domain Name - where Cloudfront can read your site. Note that it wants a domain and not an IP address. As you'll see later, this will be something like content.example.com.
  • Object Caching - You can choose to use your origin server's cache headers, or tell Cloudfront to cache files in this distribution for n seconds. As noted, I chose to use the origin's cache headers.
  • Forward Cookies - Whether to strip out or forward cookies when requesting from the origin server. You can select All, None, or Whitelist certain cookies.
  • Forward Query Strings - Whether to include query strings when contacting the origin server. This should be off for static assets, but you'll likely want it on for caching dynamically generated pages (search results & pagination are two likely candidates).
  • Alternate Domains - Set what URL you want to access Cloudfront from. For whole site caching, this is your root domain (and probably the www version of it). That will be example.com and www.example.com in this example.
  • Buckets for Logging - You can select an S3 bucket to dump access logs into.

DNS Settings

So, when we go to example.com or www.example.com, we want to be directed to Cloudfront rather than our origin server. This means that our domain needs to point to our Cloudfront URL, which will look something like abcxyz.cloudfront.net. Since we have a domain instead of an IP address to point to, we actually need to set up a CNAME instead of an A record for our root domain!

This brings up the first bit of trouble for full-site delivery.

Whether or not you can use a CNAME for your root domain varies from provider to provider. In fact, a CNAME at the root (apex) of a domain technically violates the DNS spec, since a CNAME can't coexist with the SOA and NS records required there. However, the service DNS Made Easy offers proprietary ANAME records for exactly this purpose (~$30/year).

Tight integration with AWS's services gives you some benefits in this regard. If you use Route 53, Amazon's DNS service, its alias records let you point your root domain at a Cloudfront distribution directly.

Finally, we need a URL for our origin server that Cloudfront can access. This is needed because Cloudfront still needs to use our origin server to retrieve the assets before caching them. I set up a CNAME record such as content.example.com to point to our origin server. This is what you add as the "Origin Domain Name" when creating a distribution.
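
Put together, the relevant DNS records might look something like this sketch (the hostnames and distribution URL are placeholders, and how you handle the root domain depends on your DNS provider, as discussed above):

; Visitors hit Cloudfront for the whole site
www.example.com.      IN  CNAME  abcxyz.cloudfront.net.

; Cloudfront pulls from the origin server here
; (this could also be an A record pointing at the origin's IP)
content.example.com.  IN  CNAME  origin-server.example.net.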

Moar Caveats!

There are some more large caveats when delivering your whole site from Cloudfront!

Cookies

Whether or not Cloudfront strips cookies when talking to your origin server is important. Requests with unique cookies are cached separately, so each user will effectively have their own cached version of a page due to the unique session ID in their cookie. With user-specific cookies, Cloudfront effectively becomes a private cache, duplicating your browser's caching efforts.

Consider having Cloudfront ignore server-set cookies so it can cache content effectively. This won't affect cookies added client-side by services such as Google Analytics or Disqus comments, but it will affect server-side logic if you rely on cookies/session IDs to distinguish guests from authenticated users.

HTTP Verbs

Cloudfront used to only support GET and HEAD requests. However, AWS recently announced support for POST, PUT, DELETE, OPTIONS, and PATCH!

You should read their documentation on Request and Response Behavior for Custom Origins to see if there are any other caveats for your use cases. For example, Cloudfront makes requests to your origin server using HTTP/1.0, but supports "most of the HTTP 1.1 specification".

Cost

Cloudfront can get really expensive if you get hammered by a large volume of requests, especially if they are made maliciously. You effectively have no control over blocking or mitigating (D)DoS attacks.

Is This Worth It?

Whether or not it's worth setting up full-site caching is a big question. Let's review what we have working against us:

  1. Domain issues - It's hard to use a CNAME for our root domain. You may need to use the "www" subdomain and some server logic to route all requests to the "www" version. Or worse, you may need to pay for a DNS service to allow you to use your root domain with Cloudfront.
  2. Dynamic Content - Dynamic content is hard to work around. Because of how Cloudfront handles cookies (an issue with any cache), you need to decide how users interact with your site. You may need user-based cookies on many pages, making it hard to actually cache full pages. Note that you still get the benefit of cached static assets in any case.

Recommendation

At this point, I'd recommend using Cloudfront to serve your static assets for most sites. Configure your site to pull from either the Cloudfront URL or a subdomain such as cdn.example.com. Let your server handle the dynamic requests.

Other CDNs

I recommend looking into a CDN service such as Cloudflare, which has a free tier. They ask you to point your domain name servers to their DNS service to accomplish full-site delivery. This is similar to using Route 53 on AWS.

Akamai and other high-end CDNs also offer similar services.