Advanced Cache Control (1.0.6)


This document describes the configuration format for CacheFly Advanced Cache Control.

Introduction

CacheFly Advanced Cache Control (ACC) is a service which improves the efficiency of delivering dynamic websites, where dynamic means that the website generates content on demand. The response can be any type of media in any format (including HTML, images, video, etc.).

This is primarily achieved by allowing configuration of Caching Policy for specific paths and file extensions. Through this config you are able to significantly increase your cache-hit-ratio and provide a better experience to end users.

Script Config

Advanced Cache Control (ACC) is a script which needs to be configured to your specific use case.

CacheFly is responsible for the script.

You are responsible for the config (with our help, if you need).

Once the script is enabled for your account, you will have the option of creating a configuration file. This config is then associated with your services. This gives you full control of your service.

You may upload the configuration file to the CDN via the API or the Portal.

If you do not see any reference to this script in your account and would like to use it please contact us.

Config Format

The configuration file may be provided to us in either YAML or JSON format. We support both as a convenience.

We are aware that some automations find it easier to work with JSON, while those who write the config by hand may find it easier to work with YAML. Please pay attention to the MIME type while uploading the configuration.

All the configuration examples here are shown in YAML format. YAML is intended to be more human friendly than JSON. It is reasonably easy to follow (without any experience), but sometimes writing it takes a little getting used to. The main advantages of YAML are the lack of curly braces, the use of indentation to define the structure, and the ability to add comments.

If you are not familiar with YAML, there are many great resources online to help you get started.

Additionally a YAML aware IDE or editor will help you with authoring and modifying the configuration file.

The editor within our Portal is based on the very popular Visual Studio Code editor. It will show various warnings and errors as you type. You should ensure that all of these are addressed before saving the configuration.

However, the editor in our Portal is intended to be useful and convenient rather than a full IDE (we're a CDN company and not an IDE company). As such, you may also wish to use other tools that our team has found useful.

If you require any support with authoring or modifying your configuration files, please contact our support team who will be very happy to assist.

Config Deployment

After uploading your configuration to us, our systems will deploy it to the CDN (almost) immediately.

When applying changes there may, or may not, be a propagation delay. This is to ensure that a high frequency of changes by a small number of users does not negatively impact the performance of the network as a whole. As such, although we endeavour to make this as fast as possible, and immediate in most cases, the only guarantee is that it will occur eventually.

Some limited validation checks are performed during upload. If they fail, your new configuration will not be applied and the existing configuration will continue to be used. However, there may be many small details that cannot be checked automatically, and it is easy to break a production system by introducing an error into the configuration.

We recommend that you:

  • Keep backups of your configuration files
  • Consider testing the configuration using a CacheFly service which is not serving your production traffic
  • Ensure that you are fully confident in the configuration before you upload it

It is possible for us to revert to a previous configuration if you have a problem. However this is a manual process and will take some time (several hours) to complete.

ACC Syntax

The Advanced Cache Control (ACC) configuration file contains two keys. Both need to be present for the configuration to be considered valid.


  1. The default key allows you to specify the default behaviour of the CDN.

    Under this key the only valid key to specify is:

    • caching

  2. The exceptions key allows you to specify exceptions to the default behaviour.

    Under this key you must specify a list (aka. sequence, or array).

    Each item in the list may contain the following keys:

    • path
    • extensions
    • caching

The caching key configures the Caching Policy that should be used. This is described in the Caching Policy section below.

The path and extensions keys are conditions which configure when that exception should be used. These are described in the Exception Matching section below.
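The structural rules above can be checked before uploading. The following is a minimal sketch, assuming the configuration has already been parsed into plain Python dictionaries and lists; the function name validate_acc_config and the wording of its messages are hypothetical and are not part of any CacheFly tooling. It only checks the shape described in this section, not the values inside caching.

```python
from typing import Any

# Keys permitted on each item of the "exceptions" list (from the section above).
ALLOWED_EXCEPTION_KEYS = {"path", "extensions", "caching"}


def validate_acc_config(config: Any) -> list[str]:
    """Return a list of structural problems; an empty list means none were found."""
    if not isinstance(config, dict):
        return ["top level must be a mapping"]
    problems: list[str] = []

    # Both top-level keys must be present.
    for key in ("default", "exceptions"):
        if key not in config:
            problems.append(f"missing top-level key: {key}")

    # Under "default" the only valid key is "caching".
    default = config.get("default")
    if isinstance(default, dict):
        for key in default:
            if key != "caching":
                problems.append(f"default: unexpected key {key!r}")
    elif "default" in config:
        problems.append("default must be a mapping")

    # "exceptions" must be a list whose items use only the permitted keys.
    exceptions = config.get("exceptions")
    if isinstance(exceptions, list):
        for index, item in enumerate(exceptions):
            if not isinstance(item, dict):
                problems.append(f"exceptions[{index}] must be a mapping")
                continue
            for key in item:
                if key not in ALLOWED_EXCEPTION_KEYS:
                    problems.append(f"exceptions[{index}]: unexpected key {key!r}")
    elif "exceptions" in config:
        problems.append("exceptions must be a list")

    return problems
```

A check like this catches a common mistake: placing mode or ttl directly on an exception instead of nesting it under caching.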

Basic Example

default:               # configure the CDN default behaviour.
  caching:
    mode: respect-origin-assume-cache

exceptions:            # configure exceptions to the default.

  - path: "/images/"    # first exception
    caching:
      mode: ignore-origin-and-cache

See below for a more complete example.

Caching Policy

The caching policy is configured by selecting a mode and then tweaking the other available parameters. Where a value is not specified a default value will be used instead.

When defining the default caching policy, the default value is hard coded into the CDN logic. These defaults are documented within this document.

When defining the caching policy for an exception, the default is taken from the default policy specified in your configuration.

The following keys may be specified:

  • mode
  • ttl
  • maxTtl
  • varyByQuery
  • ignoredQueryParameters
  • varyByCookie
  • ignoredCookies

Mode

There are four caching modes.

  • never-cache

    This is intended for dynamically generated content which is different for every request.

    This mode will never cache the content. All requests are always forwarded to the origin. Request coalescing is disabled.

    Any cache control header sent by the origin is ignored. The CDN does not modify or add cache control headers when in this mode.

    If the origin is unavailable, then the content is unavailable.

  • ignore-origin-and-cache

    This is intended for content which is the same for every request and almost never changes (aka static content).

    This mode will always cache the content. Request coalescing is enabled.

    The cache control header sent by the origin is ignored. A CDN generated cache control header is placed on every response from the CDN.

    Requests are served from the cache whenever possible. If the origin is unavailable, then already cached content is still served by the CDN.

    You are expected to send a purge request if the content ever changes.

    For generated static content (such as the output of running webpack), it is recommended that you use file names which contain a content hash.

  • respect-origin-assume-cache

    This is intended for an origin which is well behaved and can generally be trusted to send sensible cache control headers.

    This mode will honor the cache control header sent by the origin.

    Request coalescing is enabled.

    If the origin does not send a cache control header, the content will be cached using the default ttl. The cached content will be served without a cache control header.

    When the origin indicates that the content is not cacheable, the CDN switches to the never-cache mode for this exact cache key for a period of a few minutes.

    NB. The cache key is affected by options described below.

  • respect-origin-assume-nocache

    This is intended for an origin which is well behaved and can generally be trusted to send sensible cache control headers.

    This mode will honor the cache control header sent by the origin.

    Request coalescing is enabled.

    If the origin does not send a cache control header, no caching will be performed.

    When it is determined that the content cannot be cached, the CDN switches to the never-cache mode for this exact cache key for a period of a few minutes.

    NB. The cache key is affected by options described below.

Tabular Summary

  • A: never-cache
  • B: ignore-origin-and-cache
  • C: respect-origin-assume-cache
  • D: respect-origin-assume-nocache
| Functionality | A | B | C | D |
| --- | --- | --- | --- | --- |
| Allows caching | No | Yes | Yes | Yes |
| Origin cache control header | Ignored | Ignored | Respected | Respected |
| New cache control header | No | Yes | No | No |
| Request coalescing | Never | Always | Usually | Usually |
| Default behaviour ^ | No cache | Cache | Cache | No cache |

^ When the origin does not send a cache control header.
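The "Allows caching" and "Default behaviour" rows can be read as a small decision function. The sketch below is illustrative only: should_cache is a hypothetical name, and the set of directives treated as "not cacheable" is an assumption, since this document does not list which origin directives the CDN interprets that way. Real Cache-Control handling is considerably more involved.

```python
from typing import Optional

# Directives treated here as "do not cache". This set is an assumption.
NON_CACHEABLE = {"no-store", "no-cache", "private"}

RESPECT_MODES = ("respect-origin-assume-cache", "respect-origin-assume-nocache")


def should_cache(mode: str, origin_cache_control: Optional[str]) -> bool:
    """Decide whether a response may be cached under the given mode."""
    if mode == "never-cache":
        return False                      # column A: origin header ignored
    if mode == "ignore-origin-and-cache":
        return True                       # column B: origin header ignored
    if mode not in RESPECT_MODES:
        raise ValueError(f"unknown mode: {mode}")
    if origin_cache_control is None:
        # No header from the origin: the two respect modes differ only here.
        return mode == "respect-origin-assume-cache"
    # Reduce "max-age=60, private" to the directive names {"max-age", "private"}.
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in origin_cache_control.split(",")
    }
    return not (directives & NON_CACHEABLE)
```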

Mode Example

default:        # configure default behaviour for the CDN
  caching:
    mode: respect-origin-assume-cache

TTL

The Time To Live (TTL) specifies how long the content should be kept in the CDN cache before checking with the origin for an updated version.

Note that the TTLs configured here only apply to successful responses. When an error is cached (such as a 404) a different TTL is applied.

The TTL may be specified as a number of seconds. Alternatively, it can be written as a number followed by a unit of time, as in the examples below:

| ttl | duration | duration in seconds |
| --- | --- | --- |
| 1 | 1 second | 1 |
| 1s | 1 second | 1 |
| 1m | 1 minute | 60 |
| 1h | 1 hour | 3600 |
| 1d | 1 day | 86400 |
| 1w | 1 week | 604800 |
| 1y | 1 year | 31536000 |
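If you generate configurations programmatically, it can help to convert these values to seconds yourself. The following is a minimal sketch which implements only the forms shown in the examples above (a bare number, or a number followed by a single unit); parse_ttl is a hypothetical helper, and whether the CDN accepts other forms (such as combined units) is not stated here, so this sketch rejects them.

```python
import re

# Seconds per unit, matching the examples above (a year is 365 days).
UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

_TTL_PATTERN = re.compile(r"(\d+)([smhdwy]?)")


def parse_ttl(value) -> int:
    """Convert a ttl such as 3600, "3600" or "1h" into a number of seconds."""
    match = _TTL_PATTERN.fullmatch(str(value).strip())
    if match is None:
        raise ValueError(f"unrecognised ttl: {value!r}")
    number, unit = match.groups()
    return int(number) * UNIT_SECONDS[unit]
```

For example, parse_ttl("1w") returns 604800, matching the examples.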

Default TTL

The TTL specified here is used when content is being cached and either the origin does not specify a TTL (no cache control header), or the origin's cache control header is being ignored (see Mode).

The default ttl is specified with the key ttl.

The default which is used when this is not specified is configured in Service Options under reverseProxy.ttl. By default this is set to 31 days (2678400 seconds).

TTL Example

default:        # configure default behaviour for the CDN
  caching:      # configure caching policy
    ttl: 1w     # configure the default TTL for cached content

Max TTL

** THIS FEATURE IS STILL IN DEVELOPMENT AND MAY CHANGE BEHAVIOUR **

The max TTL feature places a limit on how long the CDN caches the content but without modifying the cache control headers which are sent to the browser.

This is intended for large content which should be cached by the browser for a long period of time, but is unlikely to be requested frequently. This is ideal when access to the content is likely to be in large bursts of requests separated by long periods of no requests (e.g. scheduled events).

When specified with a value greater than zero, the CDN will ensure that the content is not stored for longer than specified here. With a value of zero this feature is disabled.

The max ttl is specified with the key maxTtl.
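One way to read the rule above is that the time the CDN stores the content is capped by maxTtl, while the TTL sent to the browser is left alone. The sketch below expresses that reading; cdn_storage_seconds is a hypothetical name, both arguments are assumed to already be in seconds, and as this feature is still in development the real behaviour may differ.

```python
def cdn_storage_seconds(ttl_seconds: int, max_ttl_seconds: int = 0) -> int:
    """Return how long the CDN would keep the content under this reading."""
    if max_ttl_seconds > 0:
        # The cap only ever shortens the storage time.
        return min(ttl_seconds, max_ttl_seconds)
    # Zero means the feature is disabled.
    return ttl_seconds
```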

Max TTL Example

default:        # configure default behaviour for the CDN
  caching:
    maxTtl: 3h  # This is a bad idea. Don't do this.

Query Parameters

With these settings you are able to include query parameters in the cache key.

All of these values default to "off". Additionally this completely overrides the boolean value in the Service Options under reverseProxy.cacheByQueryParam.

Vary By Query

The key varyByQuery may be specified as either a boolean or a list.


varyByQuery = false

When set to false, all query parameters are completely stripped from the request and not included in the cache key. The origin receives all requests without query parameters. This is the default behaviour.


varyByQuery = true / is a list

When set to true, all query parameters are included in the cache key.

When set to a list, only the query parameters mentioned in the list are included in the cache key.

In both cases all of the query parameters are allowed to pass through to the origin (when the request is not served from cache).

The origin must only use query parameters which have been added to the cache key to vary the response. Failure to follow this rule will lead to incorrect responses being served by the CDN.

If an empty list is defined, or if the list contains a wildcard (*) then it behaves the same as if it was set to true.


Example

default:        # configure default behaviour for the CDN
  caching:
    varyByQuery:  # Let the origin see the query parameters
      - page      # Add the parameter "page" to the cache key

Ignored Query Parameters

When varyByQuery is not set to false, the key ignoredQueryParameters can be used to specifically ignore certain query parameters.

This is intended to be used in the scenario where you want to specify "everything except bob" (for example). This is ideal for keeping UTM parameters out of the cache key (which will increase your cache hit ratio).

The key ignoredQueryParameters must be specified as a list (aka. sequence or array). An empty list has no meaning, and is the default.

The query parameters listed here are still visible to the origin (when the request is not served from cache).
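To make the interaction between the two settings concrete, here is a rough sketch of how the query part of a cache key could be derived. It is not the CDN's implementation: cache_key is a hypothetical function, the key is represented as a simple string, and parameters are kept in request order because this document does not say whether the CDN normalises their order.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url, vary_by_query=False, ignored=()):
    """Build an illustrative cache key from a request URL."""
    parts = urlsplit(url)
    if vary_by_query is False:
        # Query parameters play no part in the cache key.
        return parts.path
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # True, an empty list, or a list containing "*" all mean "every parameter".
    if vary_by_query is not True and vary_by_query and "*" not in vary_by_query:
        pairs = [(name, value) for name, value in pairs if name in vary_by_query]
    # Ignored parameters are dropped from the key (the origin still sees them).
    pairs = [(name, value) for name, value in pairs if name not in ignored]
    if not pairs:
        return parts.path
    return parts.path + "?" + urlencode(pairs)
```

With vary_by_query=True and ignored=["utm_source"], the requests /list?page=2 and /list?page=2&utm_source=x produce the same key, which is what improves the cache hit ratio.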

Example

# Add all query parameters to the cache key, except for "bob".
default:
  caching:
    varyByQuery: true
    ignoredQueryParameters:
      - bob

UTM Example

default:
  caching:
    varyByQuery: true
    ignoredQueryParameters:
      - utm_source
      - utm_medium
      - utm_campaign
      - utm_term
      - utm_content

Cookies

The key varyByCookie may be specified as either a boolean or a list.


varyByCookie = false

When set to false, all cookies are completely stripped from the request and not included in the cache key. The origin receives all requests without cookies. This is the default behaviour.


varyByCookie = true / is a list

When set to true, all cookies are included in the cache key.

When set to a list, only the cookies mentioned in the list are included in the cache key.

In both cases all of the cookies are allowed to pass through to the origin (when the request is not served from cache).

The origin must only use cookies which have been added to the cache key to vary the response. Failure to follow this rule will lead to incorrect responses being served by the CDN.

If an empty list is defined, or if the list contains a wildcard (*) then it behaves the same as if it was set to true.


Example

default:        # configure default behaviour for the CDN
  caching:
    varyByCookie:  # Let the origin see the cookies
      - page      # Add the cookie "page" to the cache key

Ignored cookies

When varyByCookie is not set to false, the key ignoredCookies can be used to specifically ignore certain cookies.

This is intended to be used in the scenario where you want to specify "everything except bob" (for example). This is ideal for keeping cookies which do not affect the response (for example, cookies holding UTM tracking values) out of the cache key, which will increase your cache hit ratio.

The key ignoredCookies must be specified as a list (aka. sequence or array). An empty list has no meaning, and is the default.

The cookies listed here are still visible to the origin (when the request is not served from cache).
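The same idea can be sketched for cookies, starting from the raw Cookie request header. As before this is only an illustration: cookie_key_part is a hypothetical name, and sorting the cookies by name is an assumption made so that the result does not depend on the order in which the browser sends them.

```python
from http.cookies import SimpleCookie


def cookie_key_part(cookie_header, vary_by_cookie=False, ignored=()):
    """Return the (name, value) pairs that would contribute to the cache key."""
    if vary_by_cookie is False:
        return ()
    jar = SimpleCookie()
    jar.load(cookie_header)          # parse "a=1; b=2" into individual cookies
    names = sorted(jar.keys())
    # True, an empty list, or a list containing "*" all mean "every cookie".
    if vary_by_cookie is not True and vary_by_cookie and "*" not in vary_by_cookie:
        names = [name for name in names if name in vary_by_cookie]
    return tuple((name, jar[name].value) for name in names if name not in ignored)
```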

Example

# Add all cookies to the cache key, except for "bob".
default:
  caching:
    varyByCookie: true
    ignoredCookies:
      - bob

UTM Example

default:
  caching:
    varyByCookie: true
    ignoredCookies:
      - utm_source
      - utm_medium
      - utm_campaign
      - utm_term
      - utm_content

Exception Matching

Exceptions have two conditions. Both need to be true for the exception to match. When multiple exceptions match a given request, the most specific one applies.

You should avoid writing a configuration where two exceptions overlap (i.e. both would match a request and neither is more specific).

However if this does occur then one of the two will be selected at random. This will produce inconsistent results when exposed to a large volume of traffic.

Paths

The path key is used to specify the path which must match for the exception to be used.

Paths always start with a /; this is also the default value.

Longer paths are considered to be more specific.

By default, paths match on a prefix basis:

  • /images/ matches /images/something.jpg.

To match on an exact basis only, add a $ to the end of the path (this is called the end of path anchor):

  • /images/$ matches only the exact path of /images/, meaning that /images/something.jpg will not match.

The intention is to allow you to specify a different Caching Policy for the HTML index of the directory vs the contents within.

As noted above, there is the possibility of random selection when two exceptions match a request. To break the tie and prefer one over the other, you can use # characters on the end of the path to make that exception appear to be more specific than it really is.

  • /abc### is more specific than /abc but is otherwise functionally identical.

When combining the anchor ($) with padding (#), the padding must come last.

  • /abc$### is valid, but /abc###$ is not valid.

Note that using padding is tricky to get correct as it may cause other exceptions to be overlooked. We advise that you create a second service to allow you to test your configuration before applying it to the service which is receiving production traffic.
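The path rules in this section can be modelled in a few lines, which may help when reasoning about which exception wins. The sketch below makes two assumptions which are not confirmed by this document: that specificity is simply the length of the configured string (including any $ and # characters), and that the helper names parse_path_rule, matches and most_specific are purely illustrative. It also resolves ties by picking the first candidate, whereas the CDN selects one at random.

```python
def parse_path_rule(rule: str):
    """Split a configured path into (prefix, exact, specificity)."""
    stripped = rule.rstrip("#")            # padding always comes last
    exact = stripped.endswith("$")         # end of path anchor
    prefix = stripped[:-1] if exact else stripped
    return prefix, exact, len(rule)        # assumption: longer string = more specific


def matches(rule: str, request_path: str) -> bool:
    """Check a request path against one configured path."""
    prefix, exact, _ = parse_path_rule(rule)
    if exact:
        return request_path == prefix
    return request_path.startswith(prefix)


def most_specific(rules, request_path):
    """Return the matching rule with the highest specificity, or None."""
    candidates = [rule for rule in rules if matches(rule, request_path)]
    return max(candidates, key=lambda rule: parse_path_rule(rule)[2], default=None)
```

Under these assumptions, a request for /images/ against the rules /, /images/ and /images/$ selects /images/$, while /images/something.jpg selects /images/.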

Other characters which are not permitted within URLs may have special functions attached to them in the future. Ensure that you only specify valid characters in order to avoid unexpected behaviours. This includes (but is not limited to) the following characters; !, ", ', &, (, ), *, +, ,, ;, <, >, = and ?. If you need to include any of these within the path key in the exception please contact support who will be able to assist by configuring an override for your account.

File Extensions

The extensions key is used to match against the file extension if it is present in the request path.

This is a list (aka. sequence, or array) of file extensions, without the preceding dot.

The wildcard value * always matches. The wildcard is always considered to be the least specific match.

No value, or an empty list is treated the same as a list containing a single wildcard. Hence you may consider the wildcard to be the default value.

When the request does not contain a file extension it can only match the wildcard.
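The extension rules can be sketched in the same way. Here request_extension and extension_match are hypothetical helpers; taking the extension as the text after the last dot of the final path segment, and comparing it case-sensitively, are both assumptions, since this document does not describe how the CDN extracts or normalises the extension.

```python
import posixpath


def request_extension(path: str) -> str:
    """Return the file extension of a request path without the dot, or ""."""
    return posixpath.splitext(path)[1].lstrip(".")


def extension_match(extensions, path):
    """Return "exact", "wildcard" or None for one exception's extensions list."""
    extension = request_extension(path)
    if extension and extension in (extensions or []):
        return "exact"                 # a named extension is the more specific match
    if not extensions or "*" in extensions:
        return "wildcard"              # missing or empty list behaves like ["*"]
    return None
```

For example, a request for /audio/ has no extension, so it can only match an exception whose list is empty, missing, or contains the wildcard.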

File Extensions Example

exceptions:

# First exception = matches various audio files
- path: /
  extensions:
    - mp3
    - ogg
    - aac
    - wav

# Second exception = matches everything else
- path: /
  extensions:
    - "*"

NB. In YAML the * character is special and must be placed inside quotes when used literally; i.e. "*" is necessary. It is also worth noting that, because the sequence has only one item, the alternative single line syntax may be preferable (to those familiar with it); i.e. extensions: ["*"]. The functionality is identical.

Example Configuration

In this example you can see that the default behaviour of the CDN is set to respect-origin-assume-cache. Note that this is actually the default you get without Advanced Cache Control.

For all paths starting with /images/ the CDN has been instructed to switch to the mode ignore-origin-and-cache. This demonstrates how it is possible to ignore incorrect caching instructions being emitted from the origin.

For all paths starting with /invoices/ the CDN has been instructed to switch to the mode never-cache. Again this will ignore any incorrect caching instructions being emitted from the origin.

For all paths starting with /complex/ all of the various options have been given values. Explaining this exception has been left as an exercise for the reader.

YAML (with comments to explain things)

If you're new to YAML, learn the basics here.

---

# Everything on a line after a # is a comment. You can use
# comments to leave little notes and reminders. This is
# especially useful when you're working with other people.

default:               # Configure the CDN default behaviour
  caching:
    mode: respect-origin-assume-cache

exceptions:            # Configure exceptions to the default

  - path: /images/          # First exception
    caching:
      mode: ignore-origin-and-cache
      ttl: 2592000

  - path: /invoices/        # Second exception
    caching:
      mode: never-cache

  - path: /complex/         # Third exception
    extensions:
      - "*"
    caching:
      mode: ignore-origin-and-cache
      ttl: 2592000
      maxTtl: 0           # Zero means disabled
      varyByQuery: true   # Boolean or a list
      ignoredQueryParameters: # List of parameters to ignore
        - something
      varyByCookie: true  # Boolean or a list
      ignoredCookies:     # List of cookies to ignore
        - csrftoken

YAML (without comments)

Although comments are great, sometimes they make things look more complex than they really are. Here is exactly the same config, still in YAML, just with all the comments and unnecessary whitespace removed.

---
default:
  caching:
    mode: respect-origin-assume-cache
exceptions:
  - path: "/images/"
    caching:
      mode: ignore-origin-and-cache
      ttl: 2592000
  - path: "/invoices/"
    caching:
      mode: never-cache
  - path: "/complex/"
    extensions:
      - "*"
    caching:
      mode: ignore-origin-and-cache
      ttl: 2592000
      maxTtl: 0
      varyByQuery: true
      ignoredQueryParameters:
        - something
      varyByCookie: true
      ignoredCookies:
        - csrftoken

JSON (with added whitespace)

If you're new to JSON, learn the basics here.

NB. The unnecessary whitespace in this JSON example is here to aid readability. JSON is intended to be processed by machines. The machines don't need the unnecessary whitespace, and JSON is handled more efficiently without it (in several ways). If you're making an integration with our API please do not include unnecessary whitespace in your JSON.

{
  "default": {
    "caching": {
      "mode": "respect-origin-assume-cache"
    }
  },
  "exceptions": [
    {
      "path": "/images/",
      "caching": {
        "mode": "ignore-origin-and-cache",
        "ttl": 2592000
      }
    },
    {
      "path": "/invoices/",
      "caching": {
        "mode": "never-cache"
      }
    },
    {
      "path": "/complex/",
      "extensions": [
        "*"
      ],
      "caching": {
        "mode": "ignore-origin-and-cache",
        "ttl": 2592000,
        "maxTtl": 0,
        "varyByQuery": true,
        "ignoredQueryParameters": [
          "something"
        ],
        "varyByCookie": true,
        "ignoredCookies": [
          "csrftoken"
        ]
      }
    }
  ]
}
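If your integration builds or stores the configuration in a readable form, the unnecessary whitespace can be removed just before sending it. A minimal sketch using only the Python standard library follows; minify_json is a hypothetical helper name, and how the result is then uploaded is outside the scope of this example.

```python
import json


def minify_json(text: str) -> str:
    """Re-serialise a JSON document without any optional whitespace."""
    # separators=(",", ":") removes the spaces json.dumps adds by default.
    return json.dumps(json.loads(text), separators=(",", ":"))
```

Parsing and re-serialising also has the side effect of rejecting input which is not valid JSON, as json.loads raises an error in that case.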

Glossary

This documentation uses the following terms which we are aware may need clarification.

Origin

In the context of a Content Distribution Network (CDN), an "origin" refers to the original server or source from which the CDN retrieves the original web content, files, or data. This origin server typically hosts the original, master copies of web pages, images, videos, scripts, or any other content that needs to be distributed to users.

When a user requests content, the CDN's edge servers act as intermediaries. If the requested content is not already cached in the edge server, the CDN will fetch it from the origin server. This process is known as an "origin fetch."

Cache Control Header

The HTTP Cache-Control header is a fundamental mechanism used in HTTP (Hypertext Transfer Protocol) to control caching behavior, instructing how a response should be cached, stored, and used by both clients and intermediary caches (such as proxies and CDNs).

The primary purpose of the Cache-Control header is to improve web performance by managing how caches store and serve web content. It enables fine-grained control over caching, specifying directives that define caching policies and influence how long a response can be cached, whether it can be stored, and if it can be served from a cache without revalidating with the origin server.

Please see the MDN documentation for further details.

Cache Key

A cache key is a unique identifier or string used within a caching system to associate a specific piece of content with its corresponding cached version. It allows the caching system to quickly locate, retrieve, and serve cached content without having to reprocess the original request or access the origin server.

Each element of the request which is used to vary the response from the origin must be incorporated into the cache key. This way when the CDN computes the cache key for a request it can be confident that if it finds content stored under that key, it is the correct content.

By default the cache key is the request path.

You can use the options above to incorporate Query Parameters and Cookies into the cache key.

When your origin outputs a Vary header then an additional layer of indirection may be added dynamically creating what are essentially "sub cache keys". This usually works but is not ideal. For best performance each header listed in the Vary header from the origin should be incorporated into the base cache key.

Moreover, there are specific options for each service, like considering if a browser supports gzip encoding. It's important to tailor the cache key thoughtfully; for instance, you wouldn't want to respond with content in gzip format for a browser that doesn't support it. Including this in the cache key needs careful consideration, especially if the original source can't produce content in that format, as it could lead to inefficiencies in storage, cache usage, and reliance on the original source.

Request Coalescing

Request coalescing is a technique used to optimize the delivery of content by consolidating multiple requests for resources into a single request.

When there are multiple requests for the same resource at roughly the same time (within the same storage region), only a single request is sent to the origin to retrieve the resource. When the resource has been retrieved, all of the waiting requests are fulfilled simultaneously.

The above video provides a quick explanation of how request coalescing works behind the scenes, and when it should (or should not) be used.
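As a rough illustration of the coalescing behaviour described in this section, the sketch below lets the first caller for a key perform the fetch while concurrent callers for the same key wait for and share its result. It is a single-process model using threads, not a description of how the CDN implements this; the Coalescer class and its fetch callback are hypothetical.

```python
import threading
from concurrent.futures import Future


class Coalescer:
    """Share one in-flight fetch between concurrent requests for the same key."""

    def __init__(self, fetch):
        self._fetch = fetch                # callable: key -> response
        self._lock = threading.Lock()
        self._in_flight = {}               # key -> Future for the pending fetch

    def get(self, key):
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                # First request for this key: it will contact the origin.
                future = Future()
                self._in_flight[key] = future
        if leader:
            try:
                future.set_result(self._fetch(key))
            except Exception as error:
                # Waiting requests receive the same failure.
                future.set_exception(error)
            finally:
                with self._lock:
                    del self._in_flight[key]
        # Leader and waiting requests all receive the same outcome.
        return future.result()
```

Note that this sketch only coalesces requests which overlap in time. It does not cache anything, so a later request for the same key triggers a new fetch.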