Serving static files with Django and AWS - going fast on a budget

  • Web servers such as lighttpd and Apache are blazingly fast at serving static pages. Reading a file from disk and sending its contents back to the browser is as simple as it gets.

    Except that it's not. One of the reasons the classic web servers are so amazingly fast with static files is that their developers, along with the developers of the operating systems they run on, have spent significant time optimizing this process. On Linux, for example, look at the sendfile(2) system call or the TCP_CORK socket option. These were expressly designed to let a userspace program get a file through the kernel and onto the wire with as few CPU cycles and memory copies as possible.
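
    To make that concrete, here is a minimal Python sketch of the idea, not what lighttpd or Apache actually do internally. The function name send_file_zero_copy and the header argument are made up for illustration. It relies on socket.sendfile(), which uses os.sendfile (and so sendfile(2)) where the platform supports it and falls back to ordinary send() calls otherwise. TCP_CORK is only set when the socket is a TCP socket and the constant exists, which in practice means Linux.

```python
import socket


def send_file_zero_copy(sock: socket.socket, path: str, header: bytes = b"") -> int:
    """Send an optional header, then the file at `path`; return file bytes sent.

    Hypothetical helper for illustration only. `sock` must be a blocking
    stream socket.
    """
    is_tcp = (
        sock.family in (socket.AF_INET, socket.AF_INET6)
        and sock.type == socket.SOCK_STREAM
    )
    # TCP_CORK is Linux-specific, so only use it when the constant exists.
    use_cork = is_tcp and hasattr(socket, "TCP_CORK")

    if use_cork:
        # Hold back partial frames so the header and the first chunk of the
        # file can leave in the same packet.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    try:
        if header:
            sock.sendall(header)
        with open(path, "rb") as f:
            # Uses os.sendfile where available, so the file data does not
            # have to be copied through a userspace buffer.
            return sock.sendfile(f)
    finally:
        if use_cork:
            # Removing the cork flushes whatever is still queued.
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
```

    In a real server you would call this with the accepted client socket once you have written out the response headers; error handling, range requests and non-blocking I/O are all left out here.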

    One of the most frustrating things about the Web 2.0 crowd (from the perspective of curmudgeons like me) is that they really don't have a clue about any of this complexity. They just figure that they'll stuff everything into a Django/Rails/whatever request and scale up later. Then, when they run into trouble, they end up treating tools like Apache as black boxes and building Rube Goldberg contraptions around them when they really should be looking at the problem more directly.

    Really, folks: those low level APIs are your friends. They're not nearly as scary as they look. Even if you end up with an off-the-shelf solution, knowledge of this stuff can only be good for you.

  • 0. Make sure your HTTP responses return the correct value in the Vary header. Otherwise, everything else will fall apart.

    1. Ensure that all your HTTP responses have proper ETags, and make sure you process the If-Match and If-None-Match request headers appropriately to avoid doing unnecessary work.

    2. Put a simple caching reverse proxy (e.g. Squid, Varnish, mod_disk_cache) in front of your application. Then, tune the cache-control directives to allow the cache to return cached responses without hitting the back end. For this to work for HTTPS responses, you need to put an SSL-terminating (HTTPS-to-HTTP) proxy in front of the caching proxy.

    3. Add a system like the one described in the article, that immediately purges entries from the cache when they are updated in the back-end.

    4. Purchase a caching, SSL-enabled, load-balancer appliance (e.g. Big-IP) that is built to do all of the above nearly automatically.

    Most people never need to go beyond step #2.
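
    As a rough illustration of steps 0 through 2, here is a framework-independent Python sketch of the header logic. The function names (make_etag, etag_matches, respond), the hash-based ETag scheme, the default Vary value and the max-age of 300 seconds are all assumptions chosen for the example, not something the article prescribes. If-Match handling is left out.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from a hash of the response body (assumed scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def _strip_weak(tag: str) -> str:
    """Drop a leading W/ so weak and strong forms of a tag compare equal."""
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match: str, etag: str) -> bool:
    """Return True if the If-None-Match header value covers `etag`.

    Splitting on commas is safe here only because these ETags are hex digests.
    """
    if if_none_match.strip() == "*":
        return True
    candidates = {_strip_weak(c.strip()) for c in if_none_match.split(",")}
    return _strip_weak(etag) in candidates


def respond(request_headers: dict, body: bytes,
            max_age: int = 300, vary: str = "Accept-Encoding"):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        # Step 0: list every request header that changes the response.
        "Vary": vary,
        # Step 2: let a shared cache answer without hitting the back end.
        "Cache-Control": f"public, max-age={max_age}",
    }
    inm = request_headers.get("If-None-Match")
    if inm is not None and etag_matches(inm, etag):
        # Step 1: the client already has this version, so skip the body.
        return 304, headers, b""
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```

    Note that this sketch still has to produce the body before it can hash it, so it saves bandwidth rather than work; to skip the work too, derive the ETag from something cheap such as a version number or modification time. In a Django project you would more likely reach for the built-in etag and condition decorators in django.views.decorators.http than roll this by hand.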

  • I'm doing Django+AWS development at the moment (using http://eucalyptus.cs.ucsb.edu/ for a private cloud, with AWS as an extension, since AWS is not cost-effective in many ways), and I find this a good article on improving performance in a basic way. Many people seem to forget that not all content needs to be dynamic, and that some basic modifications can seriously improve performance.

    It is great to see articles that include code snippets and architecture diagrams. People should use direct examples more often instead of trying to describe things in words.

    Lastly, I'd suggest using nginx (http://wiki.codemongers.com/Main), as I found it to be a serious improvement on lighttpd for many of my projects.