Ask HN: How would you create a 99.9999% SLA static site?

  • The key point here would be how you are measuring the downtime and what that means. From a technical perspective it's easy, as there are many CDNs that can deliver that kind of reliability very inexpensively. So I view this less as a technical exercise and more as a contractual one.

    You'd need to determine very precisely how you measure the SLA, whether there are any exclusion periods (e.g. maintenance windows), and what the penalty for a violation is. With an SLA like that, the site simply being slow to load may constitute a violation.

  • Three different DNS providers: Route 53, Google Cloud DNS, Azure DNS. Take two nameservers from each service. Many registrars cap a domain at around six nameserver records.

    Then take each cloud provider's static hosting, i.e. S3, Azure Storage, and a Google Cloud Storage bucket, and host the content in all of them. DNS queries can be set up to round-robin between all the hosts. Probably a nightmare to maintain.
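
    A minimal sketch of the Route 53 half of that setup using boto3 (the hosted zone ID, domain, and endpoint hostnames below are placeholders); the Google Cloud DNS and Azure DNS nameservers would need the same record data published through their own APIs so that every nameserver returns consistent answers:

      # Sketch: spread traffic across three static-hosting endpoints with
      # Route 53 weighted records. Zone ID, domain, and endpoints are made up.
      import boto3

      route53 = boto3.client("route53")

      HOSTED_ZONE_ID = "Z0000000000000EXAMPLE"   # hypothetical hosted zone
      DOMAIN = "www.example.com."
      ENDPOINTS = {                              # hypothetical bucket endpoints
          "s3":    "mysite.s3-website-us-east-1.amazonaws.com",
          "azure": "mysite.z13.web.core.windows.net",
          "gcs":   "mysite.storage.googleapis.com",
      }

      changes = [{
          "Action": "UPSERT",
          "ResourceRecordSet": {
              "Name": DOMAIN,
              "Type": "CNAME",
              "TTL": 60,
              "SetIdentifier": name,   # one weighted record per host
              "Weight": 10,            # equal weights spread queries evenly
              "ResourceRecords": [{"Value": host}],
          },
      } for name, host in ENDPOINTS.items()]

      route53.change_resource_record_sets(
          HostedZoneId=HOSTED_ZONE_ID,
          ChangeBatch={"Comment": "spread across three static hosts",
                       "Changes": changes},
      )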

  • The static site itself is not the problem. We all know that. It is the infrastructure that you have to set up. The moment you use a DNS service, a web server, a CDN, etc., you are dependent on their SLAs. So I don't know if you really want to build your own DNS server and so on.

    I would just get maybe three nice dedicated servers from a provider I trust (either colocated or leased) and put nginx in front, with nginx also acting as the load balancer, spread across the three servers. I could also add something like Varnish as a cache. That way you reduce third-party dependencies and the risk they add (as opposed to relying on things like external DNS services, Cloudflare, a CDN, etc.). For 3,000 concurrent users on a static HTML site, it should be a breeze.

    But you are still dependent on the domain registrar's nameservers, and possibly on your own DNS server if you set one up.
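
    A small sketch of the kind of check you'd want next to that setup: poll each of the three origins directly (the hostnames below are placeholders) and confirm they are all up and serving identical content, so the load balancer never has a dead or divergent backend behind it:

      # Sketch: verify all three dedicated servers are up and serving the
      # same static content. Hostnames are placeholders; run from cron/CI.
      import hashlib
      import urllib.request

      ORIGINS = [  # hypothetical backends behind the nginx load balancer
          "http://server1.example.net/",
          "http://server2.example.net/",
          "http://server3.example.net/",
      ]

      def fetch_digest(url, timeout=2.0):
          """Return a short hash of the page, or 'DOWN' if it can't be fetched."""
          try:
              with urllib.request.urlopen(url, timeout=timeout) as resp:
                  return hashlib.sha256(resp.read()).hexdigest()[:12]
          except OSError:
              return "DOWN"

      digests = {url: fetch_digest(url) for url in ORIGINS}
      for url, digest in digests.items():
          print(f"{url}  {digest}")

      if "DOWN" in digests.values() or len(set(digests.values())) != 1:
          print("WARNING: a backend is down or out of sync")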

  • The first question here is of course how you define the uptime of the site and how you measure your 99.9999% SLA (a quick calculation of these downtime budgets follows at the end of this comment). Agreeing to a maximum downtime of 2 seconds per month is different from agreeing to 30 seconds per year, or 5 minutes over ten years. A simple DNS glitch, for example, might take a site offline for 2 seconds and violate the first agreement but not the others.

    Then how do you measure uptime? Up for whom? And where does the company's responsibility end? If someone cuts a subsea Internet cable and takes a country offline (it has happened before), does that violate the SLA? Depending on where the company's responsibility ends, the problem becomes much more or much less challenging.

    These points aside, I think if you want to build a very stable static site the most important ingredient is redundancy. I'd argue using a single CDN or cloud provider might give you good uptime but it will not ensure your downtime stays below 2 seconds per month, as a single outage of that provider will already be enough to break the SLA (and even CDNs like Cloudflare have outages).

    What you could try is the following: Get your own IP space e.g. from RIPE or APNIC. Make deals with several hosting companies to announce that address space. Use several /24 subnets, announce each one via several of those companies. Create multiple DNS A records for all these subnets that your site's domain resolves to. Set the TTL of these records as high as possible to make sure recursive resolvers have them cached (and make sure your DNS servers won't be down for longer than your TTL). Put copies of your site on servers at all the hosting companies from where you announce your IPs.

    I'm not a BGP expert, so I don't know how fast routes will adapt to a single data center or server going down. If you have multiple A records, most browsers should try them one by one until one works (but does that count if it takes longer than 2 seconds?), so in theory there's a good chance of your site staying up even if one of your data centers goes down.

    Then again, this assumes that you have everything on the server side perfectly under control. In my experience most system outages are caused by human error, so the chance that you will take down your own system by, say, misconfiguring it is higher than the chance that one of your data centers gets taken out by a nuke ;)

    We're in the process of building such an anycast network and it's absolutely doable. I'll let you know if we reach 99.9999 % uptime.
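
    For reference, the downtime budgets mentioned above are simple arithmetic; a quick sketch:

      # How much downtime a 99.9999% availability target actually allows.
      AVAILABILITY = 0.999999
      SECONDS_PER_YEAR = 365 * 24 * 3600  # ignoring leap years

      budget_year = (1 - AVAILABILITY) * SECONDS_PER_YEAR
      print(f"per year:   {budget_year:.1f} s")              # ~31.5 s
      print(f"per month:  {budget_year / 12:.1f} s")         # ~2.6 s
      print(f"per decade: {budget_year * 10 / 60:.1f} min")  # ~5.3 min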

  • In this scenario the weak link is DNS and routing (e.g. a BGP issue caused by a third party you don't control). With two or three CDNs you should be able to handle hosting the static assets easily.

    A BGP issue might bring down the website for a country or a city or a continent for 5 hours.

    That is not something you can control.

    The same goes for DNS: there is caching involved, a load balancer might get updated at some point, and a PHP app that has cached the DNS result might miss a hit or two before the cache is invalidated.

    It’s an interesting exercise.
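
    One way to see that caching effect for yourself: ask a few public resolvers for the same record and compare the answers and remaining TTLs. A sketch using the third-party dnspython package (the domain below is just an example):

      # Compare A-record answers and cached TTLs across public resolvers.
      # Requires dnspython (pip install dnspython).
      import dns.resolver

      DOMAIN = "example.com"  # placeholder domain
      PUBLIC_RESOLVERS = {
          "Google":     "8.8.8.8",
          "Cloudflare": "1.1.1.1",
          "Quad9":      "9.9.9.9",
      }

      for name, ip in PUBLIC_RESOLVERS.items():
          resolver = dns.resolver.Resolver(configure=False)
          resolver.nameservers = [ip]
          answer = resolver.resolve(DOMAIN, "A")
          addresses = sorted(rr.address for rr in answer)
          # answer.rrset.ttl is the TTL remaining in that resolver's cache
          print(f"{name:10s} ttl={answer.rrset.ttl:5d}  {addresses}")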

  • Perhaps something like IPFS or Dat? If the site gets more than a few visitors per day it should remain cached.

  • My best approach for this type of client is: Route 53 DNS + CloudFront + S3 static site hosting.

    Because you can route requests from DNS to CloudFront and then to the S3 static site, and get your content cached and distributed automatically across the globe.

    That's it: super cheap, with a nice AWS SLA, and you can set it all up with TLS too.
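
    A rough sketch of the publish step for that stack, assuming the bucket and distribution already exist (the bucket name, distribution ID, and site directory below are placeholders):

      # Sketch: push the static site to S3, then invalidate CloudFront so the
      # new version propagates. Bucket and distribution ID are made up.
      import mimetypes
      import time
      from pathlib import Path

      import boto3

      BUCKET = "my-static-site-bucket"      # hypothetical bucket
      DISTRIBUTION_ID = "E0000000000000"    # hypothetical distribution
      SITE_DIR = Path("public")             # local build output

      s3 = boto3.client("s3")
      cloudfront = boto3.client("cloudfront")

      for path in SITE_DIR.rglob("*"):
          if path.is_file():
              key = str(path.relative_to(SITE_DIR))
              content_type = (mimetypes.guess_type(path.name)[0]
                              or "application/octet-stream")
              s3.upload_file(str(path), BUCKET, key,
                             ExtraArgs={"ContentType": content_type})

      cloudfront.create_invalidation(
          DistributionId=DISTRIBUTION_ID,
          InvalidationBatch={
              "Paths": {"Quantity": 1, "Items": ["/*"]},
              "CallerReference": str(time.time()),  # must be unique per call
          },
      )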

  • The secret is as much static content as possible with as much redundancy as you can afford.

    I took a site that had NO nines (less than 88% uptime) up to 99.995% uptime in 3 months, and that was with dynamic content. Contact me - HN AT free DOT TV - I guarantee uptime

  • Self host it for sure. My internet rarely g̶o̶e̶s̶ ̶d̶o̶w̶n̶ ̶a̶n̶d̶ it's pretty fast, too. I w̶o̶u̶l̶d̶n̶'̶t̶ ̶t̶r̶u̶s̶t̶ ̶a̶ ̶t̶h̶i̶r̶d̶ ̶p̶a̶r̶t̶y̶ ̶t̶o̶

  • What’s the SLA penalty? The devil is in the details of the actual terms of the agreement.

  • IPFS for free decentralized hosting, or Arweave's permaweb network.

  • AWS Lambda or Cloudflare Workers. I'll do it for $10k.