One of the early proposed solutions for this was the SRV DNS record, which was similar to the MX record, but for every service, not just e-mail. With MX and SRV records, you can specify a list of servers with associated priorities for clients to try. SRV also had an extra “weight” parameter to facilitate load balancing. However, the SRV designers did not want the political fight of effectively hijacking every standard protocol by forcing all clients of every protocol to also check SRV records, so they specified that SRV should only be used by a client if the standard for that protocol explicitly specifies the use of SRV records. This technically prohibited HTTP clients from using SRV. And when the HTTP/2 (and later) HTTP standards were being written, bogus arguments from Google (and others) kept the new HTTP protocols from specifying SRV. SRV seems to be effectively dead for new development, used only by some older standards.
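For anyone who hasn't used them: each SRV record carries a target, port, priority, and weight, and Go's stdlib will even do the RFC 2782 sorting for you. A quick sketch (example.org and the xmpp-client service are just placeholders; XMPP is one of the protocols whose standard explicitly adopted SRV):

```go
package main

import (
	"fmt"
	"log"
	"net"
)

func main() {
	// Looks up _xmpp-client._tcp.example.org.
	cname, srvs, err := net.LookupSRV("xmpp-client", "tcp", "example.org")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("canonical name:", cname)
	// LookupSRV returns records sorted by priority and randomized
	// by weight within each priority, per RFC 2782.
	for _, s := range srvs {
		fmt.Printf("target=%s port=%d priority=%d weight=%d\n",
			s.Target, s.Port, s.Priority, s.Weight)
	}
}
```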
The new solution for load balancing seems to be the new HTTPS and SVCB DNS records. As I understand it, they were standardized by people wanting to add extra parameters to the DNS in order to jump-start the TLS 1.3 handshake, thereby making fewer roundtrips. (The SVCB record type is the same as HTTPS, but generalized like SRV.) The HTTPS and SVCB record types both have the priority parameter from the SRV and MX record types, but they lack the weight parameter from SRV. The standards have been published, and support seems to have landed in some browsers, but not all have enabled it. We will see what browsers actually do in the near future.
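If you want to poke at HTTPS/SVCB records yourself, Go's stdlib can't query them yet, but the third-party miekg/dns library can. A rough sketch (example.com and the resolver address are placeholders):

```go
package main

import (
	"fmt"
	"log"

	"github.com/miekg/dns" // third-party; stdlib has no HTTPS/SVCB lookup
)

func main() {
	m := new(dns.Msg)
	// Type 65 is HTTPS; swap in dns.TypeSVCB for the generalized form.
	m.SetQuestion(dns.Fqdn("example.com"), dns.TypeHTTPS)

	r, err := dns.Exchange(m, "1.1.1.1:53")
	if err != nil {
		log.Fatal(err)
	}
	for _, rr := range r.Answer {
		if h, ok := rr.(*dns.HTTPS); ok {
			// Prints priority, target, and SvcParams such as
			// alpn="h3,h2" and ipv4hint/ipv6hint.
			fmt.Println(h.String())
		}
	}
}
```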
DNS load balancing has some really nasty edge cases. I have had to deal with golang HTTP/2 clients using RR DNS, and it has caused issues.
Golang HTTP/2 clients will reuse the first server they can connect to over and over, and the DNS is never re-resolved. This can lead to situations where clients never discover new servers that are added to the pool.
A particularly pathological case: if all serving backends go down, the clients will all pin to the first backend that comes back up, and they will not move off it. As other servers come up, few clients connect to them, since they are already connected to the first server that came back.
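One mitigation I've seen suggested (not a cure, and the interval is something you'd have to tune) is to periodically drop idle connections so new dials re-resolve DNS. A rough sketch:

```go
package main

import (
	"net/http"
	"time"
)

// newRecyclingClient periodically drops idle connections; each new dial
// re-resolves DNS, so clients eventually spread onto servers that were
// added to the pool (or that came back up).
func newRecyclingClient(maxIdleAge time.Duration) *http.Client {
	transport := http.DefaultTransport.(*http.Transport).Clone()
	go func() {
		for range time.Tick(maxIdleAge) {
			// Only closes connections idle at this instant;
			// in-flight requests are unaffected.
			transport.CloseIdleConnections()
		}
	}()
	return &http.Client{Transport: transport}
}
```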
A similar issue happens with grpc-go. The grpc DNS resolver will only re-resolve when the connection to a backend is broken, so grpc clients can likewise all gang onto one host and never move off. One suggestion is to set `MAX_CONNECTION_AGE` on the server side, which periodically disconnects clients and causes them to re-resolve the DNS.
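For reference, in grpc-go that knob isn't an environment variable but a server option; something like this (the durations are placeholders to tune for your workload):

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// MaxConnectionAge forcibly closes client connections after a
	// while; the grace period lets in-flight RPCs finish. Clients
	// then re-resolve DNS when they reconnect.
	srv := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
		MaxConnectionAge:      5 * time.Minute,
		MaxConnectionAgeGrace: 30 * time.Second,
	}))
	_ = srv // register services and call srv.Serve(lis) as usual
}
```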
I really wish there were a better standard solution for service discovery. I guess the best you can do is implement a request-based load balancer with a virtual IP and have the load balancer perform health checks. But you are still kicking the can down the road, as you are just pushing the problem down to the system that implements the virtual IPs. I guess you assume that the routing system is relatively static compared to the backends, and that is where the benefit comes in.
I'm curious how people do this on bare metal. I know AWS/GCP/etc. have their internal load balancers, but what's the secret sauce for doing it yourself? Any suggestions for blog posts or white papers?
> So what happens when one of the servers is offline? Say I stop the US server:
> service nginx stop
But that's not how you should test this. A client will see the connection being refused, and go on to the next IP. But in practice, a server may not respond at all, or accept the connection and then go silent.
Now you're dependent on client timeouts, and round robin DNS suddenly looks a whole lot less attractive as a way to increase reliability.
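To make RR DNS survive a black-holed server, you have to set aggressive timeouts yourself. In Go, for example, a sketch along these lines (the numbers are arbitrary):

```go
package main

import (
	"net"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{
		Transport: &http.Transport{
			// Bound each connection attempt. Go's dialer splits
			// this deadline across the candidate IPs from DNS, so
			// a black-holed first address doesn't eat the budget.
			DialContext: (&net.Dialer{Timeout: 3 * time.Second}).DialContext,
			// Catch servers that accept the connection, then go silent.
			ResponseHeaderTimeout: 5 * time.Second,
		},
		// Hard ceiling for the whole request, body included.
		Timeout: 15 * time.Second,
	}
	_ = client // use client.Get(...) etc.
}
```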
> As you can see, all clients correctly detect it and choose an alternative server.
This is the nasty key point. The reliability is decided client-side.
For example, systemd-resolved at times enacted maximum technical correctness by always returning the lowest IP address. After all, DNS-RR is not well-defined, so always returning the lowest IPs is not wrong. It got changed after some riots, but as far as I know, Debian 11 is stuck with that behavior, or was for a long time.
Or, I deal with many applications with shitty or no retry behavior. They go "Oh no, I have one connection refused, gotta cancel everything, shutdown, never try again". So now 20% - 30% of all requests die in a fire.
It's an acceptable solution if you have nothing else. As the article notes, if you have quality HTTP clients with a few retries configured (like browsers), DNS-RR is fine for finding an actual load balancer with health checks and everything, which can provide a 100% success rate.
But DNS-RR is no load balancer, and load balancers are better.
> This allows you to share the load between multiple servers, as well as to automatically detect which servers are offline and choose the online ones.
To [hesitantly] clarify a pedantry regarding "DNS automatic offline detection":
Out of the box, RR-DNS is only good for load balancing.
Nothing automatic happens on the availability state detection front unless you build smarts into the client. TFA introduction does sort of mention this, but it took me several re-reads of the intro to get their meaning (which to be fair could be a PEBKAC). Then I read the rest of TFA, which is all about the smarts.
If the 1/N server record selected by your browser ends up being unavailable, no automatic recovery / retry occurs at the protocol level.
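To make that concrete, the "smarts" amount to something like this Go sketch: resolve every address behind the name and walk the list yourself (names and timeouts are placeholders):

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// dialAny resolves all addresses for host and tries each in turn,
// instead of trusting the single 1/N record the resolver handed out.
func dialAny(host, port string, perTry time.Duration) (net.Conn, error) {
	ips, err := net.LookupIP(host)
	if err != nil {
		return nil, err
	}
	for _, ip := range ips {
		conn, err := net.DialTimeout("tcp",
			net.JoinHostPort(ip.String(), port), perTry)
		if err == nil {
			return conn, nil
		}
		fmt.Printf("skipping %s: %v\n", ip, err)
	}
	return nil, fmt.Errorf("no reachable address for %s", host)
}
```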
p.s. "Related fun": Don't forget about Java's DNS TTL [1] and `.equals()` [2] behaviors.
[1] https://stackoverflow.com/questions/1256556/how-to-make-java...
[2] https://news.ycombinator.com/item?id=21765788 (5y ago, 168 comments)
> "It's an amazingly simple and elegant solution that avoids using Load Balancers."
When a server is down, you have a globally distributed / cached IP address that you can't prevent people from hitting. https://www.cloudflare.com/learning/dns/glossary/round-robin...
Hey. This is Cloudflare's CTO. We've rolled out a change to all free accounts in Cloudflare to bring them into line with paid accounts. The problem you are talking about here has been fixed and we should be doing Zero Downtime Failover for all account types. Can you retest it?
PS Thanks for writing this up. Glad we were able to change this behaviour for everyone.
The dark remix version of this is fast flux hosting, which is what a lot of the bulletproof hosting providers use.
May be worth mentioning that Zero Downtime Failover is a Pro-or-higher feature, I believe; that's how it was documented before as well, back when the "protect your origin server" docs were split by plan level. So you may see different behavior/retries.
Multiple A records are not for load balancing, a key component of which is full control over registering new targets and deregistering old ones in order to shift traffic. Because DNS responses are cached, you can't reliably use DNS to quickly shift traffic to new IP addresses, or to drain traffic from old ones.
As OP clearly shows, it's also not useful for geographically routing traffic to the nearest endpoint. Clients are dumb and may do things against their interest, the user will suffer for it, and you will get the complaints. Use a DNS provider with proper georouting if this is important to you.
The only genuinely valid reason for multiple A records is redundancy. If you have a physical NIC, guess what, those fail sometimes. If you get a virtual IP address from a cloud provider, guess what, those abstractions leak sometimes. Setting up multiple servers with multiple NICs per server and multiple A records pointing to those NICs is one of those things you do when your use case requires some stratospherically high reliability SLA and you systematically start to work through every last single point of failure in your hot path.
We used to do this at Amazon in the 00's for onsite hosts. At the time, round robin DNS was the fastest way to load balance; even dedicated load balancers of the era added a few milliseconds of latency. A lot of the decisions didn't make sense to me and seemed to be grandfathered in from the 90's.
We had a dedicated DNS host and various other dedicated hosts for services related to order fulfillment. A batch job would be downloaded in the morning to the order server (app) and split up amongst the symbol scanners, which ran basic terminals. To keep latency as low as possible, the scanners would use DNS round robin. I'm not sure how much that helped, because the wifi was by far the biggest bottleneck, simply due to interference, reflection and so on.
With this setup an outage would have no effect on the throughput of the warehouse, since the batch job was all handled locally. As we moved toward same-day shipping this was of course no longer a good solution, and we moved to redundant dedicated fiber with cellular data backup, then almost completely remote servers for everything but app servers. So what we were left with was a million dollars of HVAC cooling a quarter rack of hardware, and a bunch of redundant onsite tech workers.
The browser behavior is really nice; good to know that it falls back quickly and smoothly. Round robin DNS has always been referred to as a "poor man's load balancer", which it seems to be living up to.
> Curl also works correctly. First time it might not, but if you run the command twice, it always corrects to the nearest server.
This took two tries for me, which raises the question of how curl is keeping track of RTT (round-trip times). Interesting.
Interesting. The author starts by discussing DNS round robin but then briefly touches on Cloudflare Load Balancing.
I use this feature, and there are options to control Affinity, Geolocation and others. I don't see this discussed in the article, so I'm not sure why Cloudflare load balancing is mentioned if the author does not test the whole thing.
Their Cloudflare wishlist includes "Offline servers should be detected."
This is also interesting because when creating a Cloudflare load balancing configuration, you create monitors, and if one is down, Cloudflare will automatically switch to other origin servers.
These screenshots show what I see on my Load Balancing configuration options:
https://cdn.geekzone.co.nz/imagessubs/62250c035c074a1ee6e986...
https://cdn.geekzone.co.nz/imagessubs/04654d4cdda2d6d1976f86...
> Curl also works correctly. First time it might not, but if you run the command twice, it always corrects to the nearest server.
I always assumed curl was stateless between invocations. What's going on here?
Interesting topic for me, and I’ve been looking at anycast IP services and latency based DNS resolvers as well. I even made a repo[1] for anyone interested in a quick start for setting up AWS global accelerator.
[1] https://github.com/mlhpdx/cloudformation-examples/tree/maste...
Hm, I thought Happy Eyeballs (HE) was mainly concerned with IPv6 issues and falling back to IPv4. I didn't realize this was the RFC where some words were finally said about round-robin specifically, but from this article it looks like it was.
Is it true then that before HE, most round-robin implementations simply cycled and no one considered latency? That's a very surprising finding.
Another way to handle clients that stick with an IP after resolving is to use a combination of DNS RR and Anycast (if you have control over the physical infra). You resolve with RR to an IP in a regional data center and then use Anycast for local delivery. That way, if a server goes down, those clients can continue to operate.
Take a look at SRV records instead - they are very intentionally designed for this, and behave vaguely similarly to MX. Creating a DNS server (or a CoreDNS/whatever module) that dynamically updates weights based on backend metrics has been a pending pet project of mine for some time now.
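The core of that pet project is small, if you go the standalone-server route with the third-party miekg/dns library. A sketch of the shape of it (the zone name, backend list, and weight function are obviously stand-ins):

```go
package main

import (
	"log"

	"github.com/miekg/dns" // third-party DNS library
)

type backend struct {
	target string // e.g. "app1.example.internal."
	port   uint16
}

// currentWeight would be fed by backend metrics; constant here.
func currentWeight(b backend) uint16 { return 10 }

func main() {
	backends := []backend{
		{"app1.example.internal.", 8080},
		{"app2.example.internal.", 8080},
	}
	dns.HandleFunc("_app._tcp.example.internal.", func(w dns.ResponseWriter, r *dns.Msg) {
		m := new(dns.Msg)
		m.SetReply(r)
		for _, b := range backends {
			m.Answer = append(m.Answer, &dns.SRV{
				Hdr: dns.RR_Header{Name: r.Question[0].Name,
					Rrtype: dns.TypeSRV, Class: dns.ClassINET, Ttl: 5},
				Priority: 10,
				Weight:   currentWeight(b), // updated from metrics
				Port:     b.port,
				Target:   b.target,
			})
		}
		w.WriteMsg(m)
	})
	log.Fatal((&dns.Server{Addr: ":5353", Net: "udp"}).ListenAndServe())
}
```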
I actually use round robin for a set of ssh servers.
There is never a delay if one of them is down.
I am using a closed-source client (Bluezone Rocket), but I'm assuming that it pulled a lot of code from PuTTY as it uses the PPK format.
Check out what happens when you use IPv6 addresses. RFC 6724 is awkward about ordering with IPv6.
How your OS sorts DNS responses also comes into play, and it depends on how your browser makes DNS requests.
Cloudflare results with a Worker as a reverse proxy can be much better.
Hey! So I've got a video CDN made of 4 bare-metal servers. 2 are newer and more powerful, so I give each of them 2 of the 6 IP addresses returned by DNS for the respective A record. But from a very diverse pool of devices (proprietary set-top boxes, smart TV sets, iOS and Android mobile clients, web browsers, etc.) I still get ~40% of traffic on the older servers instead of the expected 33%, given that they hold 2 of the 6 IP addresses resolved as DNS A records for these hosts. Why?
What a great article! It’s often easy to forget just how flexible and self-correcting the “official” network protocols are. Thanks to the author for putting in the legwork.
"I wrote a decoder in Perl. Everything must be in Perl."
preach on.
I have used round robin for years.
Wish I could add instructions like:
- random choice #round robin, like now
- first response # usually connects to closest server
- weights (1.0.0.1:40%; 2.0.0.2:60%)
- failover: (quick | never)
- etc: naming countries, continents
Back in the day DNS consumed a lot more oxygen - BIND, double-reverse MX records, Windows DNS, etc. What happened? Did the cloud make all of that go away?
As an SRE, I get a chuckle out of this article and some of the responses. Devs mess this up constantly.
DNS has one job. Hostname -> IP. Nothing further. You can mess with it on the server side, like checking whether the HTTP server is up before handing out the IP, but once the IP is given, the client takes over and DNS can do nothing further, so behavior will be wildly inconsistent IME.
Assuming DNS RR is the standard setup where a hostname returns multiple IPs, it's only useful for load balancing between datacenters with similar latency. If you want fancy stuff like geographic load balancing or health checks, you need a fancy DNS server, but at the end of the day you should only return a single IP so the client will target the endpoint you want it to connect to.
Chrome and Firefox use the OS DNS resolver by default, which on most OSes does caching as well.
Did you try running a simple curl loop in bash instead of testing manually? The data and statistics would become much clearer. I ask because I want to understand how to ensure my clients get the nearest edge data center.
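A curl loop would do, but here's the same idea as a self-contained Go sketch. It assumes, like the article's test setup, that the response body names the serving server; the URL is a placeholder:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	client := &http.Client{Transport: &http.Transport{
		// Force a fresh dial (and DNS resolution) per request;
		// keep-alive would otherwise pin every iteration to one server.
		DisableKeepAlives: true,
	}}
	counts := map[string]int{}
	for i := 0; i < 100; i++ {
		resp, err := client.Get("https://example.com/")
		if err != nil {
			counts["error"]++
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		counts[strings.TrimSpace(string(body))]++ // tally by responding server
	}
	fmt.Println(counts)
}
```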
This seems like a nice solution for zero-downtime updates: clone the server, add the specified IP, deny access to the main one, upgrade, and turn the cloned server off.
37signals/Basecamp wrote about something similar on their blog; they saw traffic switching almost immediately: https://signalvnoise.com/posts/3857-when-disaster-strikes and in the comments it was hinted that it was just a DNS update with low TTLs.
round robin ≠ load balancer
but please do continue reading on…
So half of your content is served from another server? Sounds like a recipe for inconsistent states.
Hmm. I've asked the authoritative DNS team to explain what's happening here. I'll let HN know when I get an authoritative answer. It's been a few years since I looked at the code and a whole bunch of people keep changing it :-)
My suspicion is that this is to do with the fact that we want to keep affinity between the client IP and a backend server (which OP mentions in their blog). And the question is "do you break that affinity if the backend server goes down?" But I'll reply to my own comment when I know more.