In our experience with ELB/SQS, AWS has done a good job on these services. A few things we noted:
- ELB takes about 15 minutes to scale out completely when you hit traffic spikes.
- SQS is simple to use, fairly cost-effective, and quite reliable. It will add a bit of latency and, as the author notes, it can't replace Kinesis.
- The redrive policy on SQS is really nice for catching corner cases.
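
For anyone who hasn't used the redrive policy mentioned above: it is set as a queue attribute whose value is a JSON string naming a dead-letter queue and a receive limit. Here is a minimal sketch of building that attribute. The helper name `build_redrive_attributes`, the ARN, and the default of 5 receives are my own placeholders for illustration, not anything from our setup.

```python
import json


def build_redrive_attributes(dead_letter_queue_arn: str, max_receive_count: int = 5) -> dict:
    """Build the queue attributes that attach a dead-letter queue to a source queue.

    Hypothetical helper for illustration; the default of 5 is an assumption.
    """
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        # Queue that receives messages once the receive limit is exceeded.
        "deadLetterTargetArn": dead_letter_queue_arn,
        # How many times a message may be received before it is moved.
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects the RedrivePolicy attribute as a JSON-encoded string.
    return {"RedrivePolicy": json.dumps(policy)}


# Placeholder ARN, not a real queue.
attrs = build_redrive_attributes(
    "arn:aws:sqs:us-east-1:123456789012:example-dlq", max_receive_count=3
)
print(attrs)
```

With boto3 the resulting dict would typically be passed as `Attributes` to `set_queue_attributes` on the source queue, along with that queue's URL. Picking the receive count is a judgment call: too low and transient failures end up in the dead-letter queue, too high and a poison message gets retried for a long time.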
> It seems to have a problem with handling more than 2.3K message insert/sec.
We've written to SQS at rates several times this using only 3 EC2 nodes without issue.
The only issue we have seen is occasional latency when reading from SQS, which causes the queues to back up a bit, but it generally clears within 30 minutes or so.
Really good write-up, and useful for a project I am working on!
> It's cost effective and reliable.
It is neither of those things. My tests with different SDKs (we benchmark SQS basically monthly) have ruled it out.
You would do much better with a ZeroMQ+Redis custom solution (over 100k msgs/sec) or RabbitMQ or pretty much anything else. These SQS numbers are embarrassingly bad. At any scale, it's expensive and subject to strange IO spikes. Just don't use it (which is a broad statement) unless you really need to tie in to something like Lambda (a processing pipeline that can be fed from SQS). Even then, it's not cost-effective.
There's some confusion in this post about the testing methodology vs. the implementation of SQS itself.
SQS itself may well be implemented on EC2 instances with auto-scaling (though I doubt it), but that's irrelevant to the test harness, which needs to scale out the number of nodes pounding on SQS.
The important bit is that SQS doesn't exhibit any of the scaling bumpiness of ASGs. It's a region-wide service, which you'd be very lucky to break on your own.