Google BigQuery has a Streaming API specifically for this reason. It supports up to 100,000 rows per second per table, and streamed rows are available for analysis immediately. Notably, BigQuery's batch and streaming ingest use different resources than queries, so your query performance doesn't degrade due to ingest.
(Work on Google Cloud)
How would Redshift compare to Yandex's ClickHouse[1] for this kind of architecture?
Cut out Kafka by writing directly to S3 and bulk loading from an S3 prefix (the optimal ingest path for Redshift). The article never defines what "near real-time" means, which is bothersome.
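Roughly what that looks like in practice: write time-partitioned gzipped objects to S3, then COPY the whole prefix into Redshift in one parallel bulk load. This is just a sketch; the bucket, table, prefix layout, and IAM role are all placeholders:

```python
import gzip
import io
from datetime import datetime

def s3_key_for_batch(prefix: str, now: datetime) -> str:
    """Build a time-partitioned S3 key so each load window gets its own prefix."""
    return f"{prefix}/{now:%Y/%m/%d/%H%M%S}.json.gz"

def gzip_batch(lines: list) -> bytes:
    """Gzip newline-delimited records; Redshift's COPY reads gzipped files directly."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write("\n".join(lines).encode("utf-8"))
    return buf.getvalue()

def copy_statement(table: str, bucket: str, prefix: str, iam_role: str) -> str:
    """COPY from an S3 prefix loads every object under it in one parallel bulk load."""
    return (
        f"COPY {table} FROM 's3://{bucket}/{prefix}/' "
        f"IAM_ROLE '{iam_role}' FORMAT AS JSON 'auto' GZIP;"
    )
```

The actual upload (e.g. via boto3) and the COPY execution are left out; the point is that one COPY over a prefix replaces the per-message writes a Kafka consumer would do.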
At a rate of 300 small messages a second, it was enough for me to write in 10k batches. I had a small writer script that kept the buffer in memory and did batch inserts.
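The buffer-then-batch-insert approach described above can be sketched roughly like this; `BatchWriter`, the thresholds, and `flush_fn` (which would be the actual batch INSERT in a real setup) are all illustrative:

```python
import time

class BatchWriter:
    """Accumulate messages in memory and hand them off as one batch once the
    buffer reaches batch_size or max_age_s elapses, whichever comes first."""

    def __init__(self, flush_fn, batch_size=10_000, max_age_s=30.0, clock=time.monotonic):
        self.flush_fn = flush_fn      # called with a list of messages, e.g. a DB batch insert
        self.batch_size = batch_size
        self.max_age_s = max_age_s
        self.clock = clock
        self.buffer = []
        self.oldest = None            # timestamp of the first message in the buffer

    def write(self, msg):
        if self.oldest is None:
            self.oldest = self.clock()
        self.buffer.append(msg)
        if len(self.buffer) >= self.batch_size or self.clock() - self.oldest >= self.max_age_s:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)  # one batch insert instead of N single-row inserts
            self.buffer = []
            self.oldest = None
```

The age threshold matters at low traffic: at 300 msg/s a 10k batch fills in about 33 seconds, so without a time bound a quiet period would leave rows sitting in memory indefinitely.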
The proliferation of software using words with a strong, precise, pre-existing meaning is making some of these headlines difficult to read... My first impression was that there is a space telescope I was unaware of whose copious data was being converted into redshift measurements of galaxies. Sadly, it has nothing to do with space news. Not sure whether to laugh or sigh.