Golang or Python for a large-scale distributed crawler?

  • Are you using 100 threads or 100 coroutines? 100 threads on a single machine will not give you good performance at all.

    I have started working on a crawler in Go, where each website to crawl is configured via a YAML file. There is a bit more code than in the Python crawler I wrote before, but a single micro instance on Amazon is more than enough to "read" a few hundred websites every hour. Personally, I find the goroutine system much easier to deal with than the Python equivalent.
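To give an idea of what the goroutine approach looks like, here is a minimal sketch of a fixed pool of workers pulling URLs from a channel. It is not the crawler described above: the names `crawl`, `httpFetch`, `result` and `fetchFunc`, the pool size of 8 and the 10 second timeout are all assumptions made for illustration, and the YAML configuration is left out because it would need a third-party package.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"sync"
	"time"
)

// result holds the outcome of fetching one URL (hypothetical type).
type result struct {
	URL   string
	Bytes int
	Err   error
}

// fetchFunc downloads one URL and reports how many bytes were read.
type fetchFunc func(url string) (int, error)

// crawl fetches every URL using a fixed number of worker goroutines.
// Results come back in completion order, not input order.
func crawl(urls []string, workers int, fetch fetchFunc) []result {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	out := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				n, err := fetch(u)
				out <- result{URL: u, Bytes: n, Err: err}
			}
		}()
	}

	// Feed the workers, then signal that no more jobs are coming.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
	}()

	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	results := make([]result, 0, len(urls))
	for r := range out {
		results = append(results, r)
	}
	return results
}

// httpFetch builds a fetchFunc that reads and discards the response body.
func httpFetch(client *http.Client) fetchFunc {
	return func(url string) (int, error) {
		resp, err := client.Get(url)
		if err != nil {
			return 0, err
		}
		defer resp.Body.Close()
		n, err := io.Copy(io.Discard, resp.Body)
		return int(n), err
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: crawler URL [URL...]")
		return
	}
	client := &http.Client{Timeout: 10 * time.Second}
	for _, r := range crawl(os.Args[1:], 8, httpFetch(client)) {
		if r.Err != nil {
			fmt.Printf("%s: error: %v\n", r.URL, r.Err)
			continue
		}
		fmt.Printf("%s: %d bytes\n", r.URL, r.Bytes)
	}
}
```

Passing the fetch function in as a parameter keeps the pool logic separate from the HTTP code, so the concurrency part can be exercised without touching the network.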

    So if I were you, I would definitely give the I/O work to Go and let Python handle the data processing.
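One possible way to wire that split together, sketched here purely as an assumption, is to have the Go side print one JSON object per fetched page on stdout and let a Python process read it line by line. The `page` struct, its field names, the `encodeLine` helper and the sample data are hypothetical; a queue or a database would work just as well.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// page is a hypothetical record handed from the Go fetcher to Python.
type page struct {
	URL    string `json:"url"`
	Status int    `json:"status"`
	Body   string `json:"body"`
}

// encodeLine serializes one page as a single newline-terminated JSON line.
func encodeLine(p page) ([]byte, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

func main() {
	// Placeholder data; a real crawler would fill this from its fetch results.
	pages := []page{
		{URL: "https://example.com/", Status: 200, Body: "hello"},
	}

	w := bufio.NewWriter(os.Stdout)
	defer w.Flush()
	for _, p := range pages {
		line, err := encodeLine(p)
		if err != nil {
			fmt.Fprintln(os.Stderr, "encode error:", err)
			continue
		}
		if _, err := w.Write(line); err != nil {
			fmt.Fprintln(os.Stderr, "write error:", err)
			return
		}
	}
}
```

On the Python side, reading `sys.stdin` line by line and calling `json.loads` on each line would be enough to pick the records up, for example by piping the Go binary into the Python script.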