I'd wager that a latency of a couple hours is unacceptable for almost all TTS use cases.
Moreover, the current generation of TTS is pretty good, and a lot of research is going into improving it. You'd have a very limited window to build your service and get users before the big players' TTS catches up, without the enormous latency or the need to pay human wages.
Both Google and AWS have these APIs for pennies per minute. This market is going to be absolutely commoditized in no time. Are you thinking of using one of these APIs and slapping a front end on it?
> Cost would be something like $1 per 100 words.
A quick search suggests that voice acting rates (pay to the voice actor alone) tend to be around $1/second for short, small-market bits (short bits with larger markets tend to have higher use fees on top). So it sounds like this service only has room for profit if it can get people to work on demand for about 1/100 of market rates, with a much faster turnaround than is typical.
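For anyone who wants to check the back-of-envelope math, here is a rough sketch. The 150 words/minute speaking rate is my assumption, not something from the thread, and the function name is made up for illustration.

```python
# Rough comparison of the quoted $1 per 100 words against a ~$1/second
# voice-acting rate. WORDS_PER_MINUTE is an assumed speaking rate.
WORDS_PER_MINUTE = 150


def market_cost_per_100_words(rate_per_second=1.0, wpm=WORDS_PER_MINUTE):
    """Cost of a 100-word read at a per-second rate and a given speaking pace."""
    seconds = 100 * 60 / wpm  # time needed to read 100 words aloud
    return rate_per_second * seconds


proposed = 1.0  # $ per 100 words, from the quoted comment
market = market_cost_per_100_words()
print(f"market: ~${market:.0f} per 100 words; proposed is 1/{market / proposed:.0f} of that")
```

At 150 wpm that works out to about $40 per 100 words, so the proposed price is nearer 1/40 of market than 1/100. A slower read or usage fees would widen the gap, and either way it is well over an order of magnitude.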
Sure, if you've got quality voice talent, there's huge demand for that. OTOH, if you don't, why would people pay for this instead of today's commercially available machine TTS, which has much lower latency and is much cheaper (e.g., Google's premium WaveNet voices at $16/million characters, or something on the order of $1 per 8,000 words)?
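A quick sketch of the per-character to per-word conversion, in case anyone wants to plug in their own numbers. The 6 characters per word (including the trailing space) is my assumption, and the function names are hypothetical; only the $16/million characters figure comes from the comment above.

```python
# Convert a per-character TTS price into per-word terms.
# CHARS_PER_WORD is an assumed average, counting the space after each word.
CHARS_PER_WORD = 6


def words_per_dollar(price_per_million_chars=16.0, chars_per_word=CHARS_PER_WORD):
    """How many words one dollar buys at a per-million-character price."""
    chars_per_dollar = 1_000_000 / price_per_million_chars
    return chars_per_dollar / chars_per_word


def tts_cost_per_100_words(price_per_million_chars=16.0, chars_per_word=CHARS_PER_WORD):
    """Machine TTS cost for a 100-word passage."""
    return 100 * chars_per_word * price_per_million_chars / 1_000_000


print(f"~{words_per_dollar():,.0f} words per dollar")
print(f"~${tts_cost_per_100_words():.4f} per 100 words")
```

With 6 characters per word this gives roughly 10,400 words per dollar, or about one cent per 100 words; with 8 characters per word it lands near 7,800 words per dollar. Both are in line with the "order of $1 per 8,000 words" estimate and about two orders of magnitude below $1 per 100 words.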