Looks like this will only be effective for short prompts/responses. E.g. if you have 4k tokens in your prompt, you can only fire 10 requests/minute with a 40k tokens/minute rate limit.
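To make the arithmetic in this comment concrete, here is a minimal sketch. The function name `max_requests_per_minute` is hypothetical, and it assumes every request costs the same number of tokens and that only the token-per-minute limit applies.

```python
def max_requests_per_minute(tokens_per_minute_limit: int, tokens_per_request: int) -> int:
    """Upper bound on requests per minute under a token-per-minute limit.

    Assumption: every request consumes the same number of tokens.
    """
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    # Whole requests only: a partially funded request cannot be sent.
    return tokens_per_minute_limit // tokens_per_request


# The case from the comment: 4k-token prompts against a 40k tokens/minute limit.
print(max_requests_per_minute(40_000, 4_000))  # 10
```

If response tokens count against the same limit (an assumption, it depends on the provider), a 4k prompt with a 1k response would be `max_requests_per_minute(40_000, 5_000)`, which gives 8.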
Why not just use a fetch with a retry function?
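For reference, a retry wrapper of the kind this question suggests could look roughly like the sketch below. It is written in Python for brevity, and the same shape would apply around a JavaScript `fetch` call. `RateLimitError`, `call_with_retry`, and the `send` callable are all hypothetical names: the caller is assumed to raise `RateLimitError` when the API answers with HTTP 429.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error the caller raises when the API answers HTTP 429."""


def call_with_retry(
    send: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send() until it succeeds, backing off exponentially on RateLimitError."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Wait base_delay * 2^attempt seconds, plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    # Only reached when the loop never ran.
    raise ValueError("max_attempts must be at least 1")
```

A retry wrapper like this only reacts after a request has been rejected. It does not pace requests up front, so the per-minute token budget still caps overall throughput.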