Ask HN: How to increase LLM inference speed?

  • A faster GPU helps, but only if you're self-hosting the LLM (e.g., Ollama or Hugging Face Transformers); with a hosted API you have no control over the hardware. For the self-hosted case, there are also cheap software wins, like the half-precision sketch below.
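
  For the Hugging Face path, one common speedup is loading the model in half precision, since weight memory traffic usually dominates per-token latency on a GPU. A minimal sketch, not a definitive setup: the model name is just an example, and it assumes a CUDA GPU plus the transformers, torch, and accelerate packages.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example model; swap in your own

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # fp16 halves weight memory traffic vs. fp32, which is usually the
    # bottleneck for single-stream decoding; device_map="auto" (needs the
    # `accelerate` package) places the model on the available GPU(s).
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",
    )

    inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))

  The same idea extends to int8/int4 quantization (bigger savings, some quality cost), which is roughly what Ollama's GGUF-based models do out of the box.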