You can also work out the same if you know the tokens/sec for different input and output token counts.
In case you are interested in seeing results for speed (tokens/second):
I ran some tests on Llama 2 7B, Gemma 7B, and Mistral 7B to compare tokens/second across 6 different libraries, with 5 different input token ranges (20 to 5000) and three different output token counts (100, 200, and 500), on an A100.
These are the results: https://inferless.com/learn/exploring-llms-speed-benchmarks-...