I found the OpenAI page more interesting: https://platform.openai.com/docs/guides/latency-optimization...
This is like the likely() and unlikely() macros in the Linux kernel! Huge speedup if you're right; small penalty if you're not.
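For reference, the kernel defines these as thin wrappers around GCC's __builtin_expect (in include/linux/compiler.h). The zero_buffer() function below is just a toy illustration of the hot-path/cold-path trade-off, not kernel code:

    /* From the Linux kernel's include/linux/compiler.h: tell the compiler
       which way the branch is expected to go, so the common case falls
       through as straight-line code. */
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    /* Toy example: the NULL check is the cold path, the loop is the hot path.
       Guess right and the CPU stays on the fall-through code; guess wrong and
       you pay one branch mispredict. Same trade as speculative decoding. */
    int zero_buffer(char *buf, unsigned long len)
    {
        if (unlikely(buf == 0))
            return -1;

        for (unsigned long i = 0; i < len; i++)
            buf[i] = 0;
        return 0;
    }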
If you use the Cursor IDE: the folks who wrote it talked about their use of speculative decoding to make "Apply" faster on the Lex Fridman podcast last month.
Here it is on YouTube, although you can also find it on Spotify and other podcast platforms:
https://youtu.be/oFfVt3S51T4?t=1206