A new, efficient inference method for large language models that halves memory usage with no degradation in performance!
More from the author about this at: https://twitter.com/Tim_Dettmers/status/1559892888326049792