A new, efficient inference method for large language models that halves memory usage with no degradation in performance!
More from the author about this at: https://twitter.com/Tim_Dettmers/status/1559892888326049792