"Furthermore, AMD OLMo models were also able to run inference on AMD Ryzen™ AI PCs that are equipped with Neural Processing Units (NPUs). Developers can easily run Generative AI models locally by utilizing the AMD Ryzen™ AI Software."
Hope these AI PCs will also run something better than a 1B model.
What is it useful for? Spellcheck?
Baby steps, but how useful is a 1B model these days?
It seems actual domain-specific usefulness (say, a specific programming language, translation, etc.) starts at 3B models.
Training a 1B model on 1T tokens is cheaper than people might think. An H100 GPU can be rented for $2.50 per hour and can train around 63k tokens per second for a 1B model. So you would need around 4,400 GPU-hours, costing only about $11k. And costs will keep going down.
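The arithmetic in that estimate can be sketched out directly, using the commenter's assumed numbers ($2.50/hr rental, ~63k tokens/sec on one H100 for a 1B model):

```python
# Back-of-the-envelope training cost for a 1B-parameter model on 1T tokens.
# All inputs are the assumptions stated above, not measured figures.
TOKENS = 1e12            # 1T training tokens
TOKENS_PER_SEC = 63_000  # assumed H100 throughput for a 1B model
PRICE_PER_HOUR = 2.50    # assumed H100 rental price, USD

gpu_hours = TOKENS / TOKENS_PER_SEC / 3600
cost = gpu_hours * PRICE_PER_HOUR
print(f"{gpu_hours:,.0f} GPU-hours, ~${cost:,.0f}")
# → 4,409 GPU-hours, ~$11,023
```

Note this ignores multi-GPU scaling overhead, checkpointing, and failed runs, so real costs would be somewhat higher.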