If you have a modern multicore processor, you should be able to run just about any model, provided you have enough memory for it. Phi-3 fits well into a 4 GB Raspberry Pi's memory profile once quantized (a 4-bit quant of the 3.8B-parameter Phi-3-mini comes in around 2.3 GB), but you may want an even smaller model if you prefer speed over quality.
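As a rough illustration, here's a minimal sketch of that setup using llama-cpp-python, one common way to run GGUF quants on CPU. The model filename and thread count are placeholder assumptions, not part of the original claim:

```python
# Minimal sketch: run a 4-bit quantized Phi-3-mini on a CPU-only box
# such as a 4 GB Raspberry Pi. Assumes llama-cpp-python is installed
# (pip install llama-cpp-python) and a Q4 GGUF file is on disk; the
# filename below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="phi-3-mini-4k-instruct-q4.gguf",  # roughly 2.3 GB at 4-bit
    n_ctx=2048,   # keep the context modest; the KV cache also costs RAM
    n_threads=4,  # one thread per Pi core
)

out = llm("Q: Name one use for a Raspberry Pi.\nA:", max_tokens=32)
print(out["choices"][0]["text"])
```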
> around how many TOPS are necessary and/or sufficient to run local AI, whether it's vision, audio, or NLP, to have good-enough speed?
These are kinda three separate questions:
- Vision is relatively compute-intensive, and the budget depends on how many frames per second of image detection you need
- Audio is relatively low-latency if you use a smaller Whisper model (see the Whisper sketch below)
- NLP/LLMs tend to be slow, but they can stream their output, which makes it possible to start speaking answers while they're still being generated (see the streaming sketch below)
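To make the audio point concrete, a small Whisper model is only a few lines with the openai-whisper package; the audio filename here is a placeholder:

```python
# Sketch: speech-to-text with a small Whisper model via openai-whisper
# (pip install openai-whisper). "base" runs tolerably on modest CPUs;
# larger checkpoints trade latency for accuracy.
import whisper

model = whisper.load_model("base")     # ~74M parameters
result = model.transcribe("clip.wav")  # placeholder filename
print(result["text"])
```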
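And for the streaming point, most local runtimes can yield tokens as they're generated; with llama-cpp-python it looks roughly like this, where in a real build you'd hand each chunk to a TTS engine instead of printing it:

```python
# Sketch: stream tokens from a local LLM so speech synthesis can start
# before generation finishes. Same placeholder GGUF file as above.
from llama_cpp import Llama

llm = Llama(model_path="phi-3-mini-4k-instruct-q4.gguf", n_ctx=2048)

for chunk in llm("Explain what TOPS measures, briefly.",
                 max_tokens=64, stream=True):
    token = chunk["choices"][0]["text"]
    print(token, end="", flush=True)  # in practice: feed the TTS engine here
```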
So putting this all together, you really have to define what kind of end product you want. Making matters worse, not all TOPS are created equal: a 40 TOPS NPU that doesn't support every TensorFlow operation could hamstring your ability to run everything on one machine. Past a certain price point (like the Snapdragon X Elite), it really just makes more sense to buy a 12 GB RTX 3060 and a cheap barebones desktop system to run it in.