I have recently been exploring and testing this approach of offloading AI inference to the user's device instead of calling a cloud API. It has many advantages, and I can see it becoming the norm in the future.
I was also surprised by the number of people who have GPUs, and by how well SLMs perform in many cases, even models with just 1B parameters.
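For context, here's a minimal sketch of what in-browser inference can look like, assuming transformers.js (`@huggingface/transformers`) in a WebGPU-capable browser; the model id and options are illustrative, not a recommendation:

```ts
import { pipeline } from '@huggingface/transformers';

// Downloads the model once, caches it in the browser, and runs inference
// entirely on the user's device -- no prompt data leaves the machine.
const generator = await pipeline(
  'text-generation',
  'onnx-community/Llama-3.2-1B-Instruct', // illustrative ~1B SLM checkpoint
  { device: 'webgpu' }                    // run on the user's GPU via WebGPU
);

const output = await generator(
  'Summarize why on-device inference helps privacy, in one sentence.',
  { max_new_tokens: 64 }
);
console.log(output[0].generated_text);
```

The point of the sketch is the architecture: the model weights are fetched and cached client-side, so every prompt and completion stays within the user's own session.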
This is f* cool! Privacy is super important, and writing and running code this way would help mitigate some of the concerns about data access control: the browser can only reach whatever the user's identity can reach. That's a big improvement over, e.g., running a centralized LLM (trained on data that is either too generic or too specific) for a whole company.