Looks really interesting, but it's a shame that it's primarily built around text only. There has been a lot of research on multi-modal LLMs; are there any plans to support those?