I'm taking a DIY approach to RAG/function calling for a work tool. We need data sovereignty, so we're probably going to self-host. To that end, I'm using Ollama to serve some models. If you want to go the DIY route, I'd highly recommend NexusRaven as your function-calling model.
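Roughly what that looks like on the Ollama side, hitting its HTTP API from Python (this assumes Ollama is running on its default port and you've pulled the model, e.g. `ollama pull nexusraven`; the prompt template follows NexusRaven's published examples, so double-check the model card for the exact format):

    # Sketch: function calling against a NexusRaven model served by Ollama.
    # Assumes a local Ollama on its default port and a pulled model tag.
    import requests

    PROMPT = '''Function:
    def get_weather(city: str):
        """
        Return the current weather for the given city.
        """

    User Query: What's the weather like in Berlin?<human_end>
    '''

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "nexusraven", "prompt": PROMPT, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    # NexusRaven replies with a call expression, e.g. Call: get_weather(city='Berlin')
    print(resp.json()["response"])

You then parse that call expression and dispatch it to your own function registry.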
No promises, but I'm hopeful we can open-source our work eventually.
I used LangChain and models hosted on Ollama for my latest project [1]. Now that I have a GPU and Ollama is available for Windows, I can build LLM-based applications quickly with local debugging.
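The setup is pleasantly small. A minimal sketch, assuming Ollama is running locally and you've pulled a model (the model name here is just an example, swap in whatever you're serving):

    # Minimal LangChain + Ollama sketch; talks to http://localhost:11434 by default.
    from langchain_community.llms import Ollama

    llm = Ollama(model="mistral")  # assumes `ollama pull mistral` has been run

    # invoke() sends a single prompt and returns the completion as a string
    print(llm.invoke("Explain retrieval-augmented generation in one sentence."))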
I'm mainly hacking around with my LLM CLI tool, experimenting with different combinations of embedding models and LLMs: https://til.simonwillison.net/llms/embed-paragraphs#user-con...
I really need to add a web interface to that so it's a bit more accessible to people who don't live in the terminal!
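For anyone who wants to try the same experiment, LLM is a Python library as well as a CLI, so the embed-and-compare loop is only a few lines. A rough sketch (the "3-small" model ID is OpenAI's text-embedding-3-small and needs an API key; plugins can swap in local embedding models instead):

    # Sketch: embed paragraphs, then rank them against a query by cosine similarity.
    import math
    import llm

    model = llm.get_embedding_model("3-small")  # or a local model via a plugin

    paragraphs = [
        "Ollama makes it easy to run models locally.",
        "Embeddings map text to vectors you can compare.",
    ]
    query = "How do I run an LLM on my own machine?"

    vectors = [model.embed(p) for p in paragraphs]
    qvec = model.embed(query)

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Highest-scoring paragraph is the best match for the query
    for score, text in sorted(
        ((cosine(qvec, v), p) for v, p in zip(vectors, paragraphs)), reverse=True
    ):
        print(f"{score:.3f}  {text}")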