I found Kiln a few months ago while looking for a UI to help build a dataset for fine-tuning a model on Grapheme-to-Phoneme (G2P) conversion. I’ve contributed to the repo since.
In my G2P task, smaller models were splitting phonemes inconsistently, which broke downstream tasks and caused a lot of retries, and therefore higher costs. I fine-tuned Gemini, GPT-4o-mini, and some LLaMA and Qwen models on Fireworks.ai using Kiln, and it genuinely helped reduce those inconsistencies.
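For anyone curious what the training data looks like: each example is just a prompt/completion pair in the chat-style JSONL format most fine-tuning APIs accept. This is an illustrative sample I made up for this comment, not my real dataset:

```python
import json

# Illustrative G2P examples (not the real dataset): phonemes are space-separated
# ARPAbet so the model learns one consistent splitting convention.
examples = [
    {"word": "cat", "phonemes": "K AE1 T"},
    {"word": "phoneme", "phonemes": "F OW1 N IY0 M"},
]

with open("g2p_train.jsonl", "w") as f:
    for ex in examples:
        record = {
            "messages": [
                {"role": "system",
                 "content": "Convert the word to ARPAbet phonemes, space-separated."},
                {"role": "user", "content": ex["word"]},
                {"role": "assistant", "content": ex["phonemes"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```

The main thing that mattered for me was keeping the output format rigidly consistent across every example; the inconsistent splitting came from examples that mixed conventions.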
Naive question: are there good tutorials/places that teach you how to implement RAG and fine-tune a model? I don't know if it's even feasible. At the moment I create AI workflows for the company I work at to (semi-)automate certain things. But it's not like I could fine-tune Claude; I'd need my own model for that. Would I need a whole GPU cluster, or could it be done more easily?
And what about RAG? Is it hard to create embeddings?
I'm fairly new with the AI part of it all. I'm just using full-stack dev skills and some well written prompts.
Interesting points! I'm always curious, though – beyond the theoretical benefits, has anyone here actually found a super specific, almost niche use case where fine-tuning blew a general model out of the water in a way that wasn't just about slight accuracy bumps?
I think fine-tuning is one of the things that makes verticalised agents so much better than general ones atm.
If agents aren’t specialised, then every time they do anything they have to figure out what to do, and they don’t know what data matters, so they often just slap entire web pages into their context. General agents use loads of tokens because of this. Vertical agents often have hard-coded steps, know what data matters, and already know which APIs they’re going to call. They’re far more efficient, so they burn less cash.
This also improves the accuracy and quality.
I don't think this effect is as small as people say, especially when combined with the UX and domain specific workflows that verticalised agents allow for.
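To make the hard-coded-steps point concrete, here's a toy sketch (all names and data are made up) of a vertical agent: the workflow is fixed, the relevant fields are known up front, and the raw page never reaches the model's context.

```python
# Toy "vertical agent": fetch -> extract known fields -> one small prompt.
# Every function here is hypothetical; the point is the shape of the workflow.

def fetch_invoice(invoice_id: str) -> dict:
    # Stand-in for a real API call the agent already knows how to make.
    return {"id": invoice_id, "vendor": "Acme", "total": 1234.50,
            "raw_html": "<html>... thousands of tokens of page ...</html>"}

def extract_relevant_fields(invoice: dict) -> dict:
    # Domain knowledge baked in: only these fields matter for this decision,
    # so the huge raw page is never put into the model's context.
    return {k: invoice[k] for k in ("vendor", "total")}

def build_prompt(fields: dict) -> str:
    return (f"Should the invoice from {fields['vendor']} for "
            f"${fields['total']:.2f} be auto-approved? Answer yes/no.")

invoice = fetch_invoice("INV-42")
prompt = build_prompt(extract_relevant_fields(invoice))
```

A general agent given the same task would typically plan its own steps and stuff `raw_html` into context; here the prompt is a couple of dozen tokens.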
Without concrete examples, this reads like an advertisement.
I am personally very bullish on post-training and fine-tuning. This article doesn't do justice to the promise.
There really isn't a good tool-calling model in open source, and I don't think the problem is fine-tuning.
Related: what is the best way to augment the model with new knowledge other than at runtime using RAG?
I thought that full fine-tuning is no longer done in industry, and that transformer adapters like LoRA are used instead? Having 1,000 fine-tuned models, one per customer, seems too heavy when you can instead have 1,000 adapters and swap them in during inference for each batch.
I mean, there are tricks like Q-GaLore that allow training LLaMA-7B on a single 16GB GPU, but LoRA still seems better for production to me.
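The adapter-swapping idea is easy to see in miniature: LoRA replaces W with W + (alpha/r)·B·A, so per customer you only store the tiny A and B matrices and apply them on top of one shared base weight. A toy numpy sketch, not production code:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                      # hidden size and LoRA rank (tiny, for illustration)
W = rng.standard_normal((d, d))  # shared base weight, loaded once

# Per-customer adapters: only B (d x r) and A (r x d) are stored and swapped.
adapters = {
    "customer_a": (rng.standard_normal((d, r)), rng.standard_normal((r, d))),
    "customer_b": (np.zeros((d, r)), np.zeros((r, d))),  # zero adapter == base model
}

def forward(x, customer, alpha=4.0):
    B, A = adapters[customer]
    # Effective weight is W + (alpha / r) * B @ A, applied without
    # materialising the merged matrix: y = W x + (alpha/r) * B (A x).
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal(d)
y_a = forward(x, "customer_a")  # customer-specific behaviour
y_b = forward(x, "customer_b")  # identical to the base model
```

For d in the thousands and r around 8–64, each adapter is a fraction of a percent of the base model's size, which is why swapping them per batch is cheap.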
This is a post by a vendor that sells fine-tuning tools.
Here's a suggestion: show me a demo!
For the last two years I've been desperately keen to see just one good interactive demo that lets me see a fine-tuned model clearly performing better (faster, cheaper, more accurate) than the base model on a task it has been fine-tuned for, combined with extremely detailed information on how it was fine-tuned, including all of the training data that was used.
If you want to stand out among all of the companies selling fine-tuning services, yet another "here are tasks that can benefit from fine-tuning" post is not the way to do it. Build a compelling demo!