Figured now was a good time to post this since we recently got surprisingly good results training an email research agent. The link is above, but I'll put it here as well since I think it's a good example of RL's promise: https://openpipe.ai/blog/art-e-mail-agent
Thanks for sharing this! A couple of questions come to mind:
- How does training with RL differ from fine-tuning?
- When would it make sense to fine-tune instead of using RL?
I really like this concept.
Do you have documentation for the API response from the `/_train_model` endpoint?
The table comparing models is a really great way to show things off here.
Was the name influenced by the ship in the Murderbot Diaries?
Perfect, I've always wanted an easier way to experiment with RL frameworks. Gonna mess around with this ASAP.
Contributor here. We developed the Agent Reinforcement Trainer (ART) library to make it easy to train LLMs for anything.
No callbacks or straitjacket flows. Instead we serve an OpenAI API-compatible endpoint that you can use as a drop-in replacement for any proprietary APIs you may be hitting.
After collecting responses from the inference API, you can tune the model with your own custom rewards and repeat the process as long as you like, until performance converges. We believe this level of flexibility will make it easier for you to train state-of-the-art models for your own use cases, much like Kyle's new email agent[1].
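To make that loop concrete, here's a rough sketch in Python. Treat the endpoint paths, model name, and payload shape as illustrative assumptions rather than the exact API; the flow is what matters: collect rollouts through the OpenAI-compatible endpoint, score them with your own reward function, then kick off a training step and repeat.

```python
# Illustrative sketch of the collect -> reward -> train loop.
# Paths, model name, and payload shape are assumptions, not the exact ART API.
import requests
from openai import OpenAI

BASE_URL = "http://localhost:8000"                      # assumed local ART server
client = OpenAI(base_url=f"{BASE_URL}/v1", api_key="unused-locally")

def reward_fn(prompt: str, completion: str) -> float:
    """Your own custom reward; here a trivial placeholder check."""
    return 1.0 if "answer:" in completion.lower() else -0.1

prompts = [
    "Summarize the most recent email from Alice.",
    "What was the total on last month's invoice?",
]

for step in range(10):                                  # repeat until performance converges
    trajectories = []
    for prompt in prompts:
        # Drop-in replacement for a proprietary chat-completions API.
        resp = client.chat.completions.create(
            model="my-email-agent",
            messages=[{"role": "user", "content": prompt}],
        )
        completion = resp.choices[0].message.content
        trajectories.append({
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": completion},
            ],
            "reward": reward_fn(prompt, completion),
        })

    # Hypothetical call to the training endpoint mentioned elsewhere in the thread.
    requests.post(f"{BASE_URL}/_train_model",
                  json={"model": "my-email-agent", "trajectories": trajectories})
```

The point is that the reward function is ordinary Python you control, and the inference calls are the same OpenAI client calls you'd make against any hosted model.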
Also happy to answer any questions you have about the framework.
[1] https://openpipe.ai/blog/art-e-mail-agent