OpenAI: support for Reinforcement Fine-tuning available to verified orgs

  • My question for anyone who knows:

    Between SFT, DPO, and RFT: when should each be used? And can we mix and match, e.g., first SFT, then DPO?