DeepSeek R1 Theory Overview (GRPO and RL and SFT)