New DeepSeek-V2.5 model tops OSS coding leaderboards

  • DeepSeek released their new DeepSeek-V2.5 model this week. According to their release tweet, it is a "combination" of their DeepSeek-V2-0628 and DeepSeek-Coder-V2-0724 models. [0]

    On their official website [1], they claim it surpasses GPT-4-Turbo, Claude 3 Opus, and the previous DeepSeek-Coder-V2 model in coding, scripting, and math tasks. It's fully open sourced [2] and has a 128k context window.

    It doesn't show up yet on the LMSYS Chatbot Arena coding leaderboard, which is common with new models, but livebench [3] has it ranked 7th on their coding benchmark. That is the highest ranking for any open source model (not counting mistral-large-2407 as fully open sourced, since its weights are not public), and it beats meta-llama-3.1-405b-instruct-turbo.

    Since this model's strength is coding, I've also made it available for free to anyone who wants to try it as a coding copilot in VS Code [4] (Disclaimer: I'm a co-founder at Double and this is my extension).

    Hope others find this as exciting as I do. It's great to see open source models continue to improve!

    [0] - https://x.com/deepseek_ai/status/1832026579180163260

    [1] - https://www.deepseek.com/

    [2] - https://huggingface.co/deepseek-ai/DeepSeek-V2.5

    [3] - https://livebench.ai/

    [4] - https://double.bot/