Model Release Overview

On December 26, 2024, DeepSeek Corporation announced in Beijing the official launch and simultaneous open-sourcing of DeepSeek-V3, the first model in its new series. Users can interact with the latest V3 model at the official website, chat.deepseek.com. The API service has been updated accordingly, with no changes required to the interface configuration. Note that the current version of DeepSeek-V3 does not yet support multimodal input or output.
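Because the interface configuration is unchanged, existing clients should keep working as-is. Below is a minimal sketch of a chat completion call, assuming the OpenAI-compatible endpoint at api.deepseek.com and the deepseek-chat model name used by DeepSeek's API; the key placeholder is hypothetical.

```python
from openai import OpenAI

# Assumes an OpenAI-compatible endpoint; the key below is a placeholder.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # the V3 model is assumed to be served under this name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)
```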

Technical Specifications and Architecture

DeepSeek-V3 is a self-developed Mixture-of-Experts (MoE) model with 671B total parameters, of which 37B are activated per token, pre-trained on 14.8T tokens. According to the official technical report, the model achieves outstanding results across multiple evaluations, surpassing other open-source models such as Qwen2.5-72B and Llama-3.1-405B, and performing on par with world-leading closed-source models like GPT-4o and Claude-3.5-Sonnet.
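To make the "activated parameters" figure concrete, the sketch below shows generic top-k expert routing in PyTorch: each token is dispatched to only k of the experts, so only a fraction of the total parameters participate in any single forward pass. This is an illustrative simplification, not DeepSeek's exact DeepSeekMoE implementation, which additionally uses shared experts and its own load-balancing scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k Mixture-of-Experts layer (illustrative only)."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model); route each token to its k highest-scoring experts.
        scores = F.softmax(self.router(x), dim=-1)              # (n_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)              # (n_tokens, k)
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize gate weights
        out = torch.zeros_like(x)
        # Only the selected experts run, so per-token compute scales with k, not n_experts.
        for e, expert in enumerate(self.experts):
            token_mask, slot = (idx == e).nonzero(as_tuple=True)
            if token_mask.numel():
                out[token_mask] += weights[token_mask, slot, None] * expert(x[token_mask])
        return out

# Per-token compute touches roughly k / n_experts of the expert parameters,
# analogous to 37B activated out of 671B total in DeepSeek-V3.
moe = TopKMoE(d_model=64, d_ff=256)
y = moe(torch.randn(10, 64))
print(y.shape)  # torch.Size([10, 64])
```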

Performance Benchmarks

Knowledge-Based Tasks

In knowledge-based tasks (MMLU, MMLU-Pro, GPQA, SimpleQA), DeepSeek-V3 has shown significant improvement over its predecessor, DeepSeek-V2.5, and is close to the current best-performing model, Claude-3.5-Sonnet-1022.

Long-Text Processing

In long-text evaluations such as DROP, FRAMES, and LongBench v2, DeepSeek-V3 has outperformed other models on average.

Coding Capabilities

In coding scenarios, DeepSeek-V3 far surpasses all existing non-o1 models in algorithmic coding tasks (Codeforces) and approaches Claude-3.5-Sonnet-1022 in engineering coding tasks (SWE-Bench Verified).

Mathematical Performance

In mathematics, DeepSeek-V3 significantly outperforms all open-source and closed-source models on the American Invitational Mathematics Examination (AIME 2024) and the MATH benchmark, as well as on the Chinese National High School Mathematics Competition (CNMO 2024).

Chinese Language Capabilities

In Chinese language capabilities, DeepSeek-V3 performs on par with Qwen2.5-72B on educational evaluations such as C-Eval and on pronoun-resolution benchmarks, while leading on factual-knowledge tasks such as C-SimpleQA.

Technical Innovations and Performance Improvements

Through algorithmic and engineering innovations, DeepSeek-V3’s token generation speed has tripled relative to the V2.5 model, from 20 to 60 tokens per second (TPS), providing users with a noticeably faster and smoother experience.

Deployment and Community Support

DeepSeek-V3 employs FP8 training, and its native FP8 weights have been open-sourced. Thanks to the support of the open-source community, SGLang and LMDeploy were among the first frameworks to support native FP8 inference for the V3 model, while TensorRT-LLM and MindIE have implemented BF16 inference. Additionally, to facilitate community adaptation and broaden application scenarios, the company provides scripts for converting the FP8 weights to BF16. Model weights and more information on local deployment can be found via the relevant links.
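As a rough illustration of what such a conversion involves, the sketch below dequantizes block-scaled FP8 weights to BF16. The 128×128 block size and the weight-scale tensor layout are assumptions modeled on DeepSeek's released checkpoint format; the official conversion script in the model repository is the authoritative reference.

```python
import torch

BLOCK = 128  # assumed quantization block size for the released FP8 layout

def fp8_block_to_bf16(weight: torch.Tensor, scale_inv: torch.Tensor) -> torch.Tensor:
    """Dequantize a block-scaled FP8 weight matrix to BF16.

    weight:    (rows, cols) tensor stored as torch.float8_e4m3fn
    scale_inv: (ceil(rows/BLOCK), ceil(cols/BLOCK)) per-block inverse scales
    """
    w = weight.to(torch.float32)
    rows, cols = w.shape
    # Expand each per-block scale to cover its BLOCK x BLOCK tile, then crop.
    scales = scale_inv.repeat_interleave(BLOCK, dim=0).repeat_interleave(BLOCK, dim=1)
    scales = scales[:rows, :cols]
    return (w * scales).to(torch.bfloat16)

# Example with random data standing in for a real checkpoint tensor.
w_fp8 = torch.randn(256, 384).to(torch.float8_e4m3fn)
s_inv = torch.rand(2, 3) + 0.5  # one inverse scale per 128x128 block
w_bf16 = fp8_block_to_bf16(w_fp8, s_inv)
print(w_bf16.dtype, w_bf16.shape)  # torch.bfloat16 torch.Size([256, 384])
```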

Open-Source Philosophy and Future Outlook

DeepSeek Corporation stated that “pursuing AGI with an open-source spirit and long-termism” has always been its steadfast belief. The company is thrilled to share its interim progress in model pre-training with the community and is delighted to see the gap between open-source and closed-source models continue to narrow. This marks a brand-new beginning: going forward, the company will continue to build richer capabilities such as deep thinking and multimodality on top of the DeepSeek-V3 base model, and will persistently share its latest exploratory achievements with the community.