The world of open-source AI models has a new leader with serious ambitions: DeepSeek-V3
Artificial Analysis reports that independent tests show something remarkable: DeepSeek-V3, a model from the Chinese company DeepSeek, outperforms all open-source models released to date on key metrics.
The model uses an optimized DeepSeekMoE architecture and Multi-head Latent Attention (MLA), which improve the efficiency of both training and inference. An auxiliary-loss-free load-balancing strategy was applied for the first time, minimizing the performance degradation that usually comes with distributing the workload evenly among experts.
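To make the auxiliary-loss-free idea more concrete, here is a minimal toy sketch in Python. It assumes the commonly described form of the technique: each expert carries a bias that is added to its routing score only when choosing the top-k experts, and after every step the bias is nudged down for overloaded experts and up for underloaded ones. The expert count, the skew, the update rate gamma and all function names are hypothetical illustration values, not DeepSeek's actual code or settings.

```python
import random


def route_top_k(scores, bias, k):
    """Pick k experts by biased score; the bias influences selection only."""
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i] + bias[i],
                    reverse=True)
    return ranked[:k]


def update_bias(bias, load, gamma):
    """Lower the bias of overloaded experts, raise it for underloaded ones."""
    mean_load = sum(load) / len(load)
    for i, n in enumerate(load):
        if n > mean_load:
            bias[i] -= gamma
        elif n < mean_load:
            bias[i] += gamma


def simulate(num_experts=8, k=2, tokens_per_step=512, steps=200,
             gamma=0.01, seed=0):
    rng = random.Random(seed)
    # Hypothetical skew: low-index experts get systematically higher scores,
    # so plain top-k routing would overload them.
    skew = [0.3 * (num_experts - i) / num_experts for i in range(num_experts)]
    bias = [0.0] * num_experts
    first_load, last_load = None, None
    for step in range(steps):
        load = [0] * num_experts
        for _ in range(tokens_per_step):
            scores = [rng.random() + skew[i] for i in range(num_experts)]
            for expert in route_top_k(scores, bias, k):
                load[expert] += 1
        if step == 0:
            first_load = load[:]
        last_load = load
        update_bias(bias, load, gamma)  # no extra loss term is involved
    return first_load, last_load, bias


if __name__ == "__main__":
    first, last, bias = simulate()
    print("load at step 0:   ", first)
    print("load at last step:", last)
    print("learned biases:   ", [round(b, 2) for b in bias])
```

In this toy setup the per-expert token counts start out visibly uneven and become much flatter once the biases have adapted. No balancing term is added to the training loss, which is the point of the approach.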
The model is built on the Mixture of Experts (MoE) architecture with a total of 671 billion parameters, of which 37 billion are activated for each token. The total is approximately 2.8 times the parameter count of its predecessor, DeepSeek V2.5.
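A quick back-of-the-envelope check of these figures, using only the numbers quoted in this post (the variable names are ours):

```python
TOTAL_PARAMS_B = 671      # total parameters, in billions (from the post)
ACTIVE_PARAMS_B = 37      # parameters activated per token, in billions
SIZE_RATIO_VS_V25 = 2.8   # quoted size ratio relative to DeepSeek V2.5

# Share of the network that actually runs for a single token
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Predecessor size implied by the quoted ratio (an estimate, not an official figure)
implied_v25_params_b = TOTAL_PARAMS_B / SIZE_RATIO_VS_V25

print(f"active share per token: {active_fraction:.1%}")
print(f"implied V2.5 size: about {implied_v25_params_b:.0f}B parameters")
```

So only about 5.5% of the weights are used for any given token, and the 2.8x ratio implies a predecessor of roughly 240 billion parameters.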
➕
DeepSeek-V3 received the highest score among open models in the Artificial Analysis Quality Index, 80 points, beating models such as Llama 3.3 70B and Alibaba’s Qwen2.5 72B.
The model is comparable to Anthropic’s Claude 3.5 Sonnet (October) and is only slightly behind the leaders, Google Gemini 2.0 Flash and OpenAI o1. Of particular note is its ability in coding and mathematics, where it scored 92% on HumanEval and 85% on MATH-500.
The DeepSeek-V3 API delivers an output speed of 89 tokens per second, more than four times faster than the previous version (V2.5) at 18 tokens per second. This was made possible by optimizing inference on the H800 cluster infrastructure.
And now the most interesting part:
DeepSeek-V3 was trained on a dataset of 14.8 trillion tokens using 2.788 million NVIDIA H800 GPU hours. The estimated training cost was $5.6 million, based on a GPU rental price of $2 per hour. Training took only 57 days on a cluster of 2048 GPUs.
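The cost and duration figures follow directly from the quoted GPU hours. A small sketch of the arithmetic, assuming all 2048 GPUs ran around the clock for the whole period (the variable names are ours):

```python
GPU_HOURS = 2.788e6              # total H800 GPU hours (from the post)
RENTAL_USD_PER_GPU_HOUR = 2.0    # assumed rental price (from the post)
CLUSTER_GPUS = 2048              # cluster size (from the post)

# Cost is simply GPU hours times the hourly rental price
cost_usd = GPU_HOURS * RENTAL_USD_PER_GPU_HOUR

# Wall-clock time if every GPU is busy 24 hours a day
wall_clock_days = GPU_HOURS / CLUSTER_GPUS / 24

print(f"estimated cost: ${cost_usd / 1e6:.2f} million")
print(f"wall-clock time: {wall_clock_days:.1f} days")
```

This gives about $5.58 million and about 56.7 days, which matches the rounded $5.6 million and 57 days. Note that this covers rented compute only, at the stated price.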
Arena should update its leaderboard soon. Previously, Deepseek-v2.5-1210 held a respectable 8th place in the overall ranking.
Chinese commentators write that DeepSeek-V3 is a major step forward in the development of open AI models, strengthening China’s position in its competition with the USA.
⬇️ https://huggingface.co/deepseek-ai/DeepSeek-V3-Base
⬇️ https://huggingface.co/deepseek-ai/DeepSeek-V3
⬇️ https://github.com/deepseek-ai/DeepSeek-V3
