DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek-AI, research@deepseek.com

Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and it supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference by significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance while saving 42.5% of training costs.
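The abstract attributes the gap between total parameters (236B) and activated parameters per token (21B) to sparse computation: a router selects only a few experts for each token, so most expert weights sit idle on any given forward pass. The sketch below illustrates that top-k routing idea in PyTorch. It is not DeepSeek-V2's actual implementation; the class name `TopKMoE` and all sizes (`d_model`, `d_ff`, `n_experts`, `k`) are illustrative assumptions.

```python
# Minimal sketch of sparse top-k Mixture-of-Experts routing (illustrative only,
# not DeepSeek-V2's implementation). Each token runs through only k of the
# n_experts feed-forward blocks, so activated parameters << total parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)       # routing probabilities per expert
        weights, idx = scores.topk(self.k, dim=-1)       # keep only the top-k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e                 # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out


if __name__ == "__main__":
    moe = TopKMoE(d_model=64, d_ff=256, n_experts=8, k=2)
    tokens = torch.randn(10, 64)
    print(moe(tokens).shape)  # torch.Size([10, 64]); only 2 of 8 experts run per token
```

With k=2 of 8 experts here, only a quarter of the expert parameters participate per token; scaled up, this is the mechanism behind a 236B-total / 21B-activated split like the one the abstract describes.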
Trends – Artificial Intelligence (BOND)

Source note: "… disclosure. Source: Aggregated by BOND from OpenAI, Microsoft, Google, Anthropic, Meta, Apple, Alibaba, DeepSeek, UK Government, US Department of Homeland Security. China data may be subject to informational …"

Timeline excerpts:
- Alibaba releases its open-source Qwen 2.5 models, with performance in line with Western competitors.
- 1/25: DeepSeek releases its R1 & R1-Zero open-source reasoning models.
- 2/25: OpenAI releases GPT-4.5.

Prose excerpt: "… of ChatGPT (2022). In effect, the global competition for AI kicked in with the launch of China's DeepSeek (1/25) and Jack Ma's attendance at Chinese President Xi Jinping's symposium of Chinese business …"













