DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek-AI, research@deepseek.com

Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and it supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference by significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance while saving 42.5% of training costs.
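The abstract attributes the gap between total parameters (236B) and activated parameters per token (21B) to sparse computation: a router selects only a few experts for each token, so most expert weights sit idle on any given forward pass. The sketch below illustrates that top-k routing idea in PyTorch. It is not DeepSeek-V2's actual implementation; the class name `TopKMoE` and all sizes (`d_model`, `d_ff`, `n_experts`, `k`) are illustrative assumptions.

```python
# Minimal sketch of sparse top-k Mixture-of-Experts routing (illustrative only,
# not DeepSeek-V2's implementation). Each token runs through only k of the
# n_experts feed-forward blocks, so activated parameters << total parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)       # routing probabilities per expert
        weights, idx = scores.topk(self.k, dim=-1)       # keep only the top-k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e                 # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out


if __name__ == "__main__":
    moe = TopKMoE(d_model=64, d_ff=256, n_experts=8, k=2)
    tokens = torch.randn(10, 64)
    print(moe(tokens).shape)  # torch.Size([10, 64]); only 2 of 8 experts run per token
```

With k=2 of 8 experts here, only a quarter of the expert parameters participate per token; scaled up, this is the mechanism behind a 236B-total / 21B-activated split like the one the abstract describes.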
Trends – Artificial Intelligence (BOND)

Source note: "… disclosure. Source: Aggregated by BOND from OpenAI, Microsoft, Google, Anthropic, Meta, Apple, Alibaba, DeepSeek, UK Government, US Department of Homeland Security. China data may be subject to informational …"

Timeline excerpts:
- Alibaba releases its open-source Qwen 2.5 models, with performance in line with Western competitors.
- 1/25: DeepSeek releases its R1 & R1-Zero open-source reasoning models.
- 2/25: OpenAI releases GPT-4.5.

Prose excerpt: "… of ChatGPT (2022). In effect, the global competition for AI kicked in with the launch of China's DeepSeek (1/25) and Jack Ma's attendance at Chinese President Xi Jinping's symposium of Chinese business …"













