

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

DeepSeek-AI
research@deepseek.com

Abstract

We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of- ... DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting a context length of one million tokens. The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization: (1) a hybrid attention ... the state-of-the-art for open models, outperforming its predecessors in core tasks. Meanwhile, the DeepSeek-V4 series is highly efficient in long-context scenarios. In the one-million-token context setting ...

58 pages | 4.27 MB | 3 months ago

DeepSeek-V4, Compressed Sparse Attention (CSA), Heavily Compressed Attention (HCA), hybrid attention, Mixture-of-Experts (MoE)