DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
… guarantees efficient inference by significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. … Multi-head Latent Attention (MLA): by jointly compressing the keys and values into a latent vector, MLA significantly reduces the KV cache during inference. … Then q_t, k_t, v_t will be sliced into …

    c_t^{KV} = W^{DKV} h_t,        (9)
    k_t^C  = W^{UK} c_t^{KV},      (10)
    v_t^C  = W^{UV} c_t^{KV},      (11)

where c_t^{KV} ∈ R^{d_c} is the compressed latent vector for keys and values; d_c (≪ d_h n_h) denotes the KV compression dimension; W^{DKV} ∈ R^{d_c×d} is the down-projection …
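The cache saving described in this snippet can be sketched in a few lines: only the small latent c_t^{KV} is stored per token, and keys/values are recovered by up-projection when needed. This is a minimal NumPy illustration; the dimensions and random weights are hypothetical, not the paper's.

```python
import numpy as np

# Illustrative dimensions (hypothetical, much smaller than DeepSeek-V2's):
d, d_c, n_h, d_h = 64, 8, 4, 16  # model dim, latent dim, heads, head dim

rng = np.random.default_rng(0)
W_DKV = rng.standard_normal((d_c, d)) * 0.02         # down-projection, eq. (9)
W_UK  = rng.standard_normal((n_h * d_h, d_c)) * 0.02  # key up-projection, eq. (10)
W_UV  = rng.standard_normal((n_h * d_h, d_c)) * 0.02  # value up-projection, eq. (11)

h_t  = rng.standard_normal(d)   # hidden state of one token
c_kv = W_DKV @ h_t              # compressed latent: the ONLY thing cached
k_c  = W_UK @ c_kv              # keys recovered from the latent
v_c  = W_UV @ c_kv              # values recovered from the latent

# Per token, the cache holds d_c floats instead of 2 * n_h * d_h for keys + values.
print(c_kv.shape, k_c.shape, v_c.shape)  # (8,) (64,) (64,)
```

With these toy numbers the cache shrinks from 128 floats per token (keys plus values) to 8, which is the mechanism behind the "significantly compressing the KV cache" claim above.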
Trends – Artificial Intelligence
… make it easier for others to follow. New front-end frameworks, embedding pipelines, model routers, vector databases, and serving layers are multiplying at an accelerating rate. Each wave of developer … GitHub repositories with 500+ stars. Infrastructure = tools for model serving, compute management, vector search & databases. Model development = frameworks for modeling & training, inference optimization … circuit (ASIC), a chip designed for a single, specific purpose: running the unique matrix- and vector-based mathematics needed for building and running AI models. Our first such chip, TPU v1 …
TVM: Where Are We Going
… FIFO, explicitly managed memory subsystems, TPUs. Tensorization challenge — compute primitives: scalar, vector, tensor. Challenge: build systems to support emerging tensor instructions.
开源中国 (OSCHINA) 2023 Large Language Model (LLM) Technology Report
… in the first four months of the year, vector database companies raised more funding than in all of 2022 (source: https://www.cbinsights.com/research/generative-ai-infrastructure-vector-database/). LLM infrastructure: large-model frameworks and fine-tuning. A large-model framework is a software framework designed specifically for building, training, and deploying large machine learning and deep learning models. These frameworks provide …