DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
… meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting … [Figure: KV Cache for Generation (KB/Token) and Maximum Generation Throughput (Tokens/Sec), DeepSeek-V2 vs. DeepSeek 67B; 576% of maximum throughput] … E. Discussion About Pre-Training Data Debiasing; F. Additional Evaluations on Math and Code; G. Evaluation Formats … 1. Introduction: In the past few years, Large Language Models (LLMs) …
52 pages | 1.23 MB | 1 year ago
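The headline numbers in this abstract are simple ratios. A minimal sketch of the arithmetic, where the 93.3% reduction and 5.76x speedup are the figures quoted in the snippet but the baseline values (400 KB/token, 1000 tokens/s) are hypothetical, chosen only for illustration:

```python
# The reduction and speedup factors come from the quoted abstract;
# the baseline magnitudes below are hypothetical placeholders.
def reduced_kv_cache(baseline_kb_per_token: float, reduction: float = 0.933) -> float:
    """Per-token KV-cache size after the claimed 93.3% reduction."""
    return baseline_kb_per_token * (1.0 - reduction)

def boosted_throughput(baseline_tps: float, speedup: float = 5.76) -> float:
    """Maximum generation throughput after the claimed 5.76x boost."""
    return baseline_tps * speedup

print(f"{reduced_kv_cache(400.0):.1f} KB/token")    # only 6.7% of the baseline remains
print(f"{boosted_throughput(1000.0):.0f} tokens/s")
```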
Trends – Artificial Intelligence
… (5 days to secure 1MM users). Generative AI = AI that can create content – text, images, audio, or code – based on learned patterns. Source: OpenAI. Generative AI – Public Launch of ChatGPT, 2022* … Knowledge … [Chart: Computing-Related* Patents Granted Annually, 1960–2024, per USPTO. *Uses Cooperative Patent Classification (CPC) code G06, which corresponds to computing, calculating or counting patents; Google patents data.] … changes above show average accuracy of top-performing AI models in each calendar year. Source: Papers With Code via Nestor Maslej et al., ‘The AI Index 2025 Annual Report,’ AI Index Steering Committee, Stanford …
340 pages | 12.14 MB | 5 months ago
Google, “Prompt Engineering v7”
Contents: Prompt Engineering 40; Code prompting 42; Prompts for writing code 42; Prompts for explaining code 44; Prompts for translating code 46; Prompts for debugging and reviewing code 48; What about multimodal … understanding and generation tasks such as text summarization, information extraction, question answering, text classification, language or code translation, code generation, and code documentation … ‘providing an additional task to the system’. For example, you could use a system prompt to generate a code snippet that is compatible with a specific programming language, or you could use a system prompt …
68 pages | 6.50 MB | 6 months ago
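As a concrete illustration of the system-prompt idea this snippet describes, here is a hypothetical chat-style message list. The role/content structure follows the common chat-completion convention; the exact model or API call is not specified by the snippet and is deliberately left out:

```python
# Hypothetical illustration: a system prompt constrains every reply,
# here forcing the model to answer only with Python 3 code.
messages = [
    {
        "role": "system",
        "content": "You are a coding assistant. Reply only with valid Python 3 "
                   "code, no explanations.",
    },
    {
        "role": "user",
        "content": "Write a function that reverses a string.",
    },
]

# The system message rides along with every request, steering all replies.
print(messages[0]["role"])  # -> system
```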
OctoML OSS 2019 11 8other languages QQ octoML HTVM Overview *。 Plug directly into TVYM as a backend *,Target C to emit code for microcontrollers that is device- agnostic AuroTYM QQ octoML AutoTVM on HTVM DTYM Runtime send VM runtime VM serialization Dynamic Shape Support Dynamic Shape Allocation o Dynamic Shape Code generation ee Looking for more contributions in this part of the systeml e Haichen and | will discuss0 码力 | 16 页 | 1.77 MB | 6 月前3
Facebook -- TVM AWS Meetup Talk
- Prune models to 80%+ sparsity (with retraining)
- Massive speedups combined with specialized code-generation techniques (TVM, Xbyak, etc.)
- Interesting new tradeoffs: how const are parameters? …
11 pages | 3.08 MB | 6 months ago
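The 80%+ sparsity this snippet mentions is typically reached by magnitude pruning. A minimal, self-contained sketch of one pruning step (illustrative only; the talk's actual pipeline and retraining schedule are not described here):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` of them are zero.

    Note: ties at the threshold are all pruned, so the result can be
    slightly sparser than requested.
    """
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7]
print(magnitude_prune(w, 0.8))  # -> [0.9, 0.0, 0.0, 0.0, 0.0]
```

In practice this step is interleaved with retraining so the surviving weights can compensate, which is why the snippet qualifies the sparsity figure "with retraining".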
XDNN TVM - Nov 2019
… [Slide diagram: Subgraph 1 and parallel subgraphs, with pre-processing and post-processing stages partitioned across CPU and FPGA] © Copyright 2018 Xilinx. TVM Code Generation …
16 pages | 3.35 MB | 6 months ago
TVM Meetup: Quantization
… framework parsers → graph-level optimizations → tensor-level optimizations → machine code generation (CUDA, C, …). © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Quantization Approaches …
19 pages | 489.50 KB | 6 months ago
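The quantization approaches such a deck surveys generally reduce to a scale (and optionally zero-point) mapping between real values and small integers. A minimal sketch of symmetric int8 quantization, independent of TVM's actual APIs (the scale value is an illustrative assumption):

```python
def quantize_int8(x: float, scale: float) -> int:
    """Symmetric int8 quantization: q = clamp(round(x / scale), -127, 127)."""
    q = round(x / scale)
    return max(-127, min(127, q))

def dequantize_int8(q: int, scale: float) -> float:
    """Approximate recovery of the real value from its int8 code."""
    return q * scale

scale = 0.02                    # illustrative: covers |x| <= 127 * 0.02 = 2.54
q = quantize_int8(0.5, scale)
print(q, dequantize_int8(q, scale))  # -> 25 0.5
```

The quantization error is bounded by half a step (scale / 2) inside the representable range, which is the basic trade-off every approach in the deck tunes.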
OpenAI, “A practical guide to building agents”
… whether that's resolving a customer service issue, booking a restaurant reservation, committing a code change, or generating a report. Applications that integrate LLMs but don't use them to control … Explicit guidelines and guardrails defining how the agent behaves. Here's what this looks like in code when using OpenAI's Agents SDK. You can also implement the same concepts using your preferred library … learning of specialized domain-specific languages. In contrast, the Agents SDK adopts a more flexible, code-first approach. Developers can directly express workflow logic using familiar programming constructs.
34 pages | 7.00 MB | 6 months ago
普通人学AI指南 (An AI Guide for Ordinary People)
… system, so that you can use it to create and run Docker containers. Then one more command is all it takes: docker run -d --name lobe-chat -p 10084:3210 -e ACCESS_CODE=lobe66 lobehub/lobe-chat:latest. To explain: this command runs a Docker container named lobe-chat in detached (background) mode and sets a few specific parameters: … the container's port 3210, so requests to the host's port 10084 are forwarded to the container's port 3210. -e ACCESS_CODE=lobe66: sets the environment variable ACCESS_CODE to lobe66, typically a parameter used to configure the application inside the container. lobehub/lobe-chat:latest: …
42 pages | 8.39 MB | 8 months ago
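The docker run invocation quoted in this snippet, reformatted with one flag per line for readability. The image name, port mapping, and access code are taken verbatim from the snippet; whether host port 10084 suits your machine is your own choice:

```shell
# Run LobeChat detached (-d), name the container lobe-chat,
# map host port 10084 to the container's port 3210,
# and set the in-app access code via the ACCESS_CODE env var.
docker run -d \
  --name lobe-chat \
  -p 10084:3210 \
  -e ACCESS_CODE=lobe66 \
  lobehub/lobe-chat:latest
```

After it starts, the app is reachable at http://<host>:10084 and will prompt for the access code.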
TVM: Where Are We Going
… Tensorization … VTA: Open & Flexible Deep Learning Accelerator
- Runtime JIT compiles accelerator microcode
- Supports heterogeneous devices, 10x better than CPU on the same board
- Moves hardware complexity … Incubated as Apache TVM recently. Independent governance, allowing competitors to collaborate. Open Code, Open Development, Open Governance. Acknowledgement: the Apache (incubating) TVM community. Our awesome …
31 pages | 22.64 MB | 6 months ago
15 items in total













