DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek-AI, research@deepseek.com. Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while …
52 pages | 1.23 MB | 1 year ago
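The DeepSeek-V2 snippet above attributes efficient inference to compressing per-token keys and values into a small shared latent. A minimal NumPy sketch of that caching idea follows; all dimensions, matrix names, and values here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, seq_len = 64, 8, 5

# Hypothetical projection matrices (learned in a real model).
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_up_k = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
W_up_v = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)

hidden = rng.standard_normal((seq_len, d_model))

# Instead of caching full keys and values (2 * d_model floats per token),
# cache only the shared latent (d_latent floats per token).
latent_cache = hidden @ W_down            # shape: (seq_len, d_latent)

# Keys and values are reconstructed from the latent on demand.
k = latent_cache @ W_up_k                 # (seq_len, d_model)
v = latent_cache @ W_up_v                 # (seq_len, d_model)

full_cache_floats = seq_len * 2 * d_model
mla_cache_floats = seq_len * d_latent
print(mla_cache_floats / full_cache_floats)  # → 0.0625
```

With these toy sizes the cache shrinks 16x; the real model's savings depend on its actual latent dimension.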
Trends Artificial Intelligence
… Intelligence," a term he coined. 1/62: Arthur Samuel, an IBM computer scientist, creates a self-learning program that proves capable of defeating a top USA checkers champion. AI "Winter" (1967-1996): Shakey, the first general-purpose mobile robot that can reason about its own actions. 5/97: Deep Blue, IBM's chess-playing computer, defeats Garry Kasparov, the world chess champion. … Trending = Unprecedented … Machine-Learning Model* Trending = In 2015, Industry Surpassed Academia as Data + Compute + Financial Needs Rose. *Machine Learning = A subset of AI where machines learn …
340 pages | 12.14 MB | 5 months ago
OpenAI - AI in the Enterprise
… step. How it started: Morgan Stanley's first eval focused on making their financial advisors more efficient and effective. The premise was simple: if advisors could access information faster and reduce the … people. AI amplifies our potential and helps us be more efficient and creative. (Elena Alfaro, Head of Global AI Adoption) … Product note: with deep research, ChatGPT can do work independently. Give it a prompt … employee productivity and gives them access to deep, detailed research on any topic in minutes. In an internal evaluation by experts across domains, deep research saved an average of 4 hours per complex …
25 pages | 9.48 MB | 6 months ago
01 Structure of Scientific Papers - Introduction to Scientific Writing WS2021/22
… data science lifecycle). 2012-2018: IBM Research – Almaden, USA; declarative large-scale machine learning; optimizer and runtime of Apache SystemML. 2011: PhD, TU Dresden, Germany; cost-based optimization … [… Algebra for Large-Scale Machine Learning. PVLDB 2016] [Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Scaling Machine Learning via Compressed Linear Algebra. SIGMOD …] [… Large-Scale Machine Learning. VLDB Journal 2018, 27(5)] [Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning. Commun. …]
36 pages | 1.12 MB | 1 year ago
2021 中国开源年度报告 (2021 China Open Source Annual Report)
… and more and more schools to open source courses. We hope the follow-up can be achieved in the learning of computers, compiling principles, software engineering, and other theoretical knowledge at … most eye-catching one in China is PingCAP/TiDB, whose open source strategy and tactics are worth learning. Du Junping: In the past two years, a clear trend has been that more and more startups are participating in open source. On the one hand, this benefits from the ToB track becoming a hotspot of market and policy orientation; on the other, the open innovation that open source represents has also been recognized by the investment community, especially open source combined with data (databases & big data) and … communicate, which can be open and transparent, and settle down the discussion process and reduce the learning cost of new entrants. Domestic developers are currently used to discussing issues in WeChat …
199 pages | 9.63 MB | 1 year ago
Google 《Prompt Engineering v7》
… the model uses to predict a specific output. You don't need to be a data scientist or a machine learning engineer – everyone can write a prompt. However, crafting the most effective prompt can be complicated … model's ability to provide meaningful output. … When you chat with … temperature control can be understood in a similar way to the softmax function used in machine learning. A low temperature setting mirrors a low softmax temperature (T), emphasizing a single, preferred …
68 pages | 6.50 MB | 7 months ago
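The prompt-engineering snippet above compares sampling temperature to the softmax temperature T. A small self-contained sketch makes the effect concrete; the function name and logits are illustrative, not taken from the guide.

```python
import numpy as np

def softmax_with_temperature(logits, t):
    # Lower t sharpens the distribution (near-greedy sampling);
    # higher t flattens it toward uniform.
    z = np.asarray(logits, dtype=float) / t
    z -= z.max()                  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.2)  # concentrates on the top logit
hot = softmax_with_temperature(logits, 2.0)   # spreads probability out
print(cold.round(3), hot.round(3))
```

At t = 0.2 the top token gets over 99% of the mass here; at t = 2.0 the three options are much closer to equally likely.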
2024 中国开源开发者报告 (2024 China Open Source Developers Report)
… Transactions on Information Theory, 2(3), 61-79. [3] Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489. [4] Wei, Jason, et al. "Chain-of-thought … " … Processing Systems 36 (2024). [8] https://huggingface.co/spaces/mteb/leaderboard [9] https://github.com/deep-floyd/IF [10] https://developer.nvidia.com/blog/pushing-the-boundaries-of-speech-recognition-with-nemo-parakeet-asr- … In IntelliJ IDEA, we can see AI features being added, such as a native vectorization model, semantic search (SearchEverywhere), the Machine Learning Code Completion plugin that combines completion statistics, Full Line Code Completion for single lines of code, and so on. Beyond the GitHub Copilot tool itself, it also opens up its plugin capability, so that we can define our own …
111 pages | 11.44 MB | 9 months ago
GNU Image Manipulation Program User Manual 2.4
… selection channel in all its glorious detail by toggling the QuickMask button. A large component of learning how to use GIMP effectively is acquiring the art of making good selections—selections that contain … filters are examples of this: because they are implemented by plug-ins, the GIMP core has no really efficient way of knowing what they have changed, so it has no way to implement Undo except by memorizing the … the "Lempel-Ziv-Welch" algorithm, a lossless compression technique. This is an old method, still efficient and fast. More information at [WKPD-LZW]. • Pack Bits: PackBits is a fast, simple compression …
653 pages | 19.93 MB | 1 year ago
GNU Image Manipulation Program User Manual 2.10
… selection channel in all its glorious detail by toggling the QuickMask button. A large component of learning how to use GIMP effectively is acquiring the art of making good selections—selections that contain … consume a lot of undo memory. Most filters are implemented by plug-ins, so the GIMP core has no efficient way of knowing what changed. As such, there is no way to implement Undo except by memorizing the … compressed using the "Lempel-Ziv-Welch" algorithm, a lossless compression technique. This is efficient and fast. More information at [WKPD-LZW]. • Pack Bits: is a fast, simple compression scheme for …
1070 pages | 44.54 MB | 1 year ago
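Both GIMP manual excerpts above describe PackBits as a fast, simple compression scheme. The toy sketch below illustrates only the run-length idea behind it; real PackBits also has a literal (uncompressed) mode and 128-byte run limits, which this illustration deliberately omits.

```python
def run_length_encode(data: bytes) -> list:
    """Toy run-length coder in the spirit of PackBits:
    emit (count, byte_value) pairs for consecutive runs."""
    out = []
    i = 0
    while i < len(data):
        j = i
        # Extend j to the end of the current run of identical bytes.
        while j < len(data) and data[j] == data[i]:
            j += 1
        out.append((j - i, data[i]))
        i = j
    return out

print(run_length_encode(b"aaaabbc"))  # → [(4, 97), (2, 98), (1, 99)]
```

Run-length coding wins on images with large flat regions (the case the manuals mention) and can expand noisy data, which is why real PackBits falls back to literal runs.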
Krita 5.2 Manual
… light the more pigments you put together. Because of that, in traditional pigment mixing, our most efficient primaries are three fairly light colors: cyan blue, magenta red, and yellow (CMY). A computer … additive mixing, where adding more and more colored lights results in white. This is why the three most efficient primaries, as used by computers, are red, green, and blue (RGB). Per pixel, a computer then stores … because the computer only needs to remember how white a color is. This is why grayscale is more efficient memory-wise. In fact, if you look at each channel separately, they also look like grayscale images. …
1502 pages | 79.07 MB | 1 year ago
242 results in total