DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek-AI (research@deepseek.com). Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation.
52 pages | 1.23 MB | 1 year ago
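As a rough illustration of the MLA idea in the abstract, here is a minimal numpy sketch of caching a compressed latent per token instead of full keys and values. All dimensions, weights, and the single-head setup are invented for illustration; the paper's actual design (including its decoupled RoPE handling) is more involved.

```python
import numpy as np

d_model, d_latent, d_head = 1024, 64, 128   # illustrative sizes only

rng = np.random.default_rng(0)
W_down = rng.standard_normal((d_model, d_latent)) * 0.02  # joint KV down-projection
W_up_k = rng.standard_normal((d_latent, d_head)) * 0.02   # K up-projection
W_up_v = rng.standard_normal((d_latent, d_head)) * 0.02   # V up-projection

def decode_step(h, kv_cache):
    """Cache one small latent per token; rebuild K/V on the fly."""
    kv_cache.append(h @ W_down)        # store d_latent floats, not 2 * d_head
    C = np.stack(kv_cache)             # (seq_len, d_latent)
    return C @ W_up_k, C @ W_up_v      # keys and values for attention

cache = []
for _ in range(4):                     # toy autoregressive loop
    K, V = decode_step(rng.standard_normal(d_model), cache)
print(K.shape, V.shape)                # (4, 128) (4, 128)
```

The cache grows by d_latent floats per token rather than 2 x d_head, which is where the inference savings the abstract claims come from.
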
Trends Artificial Intelligence
…digital datasets that have been in the making for over three decades; breakthrough large language models (LLMs) that, in effect, found freedom with the November 2022 launch of OpenAI's ChatGPT… computers are ingesting massive datasets to get smarter and more competitive. Breakthroughs in large models, cost-per-token declines, open-source proliferation and chip performance improvements are making… are racing to build and deploy the next layers of AI infrastructure: agentic interfaces, enterprise copilots, real-world autonomous systems, and sovereign models. Rapid advances in artificial intelligence…
340 pages | 12.14 MB | 5 months ago
XDNN TVM - Nov 2019
Xilinx (Elliott Delaye): FPGA CNN Accelerator and TVM. TVM target devices and models: HW platforms ZCU102, ZCU104, Ultra96, PYNQ; models for face detection, pose estimation, video analytics, lane detection, object detection, and segmentation. Xilinx Cloud DPU Processor (xDNNv3): a configurable overlay processor for mainstream neural networks. Inference flow: an MxNet model is partitioned into CPU layers and FPGA layers for the runtime; model weights plus a calibration set feed the quantizer and compiler, with tensor-graph optimization.
16 pages | 3.35 MB | 6 months ago
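The inference flow sketched in the slides maps roughly onto TVM's Relay import-and-build path. A minimal sketch under stated assumptions: the network, input shape, and plain "llvm" target are placeholders; the real xDNN flow split layers between CPU and FPGA and ran weights through Xilinx's own quantizer/compiler rather than this generic path.

```python
import mxnet as mx
import tvm
from tvm import relay

# Placeholder network standing in for a detection/segmentation model.
block = mx.gluon.model_zoo.vision.resnet18_v1(pretrained=True)
shape_dict = {"data": (1, 3, 224, 224)}

# Import to Relay; tensor-graph optimizations run during build.
mod, params = relay.frontend.from_mxnet(block, shape_dict)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)  # CPU stand-in target
```
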
OctoML OSS 2019 11 8
…AutoTVM… Coming soon to µTVM (self-hosted models): host/device split… Transformer improvements: Transformer-based models such as BERT have recently become very popular and require first-class support in TVM. The Relay ONNX frontend now supports all opset versions of BERT, which enables importing native ONNX models and those converted from TensorFlow. Improved scheduling of batch matrix multiplies. Early autotuning…
16 pages | 1.77 MB | 6 months ago
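The ONNX import path those bullets refer to looks roughly like the following. relay.frontend.from_onnx is the real entry point; the model file and input shape are placeholders.

```python
import onnx
import tvm
from tvm import relay

# Any BERT exported to ONNX (natively, or converted from TensorFlow)
# takes the same route; "bert.onnx" is a placeholder path.
onnx_model = onnx.load("bert.onnx")
shape_dict = {"input_ids": (1, 128)}   # illustrative input shape

mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)
```
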
Facebook -- TVM AWS Meetup TalkAutoregressive sampling net running at faster than real-time - Compute split between GRU units and FC layers - 24kHz sampling frequency requires 40us sampling net runtime - First PyTorch model used a 3,400us Unstructured Sparsity - Lots of 'free' wins from exploring sparsity in modern ML models - Can often prune models to 80%+ sparsity(with retraining) - Massive speedups combined with specialized code-generation0 码力 | 11 页 | 3.08 MB | 6 月前3
OpenAI - AI in the Enterpriseevals 6 Embed AI into your products 9 Start now and invest early 11 Customize and fine-tune your models 13 Get AI in the hands of experts 16 Unblock your developers 18 Set bold automation goals 21 research and deployment company, OpenAI prioritizes partnering with global companies because our models will increasingly do their best work with sophisticated, complex, interconnected workflows and systems teams. Our Research Team advances the foundations of AI, developing new models and capabilities. Our Applied Team turns those models into products, like ChatGPT Enterprise and our API. And our Deployment0 码力 | 25 页 | 9.48 MB | 6 月前3
Google 《Prompt Engineering v7》might need to be optimized for your specific model, regardless of whether you use Gemini language models in Vertex AI, GPT, Claude, or an open source model like Gemma or LLaMA. Besides the prompt, you words? This is also known as the "repetition loop bug", which is a common issue in Large Language Models where the model gets stuck in a cycle, repeatedly generating the same (filler) word, phrase, or “few-shot” prompting. General prompting / zero shot One-shot & few-shot When creating prompts for AI models, it is helpful to provide examples. These examples can help the model understand what you are asking0 码力 | 68 页 | 6.50 MB | 7 月前3
TVM@Alibaba AI LabsGPU Alibaba Al.Labs 阿里巴巴人工智能实验室 PowerVR support by TVM NNVM Compiler -Execution graph -Model layers functions Computation Graph Optimizations -Param TvM Tensor Operators &0 码力 | 12 页 | 1.94 MB | 6 月前3
OpenAI 《A practical guide to building agents》foundations 7 Guardrails 24 Conclusion 32 2 Practical guide to building agents Introduction Large language models are becoming increasingly capable of handling complex, multi-step tasks. Advances in reasoning, to users about the weather.", 7 A practical guide to building agents Selecting your models Different models have different strengths and tradeoffs related to task complexity, latency, and cost. As As we’ll see in the next section on Orchestration, you might want to consider using a variety of models for different tasks in the workflow. Not every task requires the smartest model—a simple retrieval0 码力 | 34 页 | 7.00 MB | 6 月前3
TVM Meetup: Quantizationits Affiliates. All rights reserved. Animesh Jain Amazon SageMaker Neo Compilation of Quantized Models in TVM AWS AI© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Quantization dataset • Finds suitable quantization scale • Produces a quantized graph • Compiling Pre-quantized models – QNN Dialect • TVM ingests a pre-quantized graph in TFLite or MxNet • Use high-level wrapper ops Frontend Parsers • TFLite Pre-quantized Models • In good shape • Supports all Image Classification PreQuantized hosted models • MXNet Pre-quantized Models • Tested internally with MxNet + MKLDNN0 码力 | 19 页 | 489.50 KB | 6 月前3
17 results in total













