XDNN TVM - Nov 2019
[Figure: inference-efficiency bar chart (0-100% scale) for VGG16, ResNet-50, and GoogleNet-V3, comparing the Aristotle DPU on a 7020 FPGA against an iPhone 8 Plus and a Kirin 970; values of 51% and 52% are visible. Block-diagram labels: CPU, MEM CONTROLLER, BUS, Data Mover, IMG WR SCHEDULER, WEIGHTS WR SCHEDULER, SMART MEM, DECODER, REG MAP, WB WR SCHEDULER, CTRL SIGNALS, MISC CALC (AVG POOL, MAX POOL, ROI POOL, ELEMENT WISE, ...)]
Efficiency > 50% for mainstream neural networks
© Copyright 2018 Xilinx
16 pages | 3.35 MB | 6 months ago
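The ">50% efficiency" headline can be read as sustained throughput divided by theoretical peak throughput. A minimal sketch of that arithmetic, with made-up numbers (the 400 GOPS peak and 208 GOPS sustained figures are illustrative assumptions, not values from the slide):

```python
def utilization(achieved_gops: float, peak_gops: float) -> float:
    """Hardware efficiency: sustained throughput divided by theoretical peak."""
    return achieved_gops / peak_gops

# Illustrative, assumed numbers: a DPU with a 400 GOPS peak sustaining 208 GOPS
print(f"{utilization(208.0, 400.0):.0%}")  # 52%
```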
Trends - Artificial Intelligence
Outline
Technology Compounding = Numbers Behind The Momentum
Technology Compounding Over Thousand-Plus Years = Better + Faster + Cheaper → More...
Note: chart expressed in trillions of real GDP as measured [...]
[Chart series: Synthetic Fertilizer, Transistors, PCs, Internet, Smartphones, Cloud]
...Technology Compounding Over Fifty-Plus Years = Better + Faster + Cheaper → More
Note: PC units as of 2000. Desktop internet users as of [...]
...companies* with market capitalizations in excess of $1 trillion – most with gross margins of 50%+ plus free cash flow – attacking the same opportunity at the same time in a relatively transparent world
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
...groups, but can achieve stronger performance than MHA.

Attention Mechanism              KV Cache per Token (# Elements)   Capability
Multi-Head Attention (MHA)       2 n_h d_h l                       Strong
Grouped-Query Attention (GQA)    2 n_g d_h l                       Moderate
(n_h: attention heads, n_g: KV-head groups, d_h: per-head dimension, l: layers)

...quantization (Hooper et al., 2024; Zhao et al., 2023) for DeepSeek-V2 to further compress each element in its KV cache into 6 bits on average. Benefiting from MLA and these optimizations, actually deployed [...]

Benchmark (Metric)               # Shots   MHA (small)  MLA (small)  MHA (large)  MLA (large)
# Attention Params               -         2.5B         2.4B         25.0B        21.5B
# Total Params                   -         15.8B        15.7B        250.8B       247.4B
KV Cache per Token (# Elements)  -         110.6K       15.6K        860.2K       34.6K
BBH (EM)                         3-shot    37.9         39.0         46.6         50.7
MMLU (Acc.)                      5-shot    48.7         50.0         57[...]

52 pages | 1.23 MB | 1 year ago
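The cache-size formulas above reduce to simple arithmetic. A minimal sketch (symbol names n_h, n_g, d_h, l follow the paper's notation; the concrete dimensions below are illustrative, not DeepSeek-V2's actual configuration):

```python
def kv_cache_elements_mha(n_h: int, d_h: int, l: int) -> int:
    """MHA caches one key and one value vector per head per layer: 2 * n_h * d_h * l."""
    return 2 * n_h * d_h * l

def kv_cache_elements_gqa(n_g: int, d_h: int, l: int) -> int:
    """GQA shares KV across head groups, so only n_g KV heads are cached: 2 * n_g * d_h * l."""
    return 2 * n_g * d_h * l

def kv_cache_bytes(n_elements: int, bits_per_element: float) -> float:
    """Memory footprint after quantization, e.g. 6 bits per element as in the text."""
    return n_elements * bits_per_element / 8

# Illustrative dimensions (not DeepSeek-V2's real config):
mha = kv_cache_elements_mha(n_h=32, d_h=128, l=30)  # 245_760 elements per token
gqa = kv_cache_elements_gqa(n_g=4, d_h=128, l=30)   # 30_720 elements: 8x smaller
print(mha, gqa, kv_cache_bytes(gqa, 6))             # 6-bit cache in bytes
```

The same pattern explains the table's KV-cache column: MLA shrinks the per-token element count, and 6-bit quantization then shrinks the bytes per element.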
TVM@Alibaba AI Labs
Splits the workload into thread blocks (work groups) and individual threads (work items).
[Figure: processing-element grid; labels include "Processing Element", "batch", and "(work item)"]
12 pages | 1.94 MB | 6 months ago
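The work-group/work-item split mentioned in the snippet can be sketched in plain Python. The group size and workload below are made-up values; a real GPU runtime computes these indices in hardware:

```python
def flatten_indices(global_id: int, group_size: int):
    """Map a flat global work-item id to (work-group id, local id within the group)."""
    return divmod(global_id, group_size)

def launch(workload: int, group_size: int):
    """Enumerate every work item as a scheduler would: one entry per thread."""
    n_groups = (workload + group_size - 1) // group_size  # ceil-divide into thread blocks
    for group_id in range(n_groups):
        for local_id in range(group_size):
            global_id = group_id * group_size + local_id
            if global_id < workload:                      # guard the ragged last group
                yield group_id, local_id, global_id

# 10 work items in groups of 4 -> 3 groups, last group half-empty
ids = list(launch(10, 4))
assert ids[-1] == (2, 1, 9)
assert flatten_indices(9, 4) == (2, 1)
```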
4 results in total