XDNN TVM - Nov 2019
[Figure: inference-efficiency bar chart (0-100% scale) for VGG16, ResNet-50, and GoogleNet-V3, comparing the Aristotle DPU on a 7020 FPGA against an iPhone 8 Plus and a Kirin 970; values of 51% and 52% are visible. Block-diagram labels: CPU, MEM CONTROLLER, BUS, Data Mover, IMG WR SCHEDULER, WEIGHTS WR SCHEDULER, SMART MEM, DECODER, REG MAP, WB WR SCHEDULER, CTRL SIGNALS, MISC CALC (AVG POOL, MAX POOL, ROI POOL, ELEMENT WISE, ...)]
Efficiency > 50% for mainstream neural networks
© Copyright 2018 Xilinx
16 pages | 3.35 MB | 6 months ago
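The ">50% efficiency" headline can be read as sustained throughput divided by theoretical peak throughput. A minimal sketch of that arithmetic, with made-up numbers (the 400 GOPS peak and 208 GOPS sustained figures are illustrative assumptions, not values from the slide):

```python
def utilization(achieved_gops: float, peak_gops: float) -> float:
    """Hardware efficiency: sustained throughput divided by theoretical peak."""
    return achieved_gops / peak_gops

# Illustrative, assumed numbers: a DPU with a 400 GOPS peak sustaining 208 GOPS
print(f"{utilization(208.0, 400.0):.0%}")  # 52%
```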
Trends - Artificial Intelligence
Outline
Technology Compounding = Numbers Behind The Momentum
Technology Compounding Over Thousand-Plus Years = Better + Faster + Cheaper → More...
Note: chart expressed in trillions of real GDP as measured [...]
[Chart series: Synthetic Fertilizer, Transistors, PCs, Internet, Smartphones, Cloud]
...Technology Compounding Over Fifty-Plus Years = Better + Faster + Cheaper → More
Note: PC units as of 2000. Desktop internet users as of [...]
...companies* with market capitalizations in excess of $1 trillion – most with gross margins of 50%+ plus free cash flow – attacking the same opportunity at the same time in a relatively transparent world
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
...groups, but can achieve stronger performance than MHA.

Attention Mechanism              KV Cache per Token (# Elements)   Capability
Multi-Head Attention (MHA)       2 n_h d_h l                       Strong
Grouped-Query Attention (GQA)    2 n_g d_h l                       Moderate
(n_h: attention heads, n_g: KV-head groups, d_h: per-head dimension, l: layers)

...quantization (Hooper et al., 2024; Zhao et al., 2023) for DeepSeek-V2 to further compress each element in its KV cache into 6 bits on average. Benefiting from MLA and these optimizations, actually deployed [...]

Benchmark (Metric)               # Shots   MHA (small)  MLA (small)  MHA (large)  MLA (large)
# Attention Params               -         2.5B         2.4B         25.0B        21.5B
# Total Params                   -         15.8B        15.7B        250.8B       247.4B
KV Cache per Token (# Elements)  -         110.6K       15.6K        860.2K       34.6K
BBH (EM)                         3-shot    37.9         39.0         46.6         50.7
MMLU (Acc.)                      5-shot    48.7         50.0         57[...]

52 pages | 1.23 MB | 1 year ago
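The cache-size formulas above reduce to simple arithmetic. A minimal sketch (symbol names n_h, n_g, d_h, l follow the paper's notation; the concrete dimensions below are illustrative, not DeepSeek-V2's actual configuration):

```python
def kv_cache_elements_mha(n_h: int, d_h: int, l: int) -> int:
    """MHA caches one key and one value vector per head per layer: 2 * n_h * d_h * l."""
    return 2 * n_h * d_h * l

def kv_cache_elements_gqa(n_g: int, d_h: int, l: int) -> int:
    """GQA shares KV across head groups, so only n_g KV heads are cached: 2 * n_g * d_h * l."""
    return 2 * n_g * d_h * l

def kv_cache_bytes(n_elements: int, bits_per_element: float) -> float:
    """Memory footprint after quantization, e.g. 6 bits per element as in the text."""
    return n_elements * bits_per_element / 8

# Illustrative dimensions (not DeepSeek-V2's real config):
mha = kv_cache_elements_mha(n_h=32, d_h=128, l=30)  # 245_760 elements per token
gqa = kv_cache_elements_gqa(n_g=4, d_h=128, l=30)   # 30_720 elements: 8x smaller
print(mha, gqa, kv_cache_bytes(gqa, 6))             # 6-bit cache in bytes
```

The same pattern explains the table's KV-cache column: MLA shrinks the per-token element count, and 6-bit quantization then shrinks the bytes per element.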
TVM@Alibaba AI Labs
Splits the workload into thread blocks (work groups) and individual threads (work items).
[Figure: processing-element grid; labels include "Processing Element", "batch", and "(work item)"]
12 pages | 1.94 MB | 6 months ago
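The work-group/work-item split mentioned in the snippet can be sketched in plain Python. The group size and workload below are made-up values; a real GPU runtime computes these indices in hardware:

```python
def flatten_indices(global_id: int, group_size: int):
    """Map a flat global work-item id to (work-group id, local id within the group)."""
    return divmod(global_id, group_size)

def launch(workload: int, group_size: int):
    """Enumerate every work item as a scheduler would: one entry per thread."""
    n_groups = (workload + group_size - 1) // group_size  # ceil-divide into thread blocks
    for group_id in range(n_groups):
        for local_id in range(group_size):
            global_id = group_id * group_size + local_id
            if global_id < workload:                      # guard the ragged last group
                yield group_id, local_id, global_id

# 10 work items in groups of 4 -> 3 groups, last group half-empty
ids = list(launch(10, 4))
assert ids[-1] == (2, 1, 9)
assert flatten_indices(9, 4) == (2, 1)
```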
4 results in total