TVM@Alibaba AI Labs
Alibaba AI Labs (Alibaba Artificial Intelligence Laboratory). AI Labs & TVM. Contents: Part 1: ARM32 CPU; Part 2: HiFi4 DSP; Part 3: PowerVR GPU. ... ARM32 CPU: Resolution, Quantization, ... Kernel, AliOS ... int8 * int8 ... int32 = int16 ... + int16 x int8 ... CPU: MTK8167S (ARM32 A35, 1.5 GHz). Model: MobileNetV2_1.0_224 ...
0 points | 12 pages | 1.94 MB | 5 months ago

XDNN TVM - Nov 2019
[Chart: VGG16, ResNet-50, GoogleNet-V3; Aristotle on 7020 FPGA, iPhone 8 Plus, Kirin 970] [Block diagram: CPU, MEM CONTROLLER, BUS, Data Mover, IMG WR SCHEDULER, WEIGHTS WR SCHEDULER, SMART MEM FABRIC, IMG RD ...] Efficiency > 50% for mainstream neural networks. © Copyright 2018 Xilinx. Inference Flow: MxNet, CPU Layers, FPGA Layers, Runtime, Image, Model Weights, Calibration Set, Quantizer, Compiler, Tensor Graph. TVM Partitioning: Subgraph 1, Parallel Subgraphs, Post-Processing, Pre-Processing, FPGA or CPU. More than supported/not supported: pattern matching, graph colorization. Choices how ...
0 points | 16 pages | 3.35 MB | 5 months ago

TVM@AliOS
TVM @ AliOS. AliOS: driving intelligence in all things. Presentation agenda: TVM @ AliOS Overview; TVM @ AliOS ARM CPU; TVM @ AliOS Hexagon DSP; TVM @ AliOS Intel GPU; Misc. Part One: TVM @ AliOS Overview. Multimodal Interaction ... CPU (ARM, Intel) ... Accelerated Op Library / Others, Inference Engine, DSP (Qualcomm) ... Part Two: AliOS TVM @ ARM CPU. Support TFLite (Open Source and Upstream Master). Optimize on INT8 & FP32. AliOS TVM @ ARM CPU INT8: Cache ... Data ... QNNPACK Convolution, NHWC layout ...
0 points | 27 pages | 4.86 MB | 5 months ago

Bring Your Own Codegen to TVM
... build_extern(mod, "dnnl") 4. Run the inference: exe = relay.create_executor("vm", mod=mod, ctx=tvm.cpu(0)); data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32"); out = exe.evaluate()(data, **params) ... Relay Runtime (VM, Graph Runtime, Interpreter), Your Dispatcher, Target Device, General Devices (CPU/GPU/FPGA). Mark supported operators or subgraphs: 1. Implement an operator-level annotator, OR 2. Implement ... Relay Runtime (VM, Graph Runtime, Interpreter), Your Dispatcher, Target Device, General Devices (CPU/GPU/FPGA). Mark supported operators or subgraphs: 1. Implement extern operator functions, OR 2. Implement ...
0 points | 19 pages | 504.69 KB | 5 months ago

Julia 1.11.4
Contents: ... Memory-mapped I/O (p. 1615); 83 Network Options (p. 1618); 84 Pkg (p. 1622); 85 Printf (p. 1626); 86 Profiling (p. 1629); 86.1 CPU Profiling (p. 1629); 86.2 Via ... multi-threading provides the ability to schedule Tasks simultaneously on more than one thread or CPU core, sharing memory. This is usually the easiest way to get parallelism on one's PC or on a single ... as part of the standard library shipped with Julia. Most modern computers possess more than one CPU, and several computers can be combined together in a cluster. Harnessing the power of these multiple ...
0 points | 2007 pages | 6.73 MB | 3 months ago

Julia 1.11.5 Documentation
Contents: ... Memory-mapped I/O (p. 1615); 83 Network Options (p. 1618); 84 Pkg (p. 1622); 85 Printf (p. 1626); 86 Profiling (p. 1629); 86.1 CPU Profiling (p. 1629); 86.2 Via ... multi-threading provides the ability to schedule Tasks simultaneously on more than one thread or CPU core, sharing memory. This is usually the easiest way to get parallelism on one's PC or on a single ... as part of the standard library shipped with Julia. Most modern computers possess more than one CPU, and several computers can be combined together in a cluster. Harnessing the power of these multiple ...
0 points | 2007 pages | 6.73 MB | 3 months ago

Julia 1.11.6 Release Notes
Contents: ... Memory-mapped I/O (p. 1615); 83 Network Options (p. 1618); 84 Pkg (p. 1622); 85 Printf (p. 1626); 86 Profiling (p. 1629); 86.1 CPU Profiling (p. 1629); 86.2 Via ... multi-threading provides the ability to schedule Tasks simultaneously on more than one thread or CPU core, sharing memory. This is usually the easiest way to get parallelism on one's PC or on a single ... as part of the standard library shipped with Julia. Most modern computers possess more than one CPU, and several computers can be combined together in a cluster. Harnessing the power of these multiple ...
0 points | 2007 pages | 6.73 MB | 3 months ago

Julia 1.13.0 DEV
Contents: ... Memory-mapped I/O (p. 1679); 85 Network Options (p. 1682); 86 Pkg (p. 1686); 87 Printf (p. 1690); 88 Profiling (p. 1693); 88.1 CPU Profiling (p. 1693); 88.2 Via ... multi-threading provides the ability to schedule Tasks simultaneously on more than one thread or CPU core, sharing memory. This is usually the easiest way to get parallelism on one's PC or on a single ... as part of the standard library shipped with Julia. Most modern computers possess more than one CPU, and several computers can be combined together in a cluster. Harnessing the power of these multiple ...
0 points | 2058 pages | 7.45 MB | 3 months ago

Julia 1.12.0 RC1
Contents: ... Memory-mapped I/O (p. 1677); 85 Network Options (p. 1680); 86 Pkg (p. 1684); 87 Printf (p. 1688); 88 Profiling (p. 1691); 88.1 CPU Profiling (p. 1691); 88.2 Via ... multi-threading provides the ability to schedule Tasks simultaneously on more than one thread or CPU core, sharing memory. This is usually the easiest way to get parallelism on one's PC or on a single ... as part of the standard library shipped with Julia. Most modern computers possess more than one CPU, and several computers can be combined together in a cluster. Harnessing the power of these multiple ...
0 points | 2057 pages | 7.44 MB | 3 months ago

Julia 1.12.0 Beta4
Contents: ... Memory-mapped I/O (p. 1676); 85 Network Options (p. 1679); 86 Pkg (p. 1683); 87 Printf (p. 1687); 88 Profiling (p. 1690); 88.1 CPU Profiling (p. 1690); 88.2 Via ... multi-threading provides the ability to schedule Tasks simultaneously on more than one thread or CPU core, sharing memory. This is usually the easiest way to get parallelism on one's PC or on a single ... as part of the standard library shipped with Julia. Most modern computers possess more than one CPU, and several computers can be combined together in a cluster. Harnessing the power of these multiple ...
0 points | 2057 pages | 7.44 MB | 3 months ago