TVM@AliOS

## PRESENTATION AGENDA
- TVM @ AliOS Overview
- TVM @ AliOS ARM CPU
- TVM @ AliOS Hexagon DSP
- TVM @ AliOS Intel GPU
- Misc

## PART ONE TVM @ AliOS Overview

## AliOS Overview
• AliOS (www.alios

AliOS Internet Car: co-creating intelligent connected vehicles, co-building the future mobility ecosystem

## TVM Timeline @ AliOS
AliOS | Driving intelligence in everything

## AliOS TVM Arch
• Optimize on INT8 & FP32

## AliOS TVM @ ARM CPU INT8 Convolution
• NHWC

0 credits | 27 pages | 4.86 MB | 1 year ago
TVM Meetup: Quantization

## Compilation of Quantized Models in TVM
Animesh Jain, Amazon SageMaker Neo, AWS AI

## Quantization Overview
• Represent FP32 numbers with lower-precision INT8 numbers
• Integer number stands as com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf

## Quantization in TVM
• Quantization within TVM: Automatic Quantization
  • TVM stack ingests an FP32 graph and a small dataset
  • Finds suitable quantization scale
  • Produces a quantized graph
• Compiling pre-quantized models: QNN Dialect
  • TVM ingests a pre-quantized graph in TFLite or MXNet
  • Use high-level wrapper ops of QNN dialect

## Why choose TVM for our deployment?
1. OpenVINO is a black box and cannot deploy our network (with depthwise conv2d,)
2. TVM can not only deploy our network, but also get a good performance gain from autotuning
3. TVM supports many kinds of hardware platforms: Intel/ARM CPU, NVIDIA/ARM GPU, VTA...

## TVM on Windows:
1. Get a .log file from the autotvm on Ubuntu
2.

0 credits | 6 pages | 1.96 MB | 1 year ago
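The quantization overview in the snippet above (mapping FP32 values to INT8 through a scale) can be illustrated with a minimal sketch. This is plain Python, not TVM's API: the function names `choose_qparams`, `quantize`, and `dequantize` are hypothetical, and the min/max rule used to pick the scale is an assumed, simple calibration strategy rather than the one the talk describes.

```python
# Illustrative affine INT8 quantization; all names here are hypothetical
# and this is not how TVM implements automatic quantization.

def choose_qparams(values, qmin=-128, qmax=127):
    """Pick a scale and zero point from the observed min/max (assumed rule)."""
    lo = min(min(values), 0.0)  # keep 0.0 inside the representable range
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map real values to integers clamped to [qmin, qmax]."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]

def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate real values."""
    return [scale * (q - zero_point) for q in q_values]

if __name__ == "__main__":
    data = [-1.0, 0.0, 0.5, 2.0]
    scale, zp = choose_qparams(data)
    q = quantize(data, scale, zp)
    print(q, dequantize(q, scale, zp))
```

For values inside the calibrated range, the round-trip error is bounded by about half the scale, which is why choosing a suitable scale from a small calibration dataset matters.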
TVM: Where Are We Going

## TVM: Where are we going
Tianqi Chen

## Current Deep Learning Landscape
Frameworks and Inference engines

Open source, automated end-to-end optimization framework for deep learning.

## TVM Stack

## TVM: Learning-based Learning System
Frameworks

## Inference Flow
https://github.com/xilinx

## TVM as Unified ML Front End
Caffe `class AccelModule:` XIR Compiler Quantizer Partitioner

## TVM Partitioning
- More than supported/not supported, pattern matching graph colorization
- Choices how

0 credits | 16 pages | 3.35 MB | 1 year ago
TVM@Alibaba AI Labs

Alibaba AI Labs
AI Labs & TVM
PART 1 : ARM32 CPU
PART 2 : HIFI4 DSP
PART 3 : PowerVR GPU

ARM 32 CPU

## Resolution
Overflow-aware Quantization
Tensorize Kernel
+ ALIOS TVM ARM32

PowerVR GPU

## PowerVR support by TVM
[Figure: PowerVR support in the TVM stack. Labels: DL Model (Caffe2, mxnet); CUDA TOPI, Mali TOPI, ROCM TOPI, PVR TOPI; NNVM Frontends; Tuning tasks; Auto TVM; Machine Learning Automated Optimizer; Schedule; Computation Graph Optimizations; Tensor Operators & Property Registry; Compiler Toolchain; TVM Runtime; TOPI Operators]

Alibaba Cloud Intelligence

## Outline
• TensorCore AutoCodeGen in TVM
• FP16 Mixed-Precision Training on PAI
• INT8 Inference on PAI-Blade

## TensorCore AutoCodeGen
`stride, nvcuda::wmma::mem_col_major)`|

## Background
• TVM TensorCore Intrinsics
  • Authored by @Hzfengsy
  • Intrinsics: tvm_load_matrix_sync, tvm_mma_sync ...
  • New Memory Scopes: wmma.matrix_a/b, accumulator

Virtual threads for data reuse (ongoing)

## Performance on V100 (FP16)
|M, N, K|cuBLAS TensorCore|TVM TensorCore|speedup|
|---|---|---|---|
|512, 16, 512|7.7470us|5.2570us|1.47X|
|512, 32, 512|8.0140us|6

0 credits | 26 pages | 5.82 MB | 1 year ago
Facebook -- TVM AWS Meetup Talk

## TVM at Facebook
Lots of contributors at FB and elsewhere

## Why TVM?
- Performance matters a lot
- Heterogeneous computing environment
- High variety of workloads
- Ever-increasing set of primitives (over 500 aten kernels)
- Interpreter methods not delivering generalized performance

## TVM for Speech Synthesis
- WaveRNN-style model architecture
- Autoregressive sampling net running at faster
- Uh oh

## Enter, TVM and model co-design
- PyTorch operator overhead makes interpreter infeasible
- Reduce FLOPs with

0 credits | 11 pages | 3.08 MB | 1 year ago