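TVM Tools Group

Now hiring

## T-Head (平头哥)

## TVM Caffe Frontend

2019-11-16

## TVM at T-Head

- Toolchain product: in the companion software released with the T-Head chip platform, TVM is an important part of the toolchain. It converts pretrained Caffe or TensorFlow models to LLVM IR and finally generates binaries that run on the Wujian (无剑) SoC platform.

[Diagram labels: T-Head integrated development environment; unified application development framework; one-click application deployment; Caffe; TensorFlow; TVM; graphical compute analysis; T-Head NN; Wujian SoC platform; LLVM; custom AI accelerator; heterogeneous joint debugging]

## Why add a Caffe frontend?

- Customer demand: in the evaluation stage, Caffe models make up a large share of the networks customers use to evaluate chips.
- Competing products already support a Caffe frontend: most deployment tools from the major chip vendors support it, so supporting a Caffe frontend improves competitiveness.
- Open-source community: there is a large existing stock of open-source Caffe network models, and direct Caffe support in TVM makes it easier for everyone to try them.

## Current progress

- No Caffe dependency: `from_caffe` imports Caffe model files directly, so Caffe does not need to be installed first.
- Networks tested so far: alexnet / densenet121 / inception

0 credits | 6 pages | 326.80 KB | 1 year ago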
TVM@AliOS

## PRESENTATION AGENDA

- TVM @ AliOS Overview
- TVM @ AliOS ARM CPU
- TVM @ AliOS Hexagon DSP
- TVM @ AliOS Intel GPU
- Misc

## PART ONE TVM @ AliOS Overview

## AliOS Overview

- AliOS (www.alios

AliOS connected cars: creating intelligent connected vehicles together, building the future mobility ecosystem together

## TVM Timeline @ AliOS



AliOS | Driving intelligence in everything

## AliOS TVM Arch



- Optimize on INT8 & FP32

## AliOS TVM @ ARM CPU INT8 Convolution

- NHWC

0 credits | 27 pages | 4.86 MB | 1 year ago
TVM Meetup: Quantization

## Compilation of Quantized Models in TVM

Animesh Jain, Amazon SageMaker Neo, AWS AI

## Quantization Overview

- Represent FP32 numbers with lower-precision INT8 numbers
- Integer number stands as [...] com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf

## Quantization in TVM

- Quantization within TVM: Automatic Quantization
  - TVM stack ingests an FP32 graph and a small dataset
  - Finds suitable quantization scale
  - Produces a quantized graph
- Compiling pre-quantized models: QNN Dialect
  - TVM ingests a pre-quantized graph in TFLite or MxNet
  - Use high-level wrapper ops of QNN dialect



## Why choose TVM for our deployment?

1. OpenVino is a black box and cannot deploy our network (with depthwise conv2d)
2. TVM can not only deploy our network, but also get a good performance gain from autotuning
3. TVM supports many kinds of hardware platforms: Intel/ARM CPU, Nvidia/ARM GPU, VTA...

## TVM on Windows

1. Get a .log file from autotvm on Ubuntu
2.

0 credits | 6 pages | 1.96 MB | 1 year ago
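The FP32-to-INT8 representation mentioned in the quantization overview above can be sketched in plain Python. This is a minimal illustration, not the TVM implementation: the function names `quantize` and `dequantize`, the min/max range calibration, and the sample values are assumptions made for this example. It uses the common affine convention `real ≈ scale * (q - zero_point)`.

```python
def quantize(values, qmin=-128, qmax=127):
    """Affine quantization of floats to signed 8-bit integers (illustrative)."""
    # Assumption: calibrate the range from the data itself and force it to
    # include 0.0, so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(qmin - lo / scale)
    # Round to the nearest integer step, shift by the zero point, clamp to INT8.
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map INT8 values back to approximate floats."""
    return [scale * (x - zero_point) for x in q]


fp32_values = [-1.0, 0.0, 0.5, 2.0]  # hypothetical sample tensor
q, scale, zero_point = quantize(fp32_values)
restored = dequantize(q, scale, zero_point)
```

Each restored value differs from its original by at most about one quantization step (`scale`), which is the precision traded away for the smaller integer representation.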
TVM: Where Are We Going

## TVM: Where are we going

Tianqi Chen



## Current Deep Learning Landscape

Frameworks and Inference engines

Open source, automated end-to-end optimization framework for deep learning.

## TVM Stack





## TVM: Learning-based Learning System

Frameworks





## Inference Flow

https://github.com/xilinx

## TVM as Unified ML Front End

Caffe



[Diagram labels: class AccelModule; XIR Compiler; Quantizer; Partitioner]

## TVM Partitioning

- More than supported/not supported, pattern matching graph colorization
- Choices how

0 credits | 16 pages | 3.35 MB | 1 year ago
TVM@Alibaba AI Labs

Alibaba AI Labs (阿里巴巴人工智能实验室)

AI Labs & TVM

- PART 1: ARM32 CPU
- PART 2: HIFI4 DSP
- PART 3: PowerVR GPU

ARM 32 CPU

## Resolution

Overflow-aware Quantization; Tensorize Kernel + ALIOS TVM (ARM32)

PowerVR GPU

## PowerVR support by TVM

[Diagram labels: DL Model; Caffe2; K; mxnet; CUDA TOPI; Mali TOPI; ROCM TOPI; PVR TOPI; NNVM Frontends; Tuning tasks; Auto TVM; Machine Learning Automated Optimizer; Schedule; Computation Graph Optimizations; Tensor Operators & Property Registry; Compiler Toolchain; TVM Runtime; TOPI Operators]



Alibaba Cloud Intelligence

## Outline

- TensorCore AutoCodeGen in TVM
- FP16 Mixed-Precision Training on PAI
- INT8 Inference on PAI-Blade

## TensorCore AutoCodeGen

[...] stride, `nvcuda::wmma::mem_col_major`)

## Background

- TVM TensorCore Intrinsics
  - Authored by @Hzfengsy
  - Intrinsics: `tvm_load_matrix_sync`, `tvm_mma_sync` ...
  - New Memory Scopes: wmma.matrix_a/b, accumulator
- Virtual threads for data reuse (ongoing)

## Performance on V100 (FP16)

| M, N, K | cuBLAS TensorCore | TVM TensorCore | Speedup |
|---|---|---|---|
| 512, 16, 512 | 7.7470 us | 5.2570 us | 1.47x |
| 512, 32, 512 | 8.0140 us | 6... | |

0 credits | 26 pages | 5.82 MB | 1 year ago
Facebook -- TVM AWS Meetup Talk

## TVM at Facebook

Lots of contributors at FB and elsewhere

## Why TVM?

- Performance matters a lot
- Heterogeneous computing environment
- High variety of workloads
- Ever-increasing set of primitives (over 500 aten kernels)
- Interpreter methods not delivering generalized performance

## TVM for Speech Synthesis

- WaveRNN-style model architecture
- Autoregressive sampling net running at faster
- Uh oh



## Enter, TVM and model co-design

- PyTorch operator overhead makes interpreter infeasible
- Reduce FLOPs with

0 credits | 11 pages | 3.08 MB | 1 year ago