INT8 Inference - IT文库 (IT Library)


PAI & TVM Meetup - Shanghai 20191116

Mixed-Precision Training/Inference PAI (Platform of AI) Alibaba Cloud Intelligence ## Outline • TensorCore AutoCodeGen in TVM • FP16 Mixed-Precision Training on PAI • INT8 Inference on PAI-Blade ## TensorCore PAI-TF ## INT8 Inference on PAI-Blade ## PAI-Blade

0 points | 26 pages | 5.82 MB | 1 year ago
2 Training and Deploying Low-Precision Models with Python (张校捷)

(TensorFlow edition) 张校捷 2019/9/21 ## Contents >> The concept and significance of low precision TensorFlow FP16 models >> TensorRT FP16/Int8 models Summary ... float32) FP16: E8M7 (TPU, tf.bfloat16) FP16: E5M10 (GPU, tf.float16) Int8 ## Advantages of low-precision floating point ### 1. Saves memory / GPU memory (FP16 takes 1/2 of the original size, int8 takes 1/4) 2. Dedicated hardware accelerates low-precision computation (TensorCore) FP16 storage/input Full precision The significance of using low precision ## Conditions for using TensorCores 1. Convolution: K (input channels), C (output channels) 2. General matrix multiplication (GEMM): MxK, KxN, (M, N, K) FP16: sizes in multiples of 8 Int8: sizes in multiples of 16 To use TensorCores with FP32 (converted to FP16 internally), set: TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=1 TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=1

0 points | 24 pages | 981.45 KB | 2 years ago
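The memory ratios and the size multiples quoted in the slide excerpt above can be checked with a few lines of NumPy. This is only a sketch: `nbytes_as`, `gemm_dims_aligned`, and `TENSOR_CORE_MULTIPLE` are hypothetical names introduced here, and the multiples of 8 and 16 are taken from the excerpt rather than from any vendor documentation.

```python
import numpy as np

# Hypothetical table; the values (8 for FP16, 16 for INT8) are the size
# multiples quoted in the slide excerpt, not an authoritative hardware rule.
TENSOR_CORE_MULTIPLE = {"fp16": 8, "int8": 16}


def nbytes_as(shape, dtype):
    """Bytes needed to store a dense tensor of `shape` in `dtype`."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize


def gemm_dims_aligned(m, n, k, precision):
    """True when M, N and K are all multiples of the quoted size."""
    multiple = TENSOR_CORE_MULTIPLE[precision]
    return all(dim % multiple == 0 for dim in (m, n, k))


shape = (1024, 1024)
fp32_bytes = nbytes_as(shape, np.float32)
fp16_ratio = nbytes_as(shape, np.float16) / fp32_bytes
int8_ratio = nbytes_as(shape, np.int8) / fp32_bytes
print(fp16_ratio, int8_ratio)  # prints: 0.5 0.25
```

A GEMM with M=64, N=128, K=8 would pass the FP16 check but fail the INT8 one, because K is not a multiple of 16.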
TVM@AliOS

TVM@ARM CPU • Support TFLite (Open Source and Upstream Master) • Optimize on INT8 & FP32 ## AliOS TVM @ ARM CPU INT8 Convolution • NHWC layout • im2col + pack • Tensorize GEMM ... TVM @ ARM CPU INT8 TVM / QNNPACK Speed Up @ Mobilenet V2 @ rasp 3b+ AARCH64 ## AliOS TVM @ ARM CPU INT8 Depthwise Convolution ... instruction if your ARM does not have dot 3. compute_at is very important

0 points | 27 pages | 4.86 MB | 1 year ago
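The "NHWC layout, im2col, GEMM" recipe named in the TVM@AliOS excerpt above can be illustrated in plain NumPy. This is a rough sketch, not the TVM schedule: it assumes stride 1, no padding, and HWIO weights, it skips the packing and tensorize steps, and it ignores quantization scales and zero points. The function names `im2col_nhwc` and `conv2d_int8_gemm` are hypothetical.

```python
import numpy as np


def im2col_nhwc(x, kh, kw):
    """Unfold an NHWC tensor into a (N*OH*OW, KH*KW*C) patch matrix."""
    n, h, w, c = x.shape
    oh, ow = h - kh + 1, w - kw + 1  # stride 1, no padding (assumption)
    cols = np.empty((n * oh * ow, kh * kw * c), dtype=x.dtype)
    row = 0
    for b in range(n):
        for i in range(oh):
            for j in range(ow):
                # Each patch is flattened in (kh, kw, c) order.
                cols[row] = x[b, i:i + kh, j:j + kw, :].reshape(-1)
                row += 1
    return cols, (n, oh, ow)


def conv2d_int8_gemm(x, w):
    """INT8 convolution as one GEMM with int32 accumulation.

    x: int8 input, NHWC. w: int8 weights, HWIO (kh, kw, c, oc).
    """
    kh, kw, c, oc = w.shape
    cols, (n, oh, ow) = im2col_nhwc(x, kh, kw)
    # Widen to int32 before multiplying so the int8 products cannot overflow.
    acc = cols.astype(np.int32) @ w.reshape(kh * kw * c, oc).astype(np.int32)
    return acc.reshape(n, oh, ow, oc)


x = np.ones((1, 3, 3, 1), dtype=np.int8)
w = np.ones((2, 2, 1, 1), dtype=np.int8)
out = conv2d_int8_gemm(x, w)
print(out.shape, out.dtype)  # prints: (1, 2, 2, 1) int32
```

Flattening patches in the same (kh, kw, c) order as the reshaped weights is what lets a single matrix multiply stand in for the convolution; a tuned implementation would replace the Python loops and the generic matmul with packed tiles and a hardware-specific GEMM kernel.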
The Julia Language 1.12.0 beta2 Documentation

backtrace 451 35 Performance Tips 454 35.1 Table of contents 454 35.2 General advice 455 35.3 Type inference 459 35.4 Memory management and arrays 474 35.5 Execution latency, package loading and package precompiling Languages Julia features optional typing, multiple dispatch, and good performance, achieved using type inference and just-in-time (JIT) compilation (and optional ahead-of-time compilation), implemented using LLVM function. Existing code then seamlessly applies to the new data types. Partly because of run-time type inference (augmented by optional type annotations), and partly because of a strong focus on performance from

0 points | 2048 pages | 7.41 MB | 1 month ago
The Julia Language 1.12.6 Documentation

Issues 448 36 Performance Tips 450 36.1 Table of contents 450 36.2 General advice 451 36.3 Type inference 455 36.4 Memory management and arrays 470 36.5 Execution latency, package loading and package precompiling multi-threading locks 1776 107.20 Arrays with custom indices 1780 107.21 Module loading 1783 107.22 Inference 1784 107.23 Julia SSA-form IR 1786 107.24 EscapeAnalysis 1791 107.25 Ahead of Time Compilation 1806 Languages Julia features optional typing, multiple dispatch, and good performance, achieved using type inference and just-in-time (JIT) compilation (and optional ahead-of-time compilation), implemented using LLVM

0 points | 1897 pages | 7.71 MB | 1 month ago
Julia v1.4.2 Documentation

Julia Execution ..... 1238 Parsing ..... 1239 Macro Expansion ..... 1239 Type Inference ..... 1239 JIT Code Generation ..... 1240 System Image ..... 1241 106.6 Calling Conventions 1274 106.19Module loading ..... 1275 Experimental features ..... 1275 106.20Inference ..... 1276 How inference works ..... 1276 Debugging compiler.jl ..... 1276 The inlining algorithm (inline_worthy) Julia features optional typing, multiple dispatch, and good performance, achieved using type inference and just-in-time (JIT) compilation, implemented using LLVM. It is multi-paradigm, combining features

0 points | 1314 pages | 4.29 MB | 2 years ago
Julia 1.8.1 Documentation

1473 101.18 Arrays with custom indices ..... 1476 101.19 Module loading ..... 1480 101.20 Inference ..... 1480 101.21 Julia SSA-form IR ..... 1482 101.22 EscapeAnalysis ..... 1486 101.23 Static Julia features optional typing, multiple dispatch, and good performance, achieved using type inference and just-in-time (JIT) compilation, implemented using LLVM. It is multi-paradigm, combining features Existing code then seamlessly applies to the new data types. Partly because of run-time type inference (augmented by optional type annotations), and partly because of a strong focus on performance from

0 points | 1563 pages | 5.03 MB | 2 years ago
Julia 1.9.0 beta2 Documentation

multi-threading locks 1545 101.19 Arrays with custom indices 1549 101.20 Module loading 1552 101.21 Inference 1553 101.22 Julia SSA-form IR 1555 101.23 EscapeAnalysis 1558 101.24 Static analyzer annotations Julia features optional typing, multiple dispatch, and good performance, achieved using type inference and just-in-time (JIT) compilation (and optional ahead-of-time compilation), implemented using LLVM Existing code then seamlessly applies to the new data types. Partly because of run-time type inference (augmented by optional type annotations), and partly because of a strong focus on performance from

0 points | 1637 pages | 5.25 MB | 2 years ago
Julia 1.9.0 rc3 Documentation

1552 102.19 Arrays with custom indices ..... 1556 102.20 Module loading ..... 1559 102.21 Inference ..... 1560 102.22 Julia SSA-form IR ..... 1562 102.23 EscapeAnalysis ..... 1565 102.24 Static Julia features optional typing, multiple dispatch, and good performance, achieved using type inference and just-in-time (JIT) compilation (and optional ahead-of-time compilation), implemented using LLVM Existing code then seamlessly applies to the new data types. Partly because of run-time type inference (augmented by optional type annotations), and partly because of a strong focus on performance from

0 points | 1644 pages | 5.26 MB | 2 years ago
Julia 1.7.0 DEV Documentation

98.5 Eval of Julia code 1296 Julia Execution 1296 Parsing 1297 Macro Expansion 1298 Type Inference 1298 JIT Code Generation 1299 System Image 1299 98.6 Calling Conventions 1299 Julia Native 1332 98.19 Module loading ..... 1333 Experimental features ..... 1334 98.20 Inference ..... 1334 How inference works ..... 1334 Debugging compiler.jl ..... 1334 The inlining algorithm (inline_worthy) Julia features optional typing, multiple dispatch, and good performance, achieved using type inference and just-in-time (JIT) compilation, implemented using LLVM. It is multi-paradigm, combining features

0 points | 1399 pages | 4.59 MB | 2 years ago

540 results in total
