Curve for CNCF Main
… framework • Use bthread (M bthreads mapped onto N pthreads) for scalability and performance on multi-core CPUs • Lock-free queue design • Zero-copy memory design • Cloud-native support

Cloud native for CurveBS — Curve chunkserver vs. BlueStore: metadata lives in a pre-created chunk file pool on ext4 vs. in RocksDB; metadata overhead is negligible on ext4 vs. added read/write amplification; performance is high vs. needing RocksDB tuning. CurveFS …

21 pages | 4.56 MB | 5 months ago

Dynamic Model in TVM

[Diagram: per-target strategy functions ("cpu", "gpu") each return an OpStrategy holding a default implementation and specialized implementations (e.g., winograd), guarded by conditions such as kernel_size <= 3 or b < 8.]

How to register a strategy?

```python
@conv2d_strategy.register("cpu")
def conv2d_strategy_cpu(attrs, inputs, out_type, target):
    strategy = OpStrategy()
    layout = attrs.data_layout
```

Why do we need a graph dispatcher?
1. Minimal overhead: only one dispatching operation is required for each inference.
2. Fits operators such as conv2d_NCHWc …

24 pages | 417.46 KB | 5 months ago

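The registration snippet above is cut off by the excerpt. Under TVM's Relay op-strategy API, a registration normally goes on to attach compute/schedule implementations and return the strategy. Below is a minimal sketch assuming the ~v0.7 module layout; the exact import paths and the chosen `topi` kernels vary between releases and are illustrative, not taken from the slides.

```python
from tvm import topi
from tvm.relay.op import OpStrategy
from tvm.relay.op.strategy.generic import (
    conv2d_strategy,
    wrap_compute_conv2d,
    wrap_topi_schedule,
)

@conv2d_strategy.register("cpu")
def conv2d_strategy_cpu(attrs, inputs, out_type, target):
    strategy = OpStrategy()
    layout = attrs.data_layout
    if layout == "NCHW":
        # Default implementation for this layout; specialized ones (e.g. winograd
        # when kernel_size <= 3) would be added the same way via add_implementation.
        strategy.add_implementation(
            wrap_compute_conv2d(topi.x86.conv2d_nchw),
            wrap_topi_schedule(topi.x86.schedule_conv2d_nchw),
            name="conv2d_nchw.x86",
        )
    return strategy
```

At compile time the strategy attached to the chosen target is queried once, which is what makes the graph dispatcher's single-dispatch-per-inference claim possible.
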
Facebook -- TVM AWS Meetup Talk

… (baseline), 40us (target)
- 85x speedup - Uh oh

Enter, TVM and model co-design
- PyTorch operator overhead makes an interpreter infeasible
- Reduce FLOPs with block-sparsified weight matrices - not a new … (~10 lines of Relay IR)
- A few days of work
- TVM sampling model running in 30us on a single server CPU core
- Beat hand-written, highly optimized baselines (https://github.com/mozilla/LPCNet) by ~40%

11 pages | 3.08 MB | 5 months ago

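The FLOP reduction comes from the "block-sparsified weight matrices" bullet. As a rough illustration only (not the talk's code), the sketch below prunes a dense weight matrix block-by-block with NumPy and stores it in BSR form, the layout that block-sparse matmul kernels (e.g. TVM's sparse dense ops) typically consume; the 16x1 block shape and 80% sparsity are assumed example values.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512)).astype("float32")      # dense weight matrix

BR, BC, SPARSITY = 16, 1, 0.8                               # block shape and target sparsity (assumed)
nbr, nbc = W.shape[0] // BR, W.shape[1] // BC
blocks = W.reshape(nbr, BR, nbc, BC).transpose(0, 2, 1, 3).copy()   # (nbr, nbc, BR, BC)

scores = np.abs(blocks).sum(axis=(2, 3))                    # one magnitude score per block
blocks[scores < np.quantile(scores, SPARSITY)] = 0.0        # zero out the weakest 80% of blocks
W_pruned = blocks.transpose(0, 2, 1, 3).reshape(W.shape)

W_bsr = sp.bsr_matrix(W_pruned, blocksize=(BR, BC))         # block-sparse (BSR) storage
kept = W_bsr.nnz / W.size
print(f"kept {kept:.0%} of the weights -> roughly {kept:.0%} of the dense matmul FLOPs")
```
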
PAI & TVM Meetup - Shanghai 20191116

… on warp-level schedule. Motivation: "The overhead of writing warp-level schedule for TensorCore …" • Work at the scheduling level: the less, the better.

26 pages | 5.82 MB | 5 months ago

TVM Meetup Nov. 16th - Linaro

… ○ ONNX Runtime

Arm platform support in TVM upstream — IPs: CPU; Target: arm_cpu; Hardware/Model options: pixel2 (Snapdragon 835), mate10/mate10pro (Kirin 970), p20/p20pro (Kirin 970); Codegen: -target=arm64-linux-android

Working together with the members closely in an organized way:
○ Arm - Cortex-A/Cortex-M/Neoverse CPU, Mali GPU, Ethos NPU
○ Qualcomm - Hexagon DSP, Adreno GPU
○ Hisilicon, Xilinx, NXP, TI, ST, Fujitsu

7 pages | 1.23 MB | 5 months ago

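A minimal sketch of driving the arm_cpu row above from Python, assuming upstream TVM's `tvm.target.arm_cpu` presets and the Android NDK helper; the tiny dense layer only stands in for a real imported model.

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import ndk

# A toy dense layer standing in for a real imported model ("mod", "params").
x = relay.var("x", shape=(1, 8), dtype="float32")
w = relay.var("w", shape=(4, 8), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.dense(x, w))
params = {"w": tvm.nd.array(np.random.randn(4, 8).astype("float32"))}

# "pixel2" is one of the arm_cpu presets; it expands to an LLVM target close to
# the "-target=arm64-linux-android" codegen option shown in the row above.
target = tvm.target.arm_cpu("pixel2")
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Cross-compile the deployable library with the Android NDK
# (requires the TVM_NDK_CC environment variable to point at the NDK clang).
lib.export_library("model_arm64.so", ndk.create_shared)
```
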
亿联TVM部署 (Yealink TVM deployment)

… good performance gain by autotuning 3. TVM can support many kinds of hardware platforms: Intel/Arm CPU, Nvidia/Arm GPU, VTA … 1. Get a .log file from autotvm on Ubuntu 2. Use the …

6 pages | 1.96 MB | 5 months ago

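The two numbered steps are truncated in the excerpt. A minimal sketch of the usual continuation, assuming TVM's `autotvm.apply_history_best` API; the toy model below is only a stand-in and "tune.log" is a placeholder for the .log file produced by the tuning run on Ubuntu, not a file from the slides.

```python
import numpy as np
import tvm
from tvm import relay, autotvm

# Toy stand-in model; in practice "mod"/"params" come from importing the real network.
x = relay.var("x", shape=(1, 8), dtype="float32")
w = relay.var("w", shape=(4, 8), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.dense(x, w))
params = {"w": tvm.nd.array(np.random.randn(4, 8).astype("float32"))}

target = "llvm"                                  # e.g. one of the Intel/Arm CPU targets listed above
with autotvm.apply_history_best("tune.log"):     # step 1: pick up the best records from the tuning log
    with tvm.transform.PassContext(opt_level=3): # step 2: compile with the tuned schedules applied
        lib = relay.build(mod, target=target, params=params)
lib.export_library("deploy.so")                  # copy this library to the target device
```
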
TVM: Where Are We Going

… Runtime JIT compiles accelerator microcode • Support heterogeneous devices, 10x better than CPU on the same board • Move hardware complexity to software — HW-SW Blueprint for Flexible Deep Learning

31 pages | 22.64 MB | 5 months ago
7 results in total