TVM: Where Are We Going

- High-level data flow graph and optimizations
- Directly generate optimized programs for new operator workloads and hardware
- Hardware / Frameworks

Why Automation is the Future

- Clear winner on emerging models: BatchMatMul, cuDNN w/ TensorCores vs. TVM w/ TensorCores
- 1.4x better on emerging (Transformer-related) workloads
- Credit: Siyuan Feng

Where Are We Going

- Unified runtime for heterogeneous devices: func = remote_mod["npufunction0"]; func(remote_a, remote_b)
- Virtual machine: supporting dynamic workloads
  - Dynamic shape workloads
  - More runtime objects: Arrays, Tuples, Trees, ADTs
  - Minimum runtime for dynamic models
- Credit:
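The "unified runtime" bullet refers to TVM's convention that a runtime module maps function names to callable functions, and arrays carry a device tag so a looked-up function runs on whatever device its arguments live on. The following is a toy, pure-Python sketch of that dispatch idea only; the class names (`RemoteArray`, `RemoteModule`) and the `_npu_add` kernel are illustrative inventions, not TVM's actual API.

```python
# Toy model of a name -> function runtime module with device-tagged arrays.
# All names here are illustrative, not real TVM classes.

class RemoteArray:
    """A host-side handle to data that lives on some (remote) device."""
    def __init__(self, data, device):
        self.data = data
        self.device = device

class RemoteModule:
    """A runtime module: a lookup table from function names to callables."""
    def __init__(self, funcs):
        self._funcs = funcs
    def __getitem__(self, name):
        return self._funcs[name]

def _npu_add(a, b):
    # A stand-in "NPU kernel": checks its inputs live on the same device,
    # then computes elementwise addition.
    assert a.device == b.device == "npu"
    return RemoteArray([x + y for x, y in zip(a.data, b.data)], a.device)

remote_mod = RemoteModule({"npufunction0": _npu_add})
func = remote_mod["npufunction0"]          # lookup mirrors remote_mod["npufunction0"]
remote_a = RemoteArray([1, 2, 3], "npu")
remote_b = RemoteArray([4, 5, 6], "npu")
out = func(remote_a, remote_b)             # out.data == [5, 7, 9]
```

In real TVM the lookup would return a packed function backed by compiled device code reached over RPC, but the module-as-dictionary shape of the API is the same.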
Facebook -- TVM AWS Meetup Talk

- Performance matters a lot
- Heterogeneous computing environment
- High variety of workloads
- Ever-increasing set of primitives (over 500 ATen kernels)
- Interpreter methods not delivering
- Transcendentals (exp, tanh, erf, etc.): a very general technique, allows clean vectorization
- Related work in Gibiansky (2017), Gray (2019), et al. (Image from OpenAI)
- Add relay.nn.sparse_dense for block-sparse
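The last bullet mentions `relay.nn.sparse_dense` for block-sparse weights. As a rough illustration of the computation involved, here is a NumPy sketch of a dense-times-block-sparse matmul (Y = X @ W.T with W in a BSR-style layout). This is not TVM code; the function name and argument layout are assumptions chosen to mirror the common BSR format.

```python
import numpy as np

def sparse_dense(x, w_data, w_indices, w_indptr, bs_r, bs_c):
    """Compute Y = X @ W.T where W is stored in BSR (block-sparse row) form.

    w_data:    (num_blocks, bs_r, bs_c) nonzero blocks of W
    w_indices: column-block index of each nonzero block
    w_indptr:  block-row pointers, length num_block_rows + 1
    """
    m, _ = x.shape
    n = (len(w_indptr) - 1) * bs_r
    y = np.zeros((m, n), dtype=x.dtype)
    for rb in range(len(w_indptr) - 1):
        for i in range(w_indptr[rb], w_indptr[rb + 1]):
            cb = w_indices[i]
            # Each nonzero block contributes to a bs_r-wide stripe of Y,
            # reading a bs_c-wide stripe of X. Zero blocks are skipped entirely.
            y[:, rb * bs_r:(rb + 1) * bs_r] += (
                x[:, cb * bs_c:(cb + 1) * bs_c] @ w_data[i].T
            )
    return y

# Cross-check against a dense matmul on a 4x4 W with two nonzero 2x2 blocks.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
w_data = rng.standard_normal((2, 2, 2))
w_indices = np.array([1, 0])      # block (0, 1) and block (1, 0) are nonzero
w_indptr = np.array([0, 1, 2])
w_dense = np.zeros((4, 4))
w_dense[0:2, 2:4] = w_data[0]
w_dense[2:4, 0:2] = w_data[1]
assert np.allclose(sparse_dense(x, w_data, w_indices, w_indptr, 2, 2), x @ w_dense.T)
```

The payoff of the block structure (as opposed to element-wise sparsity) is that each inner step is a small dense matmul, which vectorizes cleanly on CPUs and GPUs.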