cuDNN - IT文库 (IT Library): free downloads of e-books and documents for programmers and IT professionals


PyTorch Release Notes

‣ Ubuntu 22.04 including Python 3.10 ‣ NVIDIA CUDA® 12.1.1 ‣ NVIDIA cuBLAS 12.1.3.1 ‣ NVIDIA cuDNN 8.9.3 ‣ NVIDIA NCCL 2.18.3 ‣ NVIDIA RAPIDS™ 23.06 ‣ Apex ‣ rdma-core 39.0 ‣ NVIDIA HPC-X 2.15
‣ Ubuntu 22.04 including Python 3.10 ‣ NVIDIA CUDA® 12.1.1 ‣ NVIDIA cuBLAS 12.1.3.1 ‣ NVIDIA cuDNN 8.9.2 ‣ NVIDIA NCCL 2.18.1 ‣ NVIDIA RAPIDS™ 23.04 ‣ Apex ‣ rdma-core 39.0 ‣ NVIDIA HPC-X 2.15
‣ Ubuntu 22.04 including Python 3.10 ‣ NVIDIA CUDA® 12.1.1 ‣ NVIDIA cuBLAS 12.1.3.1 ‣ NVIDIA cuDNN 8.9.1.23 ‣ NVIDIA NCCL 2.18.1 ‣ NVIDIA RAPIDS™ 23.04 ‣ Apex ‣ rdma-core 36.0 ‣ NVIDIA HPC-X 2

365 pages | 2.94 MB | 1 year ago
TVM: Where Are We Going

Deep Learning Landscape: Frameworks and Inference Engines, DL Compilers, Kernel Libraries, Hardware. CuDNN, NNPack, MKL-DNN: hand optimized. Open source, automated end-to-end optimization framework for ... Primitive Tensor operators such as Conv2D, e.g. cuDNN. Offload to heavily optimized DNN operator library. Frameworks ... Limitations of Existing Approach: cuDNN, Frameworks. New operator introduced by ... is the Future.
[Chart: CuDNN w/ TensorCores (1, 1, 1, 1) vs. tvm w/ TensorCores (0.76, 0.83, 1.16, 1.44) on Large MatMul, BatchConv, Small MatMul, BatchMatMul. 1.4x better on emerging workloads. Transformer related workloads.]

31 pages | 22.64 MB | 6 months ago
2 Training and Deploying Low-Precision Models with Python (张校捷)

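size is 16x ... To use it with FP32, you can set the following (values are converted to FP16 internally): TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=1 TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=1 TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=1 ... Manually converting a model in TensorFlow: import tensorflow as tf import numpy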
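As a minimal sketch of how the three flags quoted above could be set from Python rather than the shell: the helper name `enable_fp32_tensor_op_math` is hypothetical, and it is an assumption that the variables need to be in place before TensorFlow is imported and that a given TensorFlow build actually reads them.

```python
import os

# Flag names copied from the snippet above; nothing else is assumed about them.
TENSOR_OP_MATH_FLAGS = (
    "TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32",
    "TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32",
    "TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32",
)


def enable_fp32_tensor_op_math(env=os.environ):
    """Set each flag to "1" in the given mapping and return the values set.

    Hypothetical helper: call it before `import tensorflow` (assumption:
    the flags are read when the library initializes).
    """
    for name in TENSOR_OP_MATH_FLAGS:
        env[name] = "1"
    return {name: env[name] for name in TENSOR_OP_MATH_FLAGS}
```

Passing a plain dict instead of `os.environ` makes it possible to check the result without touching the real process environment.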

24 pages | 981.45 KB | 1 year ago
Keras: A Python-Based Deep Learning Library

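CNTK. We recommend the TensorFlow backend. • TensorFlow installation instructions. • Theano installation instructions. • CNTK installation instructions. You may also consider installing the following optional dependencies: • cuDNN (recommended if you plan to run Keras on GPU). • HDF5 and h5py (required if you need to save Keras models to disk). • graphviz and pydot (used by the visualization utilities to plot model graphs). ... RNN, although it tends to use more memory. Unrolling is only suitable for short sequences. • reset_after: GRU convention (whether to apply the reset gate before or after the matrix multiplication). False = "before" (default), True = "after" (CuDNN compatible). References • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Transla- recurrent_constraint=None, bias_constraint=None, return_sequences=False, return_state=False, stateful=False) Fast GRU implementation backed by CuDNN. Can only be run on GPU, with the TensorFlow backend. Arguments • units: Positive integer, dimensionality of the output space. • kernel_initializer: kernel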
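The `reset_after` convention described above can be illustrated with a small pure-Python sketch of the GRU candidate state. This is not the Keras implementation: the function names `matvec` and `gru_candidate` are hypothetical, biases are omitted, and the input projection is assumed to be precomputed and passed in as `x_proj`.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def gru_candidate(x_proj, recurrent_kernel, h_prev, reset_gate, reset_after):
    """Candidate hidden state of a GRU cell, without biases.

    reset_after=False: the reset gate scales h_prev BEFORE the recurrent matmul.
    reset_after=True:  the reset gate scales the RESULT of the recurrent matmul.
    """
    if reset_after:
        recurrent = [r * u for r, u in zip(reset_gate, matvec(recurrent_kernel, h_prev))]
    else:
        gated_h = [r * h for r, h in zip(reset_gate, h_prev)]
        recurrent = matvec(recurrent_kernel, gated_h)
    return [math.tanh(a + b) for a, b in zip(x_proj, recurrent)]


# Toy values chosen so that the two conventions give visibly different results.
U = [[0.0, 1.0], [1.0, 0.0]]
h = [1.0, 2.0]
r = [1.0, 0.0]
before = gru_candidate([0.0, 0.0], U, h, r, reset_after=False)
after = gru_candidate([0.0, 0.0], U, h, r, reset_after=True)
```

With a non-diagonal recurrent kernel the two orderings generally differ, which is why weights trained under one convention do not transfer directly to the other.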

257 pages | 1.19 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 3 - Learning Techniques

Epoch 1/100 2021-11-09 14:44:20.431426: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8005 32/32 [==============================] - 366s 12s/step - loss: 0.6981 - accuracy: 0 Epoch 1/100 2021-11-09 15:38:34.694059: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8005 63/63 [==============================] - 380s 6s/step - loss: 0.6932 - accuracy: 0
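The log lines above report the cuDNN version as the single integer 8005. As a small sketch, assuming the encoding major * 1000 + minor * 100 + patch used by cuDNN 8.x headers (other major versions may encode it differently), the integer can be decoded as follows; `decode_cudnn_version` is a hypothetical helper name.

```python
def decode_cudnn_version(version):
    """Split a cuDNN 8.x-style version integer into (major, minor, patch).

    Assumes version == major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(decode_cudnn_version(8005))  # (8, 0, 5)
```

Under that assumption, the value in the log corresponds to cuDNN 8.0.5.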

56 pages | 18.93 MB | 1 year ago
2022 Meituan Technology Annual Collection

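a re-parameterizable structure for convolution (the fusion process is shown in Figure 3 below). The fused 3x3 convolution structure makes effective use of compute-intensive hardware (such as GPUs) and also benefits from NVIDIA cuDNN and Intel MKL, compilation frameworks that are already highly optimized on GPU/CPU. Experiments show that with the above strategies, YOLOv6 reduces latency on hardware and significantly improves accuracy, making the detection network faster and stronger. Taking the nano-size model as an example, compared with ... kernel tuning: select among different optimization strategies and computation methods to find the best one for the current setting, so that the model achieves optimal performance on a specific platform. The figure above shows the main idea of the optimization: each op has several kernel optimization strategies (cuDNN, cuBLAS, etc.); based on the current architecture, inefficient kernels are filtered out of all the strategies and the best kernel is selected, finally forming a new Network. 2. Manual optimization: as is well known, GPUs are suited to compute-intensive operators; for other types of operators (lightweight ... is 8.1 TFLOPS, giving very strong inference performance. In TensorFlow, cuBLAS[9] can be used to call Tensor Cores to accelerate GEMM computation, and cuDNN[10] can be used to call Tensor Cores to accelerate CNN and RNN networks. 5.3 Automatic optimization based on DL compilers: as deep learning networks become more complex (Wider And Deeper) and hardware devices more diverse (CPU,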

1356 pages | 45.90 MB | 1 year ago
VMware Greenplum v6.18 Documentation

hosts can benefit from GPU acceleration. GPUs and deep learning libraries such as Keras, TensorFlow, cuDNN, and CUDA are managed separately from MADlib. For more information see the MADlib wiki instructions

1959 pages | 19.73 MB | 1 year ago
VMware Greenplum v6.19 Documentation

hosts can benefit from GPU acceleration. GPUs and deep learning libraries such as Keras, TensorFlow, cuDNN, and CUDA are managed separately from MADlib. For more information see the MADlib wiki instructions

1972 pages | 20.05 MB | 1 year ago
VMware Greenplum v6.17 Documentation

hosts can benefit from GPU acceleration. GPUs and deep learning libraries such as Keras, TensorFlow, cuDNN, and CUDA are managed separately from MADlib. For more information see the MADlib wiki instructions

1893 pages | 17.62 MB | 1 year ago
VMware Tanzu Greenplum v6.20 Documentation

hosts can benefit from GPU acceleration. GPUs and deep learning libraries such as Keras, TensorFlow, cuDNN, and CUDA are managed separately from MADlib. For more information see the MADlib wiki instructions

1988 pages | 20.25 MB | 1 year ago
