PyTorch Release Notes
… including Python 3.10 ‣ NVIDIA CUDA® 12.1.1 ‣ NVIDIA cuBLAS 12.1.3.1 ‣ NVIDIA cuDNN 8.9.3 ‣ NVIDIA NCCL 2.18.3 ‣ NVIDIA RAPIDS™ 23.06 ‣ Apex ‣ rdma-core 39.0 ‣ NVIDIA HPC-X 2.15 ‣ OpenMPI 4.1.4+ …
… including Python 3.10 ‣ NVIDIA CUDA® 12.1.1 ‣ NVIDIA cuBLAS 12.1.3.1 ‣ NVIDIA cuDNN 8.9.2 ‣ NVIDIA NCCL 2.18.1 ‣ NVIDIA RAPIDS™ 23.04 ‣ Apex ‣ rdma-core 39.0 ‣ NVIDIA HPC-X 2.15 ‣ OpenMPI 4.1.4+ …
… including Python 3.10 ‣ NVIDIA CUDA® 12.1.1 ‣ NVIDIA cuBLAS 12.1.3.1 ‣ NVIDIA cuDNN 8.9.1.23 ‣ NVIDIA NCCL 2.18.1 ‣ NVIDIA RAPIDS™ 23.04 ‣ Apex ‣ rdma-core 36.0 ‣ NVIDIA HPC-X 2.14 ‣ OpenMPI 4.1.4+ …
365 pages | 2.94 MB | 1 year ago
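动手学深度学习 v2.0 (Dive into Deep Learning v2.0)
… link provides a data transfer rate of up to 300 Gbit/s. Server GPUs (Volta V100) have six links, whereas consumer-grade GPUs (RTX 2080Ti) have only one link, which also runs at a reduced 100 Gbit/s. NCCL [162] is recommended for high-speed data transfer between GPUs.
12.4.7 More Latency Numbers
The summaries in Table 12.4.1 and Table 12.4.2 come from Eliot Eshelman [163], who maintains an updated version of the numbers as a GitHub gist [164].
… s/pcie-switches
[161] https://aws.amazon.com/ec2/instance-types/p2/
[162] https://github.com/NVIDIA/nccl
[163] https://gist.github.com/eshelman
[164] https://gist.github.com/eshelman/343a1c46cb3fba142c1afdcdeec17646
797 pages | 29.45 MB | 1 year ago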
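To put the two link speeds quoted in the excerpt above into perspective, here is a rough back-of-the-envelope sketch. It assumes an idealized link with no latency or protocol overhead, and the 1 GB payload and the helper name `transfer_seconds` are hypothetical, chosen only for illustration; they do not come from the book.

```python
def transfer_seconds(size_bytes: float, link_gbit_per_s: float) -> float:
    """Ideal time to push size_bytes over one link (ignores latency and overhead)."""
    bits = size_bytes * 8                      # bytes -> bits
    return bits / (link_gbit_per_s * 1e9)      # Gbit/s -> bit/s


payload = 1e9  # hypothetical 1 GB buffer, e.g. a set of gradients

t_300 = transfer_seconds(payload, 300)  # link rate quoted for the server GPU
t_100 = transfer_seconds(payload, 100)  # link rate quoted for the consumer GPU

print(f"300 Gbit/s link: {t_300 * 1e3:.1f} ms")
print(f"100 Gbit/s link: {t_100 * 1e3:.1f} ms")
```

Under these assumptions the slower link takes three times as long for the same payload. Real transfers will be slower than either figure.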