GPU - IT文库 (IT Library): free e-books and documents on programming and IT for developers


Go on GPU

Go on GPU. Changkun Ou, changkun.de/s/gogpu. GopherChina 2023, Session “Foundational Toolchains”, 2023 June 10. Agenda: basic knowledge for interacting with GPUs (motivation; GPU driver and standards; render and compute pipeline; Vulkan/Metal/DX12/OpenGL); accelerating Go programs using GPUs; challenges in Go when using GPUs; conclusion and outlook. Motivation of GPU acceleration: improve system computation performance; increase the amount of concurrency; processing large …

0 credits | 57 pages | 4.62 MB | 2 years ago
GPU Resource Management On JDOS

GPU Resource Management On JDOS. 梁永清, liangyongqing1@jd.com. Services provided: Experiment, Training, Serving. 1. GPU containers for experiments. 2. A machine-learning training service based on Kubeflow. 3. Model management and model serving. All services are container-based; physical GPU machines are not offered directly to business teams. GPU experiments: a regular JDOS container service; use a GPU zone and set the appropriate image yourself. Well-developed supporting services are available. My systems; one-click build; public/tensor/now.1.4.1-ueve-gpu-vi; public/tensorflow:1.7.0-devel-gpu-py3-v1 …

0 credits | 11 pages | 13.40 MB | 2 years ago
Bridging the Gap: Writing Portable Programs for CPU and GPU

Bridging the Gap: Writing Portable Programs for CPU and GPU. Thomas Mejstrik, Dimetor, FWF. Using CUDA; ROCm, Vulkan, ...: you can tell me about them afterwards. Why write programs for CPU and GPU: difference CPU/GPU (algorithms are designed differently; latency/throughput; memory bandwidth; Number Problem); why it makes sense (library/framework developers; embarrassingly parallel algorithms); scope of the talk.

0 credits | 124 pages | 4.10 MB | 1 year ago
Hardware Acceleration and Optimization of FFmpeg on Intel GPUs (FFmpeg在Intel GPU上的硬件加速与优化)

Hardware acceleration and optimization of FFmpeg on Intel GPUs. 赵军, DCG/NPG @ Intel. Introduction to FFmpeg VAAPI: media pipeline review; what FFmpeg VAAPI is; why we need FFmpeg VAAPI; current status; further plans; appendix. A typical media pipeline: SOURCE, libavformat … radeon, nouveau (?), freedreno, … Deprecated API bridges: vdpau-va bridge; powervr-va bridge. Intel GPU overview. Gfx label: Gen3: Pinetrail (Pineview); Gen4: G965; Gen5: G4X, Ironlake (Piketon, Calpella) … OpenGL). Intel GPU media hardware programming model. FFmpeg & Intel GPU acceleration: FFmpeg is the most popular open-source multimedia framework; integrating Intel GPU hardware acceleration brings users more benefits.

0 credits | 26 pages | 964.83 KB | 2 years ago
Activation Functions and GPU Acceleration (激活函数与GPU加速)

PyTorch. Activation functions and GPU acceleration. Lecturer: 龙良曲. … \beta*x)) … GPU accelerated: device = torch.device('cuda:0') net = MLP().to(device) optimizer = …

0 credits | 11 pages | 452.22 KB | 2 years ago
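High-Performance Parallel Programming and Optimization in C++ - Slides - 08 GPU Programming with CUDA (C++高性能并行编程与优化 - 课件 - 08 CUDA 开启的 GPU 编程)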

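GPU Programming with CUDA, by 彭于斌 (@archibate). Past recordings: https://www.bilibili.com/video/BV1fa411r7zp. Course slides and code: https://github.com/parallel101/course. Prerequisites: you have programmed in C/C++; you understand concepts such as malloc/free; you are familiar with the STL … Writing code that runs on the GPU: define a function kernel and prefix it with the __global__ qualifier to have it execute on the GPU. When calling kernel, however, you cannot simply write kernel(); you must use the triple-angle-bracket syntax kernel<<<1, 1>>>(). Why? What are the two 1s for? This is explained later. After running, printf executes on the GPU … A function such as kernel that runs on the GPU is called a kernel function; any function qualified with __global__ is a kernel function. No output? Synchronize! If you compile and run that code as-is, it will not print Hello, world!. This is because communication between the GPU and the CPU is asynchronous, for efficiency. That is, …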

0 credits | 142 pages | 13.52 MB | 2 years ago
4 Python Machine Learning Performance Optimization (4 Python机器学习性能优化)

PYTHON 30th. 2. Know your resources: CPU / memory / IO / GPU. Why is the GPU “fast”? Compute power comparison. Limits of Moore's law: “The number of transistors that can fit on an integrated circuit doubles roughly every eighteen months.” More of a CPU goes to cache (L1/L2/L3) and control; the vast majority of a GPU goes to ALU compute units. GPU characteristics: SIMD; tiered video memory; heterogeneous and asynchronous.

0 credits | 38 pages | 2.25 MB | 2 years ago
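High-Performance Parallel Programming and Optimization in C++ - Slides - 01 Learning C++ Starting from CMake (C++高性能并行编程与优化 - 课件 - 01 学 C++ 从 CMake 学起)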

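… C++ 5. Multithreaded programming since C++11: from mutex to lock-free parallelism 6. Common parallel programming frameworks: OpenMP and Intel TBB 7. The overlooked memory-access optimization: memory bandwidth and CPU cache behavior 8. GPU topics: warp scheduling, shared memory, barrier 9. Parallel algorithms in practice: reduce, scan, matrix multiplication, etc. 10. The key to storing large-scale 3D data: sparse data structures 11. Physics simulation in practice: implementing pbf with a neighbor search table … At least 2 cores and 4 threads (parallel lessons...); an NVIDIA graphics card (GPU topics). Software requirements: Visual Studio 2019 (Windows users); GCC 9 or later (Linux users); CMake 3.12 or later (cross-platform assignments); Git 2.x (assignments are uploaded to GitHub); CUDA Toolkit 10.0 or later (GPU topics). I ❤️ C … About the author …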

0 credits | 32 pages | 11.40 MB | 2 years ago
运维上海 2017 (Ops Shanghai 2017) - Architecture and Implementation Analysis of Combining Kubernetes with AI - 赵慧智

… space) Control Group (Cpu, Memory, IO) Namespaces (pid, net, ipc, mnt, uts) CPU System RAM GPU Disk SCI (System Call Interface) (Kernel) Networking Infrastructure. Container VS VM (Virtual … Containerized Applications libs/bins Hypervisor Operating System Operating System CPU System RAM GPU Disk Networking. Container Image: an image that packages and stores an application to be containerized together with its environment. There is usually an image registry for storing images. … while (tid < N) { c[tid] = a[tid] + b[tid]; tid += 2; } } Deep learning's dependence on parallel hardware: GPU. The number of cores often determines how many operations truly run in parallel. TESLA M60 FEATURES AND BENEFITS > Two high-end NVIDIA Maxwell™ …
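The loop fragment in the snippet above (`while (tid < N) { c[tid] = a[tid] + b[tid]; tid += 2; }`) is cut off, so its surrounding function is not visible. A minimal Python sketch of the same pattern follows, assuming it is the tail of a strided vector addition in which each of two workers starts at its own index and steps by 2. The names `strided_add` and `parallel_style_add` are hypothetical, and the workers are run one after another here purely to show which indices each one covers.

```python
def strided_add(a, b, c, start, stride):
    """Write a[i] + b[i] into c[i] for i = start, start + stride, ..."""
    n = len(c)
    tid = start
    while tid < n:
        c[tid] = a[tid] + b[tid]
        tid += stride  # the fragment uses a fixed stride of 2


def parallel_style_add(a, b, num_workers=2):
    """Cover every index exactly once using num_workers strided passes.

    Assumption: worker k starts at index k and strides by num_workers,
    so the passes touch disjoint indices and could run concurrently.
    """
    c = [0] * len(a)
    for worker in range(num_workers):
        strided_add(a, b, c, start=worker, stride=num_workers)
    return c


if __name__ == "__main__":
    print(parallel_style_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

Because the passes never write to the same index, no locking is needed; that independence is what makes this kind of loop a natural fit for many-core hardware such as a GPU.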

0 credits | 77 pages | 14.48 MB | 2 years ago
Deep Learning and PyTorch: A Hands-On Introduction - 01. First Look at PyTorch (深度学习与PyTorch入门实战 - 01. 初见PyTorch)

The PyTorch ecosystem: PyTorch NLP, AllenNLP, TorchVision, Fast.ai, ONNX. What can PyTorch do? GPU acceleration; automatic differentiation; common network layers. 1. GPU acceleration 2. Automatic differentiation 3. Common network layers: nn.Linear, nn.Conv2d, nn.LSTM, nn.ReLU, nn.Sigmoid

0 credits | 19 pages | 1.06 MB | 2 years ago

594 results in total.