IT文库 (IT Library)
The search took 0.027 seconds and found about 93 results.
  • PDF document: Bridging the Gap: Writing Portable Programs for CPU and GPU

    Bridging the Gap: Writing Portable Programs for CPU and GPU using CUDA. Thomas Mejstrik, Sebastian Woblistin. Content: 1 Motivation (Audience etc., Cuda crash course, Quiz time); 2 Patterns (Oldschool). Sections: Motivation, Patterns, The dark path, Cuda proposal, Thank you. Why write programs for CPU and GPU. Difference CPU/GPU: algorithms are designed differently, latency/throughput, memory bandwidth, number of cores. Why it makes sense? Library/Framework developers, embarrassingly parallel algorithms, User
    0 credits | 124 pages | 4.10 MB | 6 months ago
  • PDF document: POCOAS in C++: A Portable Abstraction for Distributed Data Structures

    Diagram labels: CPU vFast, GPU vvFast, PCI Bus (or other fabric). GPUs as a First-Class Computing Resource. Diagram labels: CPU, GPU, PCI Bus (or other fabric), NIC, Data. Historically, network comm. was CPU-centric. 1) Direct GPU access to Infiniband allows GPU-to-GPU network transfers. 2) Fast in-node fabrics like NVLink, Infinity Fabric allow very fast intra-node transfers.
    0 credits | 128 pages | 2.03 MB | 6 months ago
  • PDF document: Taro: Task graph-based Asynchronous Programming Using C++ Coroutine

    Existing TGPSs on Heterogeneous Computing - Challenge. Diagram nodes: A, C, D, B!, B" (B! : CPU operation; B" : GPU operation). Slide code: task_b = sched.emplace([](&){ // CPU code; // GPU code; }); // CPU thread blocks until GPU finishes operation. Atomic execution per task. Timeline diagram labels (assume one CPU and one GPU): CPU, A, B!, C, Idle, GPU, D, B", Runtime.
    0 credits | 84 pages | 8.82 MB | 6 months ago
  • PDF document: Heterogeneous Modern C++ with SYCL 2020

    http://wongmichael.com/about. C++11 book in Chinese: https://www.amazon.cn/dp/B00ETOV2OQ. We build GPU compilers for some of the most powerful supercomputers in the world. Nevin “:-)” Liber, nliber@anl. Attribution 4.0 International License. SYCL: Single Source C++ Parallel Programming. Diagram labels: Standard C++ Application Code, C++ Libraries, ML Frameworks; CPU, GPU, FPGA, DSP, Custom Hardware, AI/Tensor HW, Other Backends. give better performance on complex apps and libs than hand-coding. SYCL 2020 is here! Open Standard for
    0 credits | 114 pages | 7.94 MB | 6 months ago
  • PPT document: Bringing Existing Code to CUDA Using constexpr and std::pmr

    cudaFree(x); cudaFree(y); } An Even Easier Introduction to CUDA: __global__ void add_gpu(int n, float* x, float* y) { for (int i = 0; i < n; i++) y[i] = x[i] + y[i]; } TEST_CASE("cppcon-1", "[CUDA]") { // … } 20; float* x; float* y; // … add_gpu<<<1, 1>>>(N, x, y); // … }
    0 credits | 51 pages | 3.68 MB | 6 months ago
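  • PDF document: 2024 China Open Source Developer Report (2024 中国开源开发者报告)

    MiniMax, and others. The second category is GPU inference cluster service providers such as TogetherAI, Groq, Fireworks, Replicate, and 硅基流动 (SiliconFlow). They handle technical challenges such as scaling up and down and charge a premium on top of basic compute fees, so application companies can use abstracted AI infrastructure services directly instead of bearing the high cost of building and managing GPU inference clusters. The third category is traditional cloud computing platforms, for example Amazon's Amazon ... Vertex AI and others, which let application developers easily deploy and use standardized or customized AI models and call them through API interfaces. The last category is local inference: SGLang, vLLM, and TensorRT-LLM perform well under production-grade GPU serving loads and are popular with application developers who need to host models locally, while Ollama and LM Studio are preferred options for running models on personal computers. Beyond the model layer, applications ... software. For example, a microcontroller (MCU) runs a real-time operating system or directly runs one specific program; a central processing unit (CPU) usually runs a complex operating system such as Windows or Linux as the foundation of the whole software stack; a graphics processing unit (GPU) generally loads no operating system and directly runs graphics and image processing programs; and a neural processing unit (NPU) directly runs deep-learning programs. Processor chip design is a very complex task, and the whole process resembles an iceberg. Above the waterline is what users or the public see
    0 credits | 111 pages | 11.44 MB | 8 months ago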
  • PDF document: Distributed Ranges: A Model for Building Distributed Data Structures, Algorithms, and Views

    involve experimental prototypes and early research. Problem: writing parallel programs is hard. Multi-GPU, multi-CPU systems require partitioning data. Users must manually split up data amongst GPUs / execution necessary. Multi-GPU Systems (diagram labels: CPU, NIC, GPU x4, Xe Link; in a later slide each GPU is split into Tile 0 and Tile 1). NUMA regions: 4+ GPUs, 2+ CPUs. more memory domains. Software needed to reduce complexity. Project Goals: Offer high-level, standard C++
    0 credits | 127 pages | 2.06 MB | 6 months ago
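  • PDF document: Deepseek R1 Complete Local Deployment Manual (Deepseek R1 本地部署完全手册)

    Table columns: model size; Windows requirements; Mac requirements; use cases.
    1.5B: Windows: RAM 4GB, GPU integrated graphics / modern CPU, storage 5GB; Mac: memory 8GB (M1/M2/M3), storage 5GB; use cases: simple text generation, basic code completion.
    7B: Windows: RAM 8-10GB, GPU GTX 1680 (4-bit quantization), storage 8GB; Mac: memory 16GB (M2 Pro/M3), storage 8GB; use cases: medium-complexity Q&A, code debugging.
    14B: Windows: RAM 24GB, GPU RTX 3090 (24GB VRAM), storage 20GB; Mac: memory 32GB (M3 Max), storage 20GB; use cases: complex reasoning, technical documentation generation.
    32B+: Windows: enterprise-grade deployment (requires multiple GPUs in parallel); Mac: not yet supported; use cases: scientific computing, large-scale data processing.
    2. Compute requirements analysis. Columns: model, parameter scale ... 2*XE9680 (16*H20 GPU). DeepSeek-R1-Distill-70B: 70B, BF16, ≥180GB, 4*L20 or 2*H20 GPU.
    III. Domestic chips and hardware adaptation. 1. Updates from domestic ecosystem partners. Columns: company; adaptation; performance benchmark (vs NVIDIA). Huawei Ascend (华为昇腾): Ascend 910B natively supports the full R1 series and provides an end-to-end inference optimization solution; equivalent to A100 (FP16). 沐曦 GPU: the MXN series supports BF16 inference for 70B models, with improved VRAM utilization
    0 credits | 7 pages | 932.77 KB | 8 months ago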
  • PDF document: Trends Artificial Intelligence

    Impressive. NVIDIA AI Ecosystem Tells Over Four Years = >100% Growth in Developers / Startups / Apps. Note: GPU = Graphics Processing Unit. Source: NVIDIA (2021 & 2025). NVIDIA Computing Ecosystem, 2021-2025, per Cloud vs. AI Patterns. Tech CapEx Spend Partial Instigator = Material Improvements in GPU Performance. NVIDIA GPU Performance = +225x Over Eight Years. 1 GPT-MoE Inference Workload = A type of workload. Source: NVIDIA (5/25). Performance of NVIDIA GPU Series Over Time, 2016-2024, per NVIDIA. Series: Pascal, Volta, Ampere, Hopper, Blackwell
    0 credits | 340 pages | 12.14 MB | 5 months ago
  • PDF document: An Editor Can Do That?

    Visual Studio Code: What’s new? 1. GitHub Codespaces (coding from your browser!) 2. CMake support 3. ARM and ARM64 support (Raspberry Pi, Surface Pro X, Apple Silicon) 4. CUDA IntelliSense and GPU debugging 5. Disassembly View while debugging (Preview!)
    0 credits | 71 pages | 2.53 MB | 6 months ago