CPU overhead - IT文库 (IT Library): free downloads of programming and internet e-books and documents for developers


How and When You Should Measure CPU Overhead of eBPF Programs

## How and When You Should Measure CPU Overhead of eBPF Programs
eBPF Summit
## Why should I profile eBPF programs?
## CI variance tracking
name TCPLatency/eBPF/kprobe/sys_bind TCPLatency/eB
Benchmarking + CI/CD - Sampling profiler in production
## How does it work?
- Adds ~20ns of overhead per run
// pseudo-code
if (bpf_stats_enabled) { u64 start = sched_clock(); run_ebpf_program();

0 points | 20 pages | 2.04 MB | 2 years ago
Hidden Overhead of a Function API

## Hidden Overhead of a Function API
## OLEKSANDR BACHERIKOV
## What we do at Snap with C++
![Image](/uploads/documents/9/f/4/6/9f468d285d830d08a1586b1ba74a3d5d/p2_1.jpg)
Neural style transfer
modern CPU instruction cache. As a result, the hardware spends a considerable amount of processing time (nearly 30 percent, in many cases) getting an instruction stream from memory to the CPU.”
## Disclaimer:
rbx ldp x29, x30, [sp], #32 ret 0
## Negative-overhead abstraction!
## C++ Core Guidelines F.20: For “out” output values, prefer return values to output

0 points | 158 pages | 2.46 MB | 1 year ago
2.1.3 How to Emulate a CPU with Go (如何用Go模拟CPU)

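## GCN
## How to Emulate a CPU with Go
![Image](/uploads/documents/4/e/2/e/4e2e70c1718d140b661b47b6a7e8d2d2/p1_1.jpg)
Meng Zhuo (蒙卓), engineer at Huawei 2012 Labs
## Becoming Pangu? Making sure the inhabitants (programs) of this world cannot tell that the world was created
## Contents
• History of computer evolution - from hardware computation to the von Neumann architecture
• Building a virtual world
• 6502 assembler and linker
• Future goals
Programmer in 1970: CPU 80 KHz, single core; memory 64 KB, hand-woven magnetic core
![Image](/uploads/documents/4/e/2/e/4e2e70c1718d140b661b47b6a7e8d2d2/p4_1.jpg)
"I put you on the moon"
Programmer in 2021: CPU 2,400,000 KHz, 4 cores; memory 8,000,000 KB DDR3
Why do programmers seem weaker today?
• Because we live in the best of times and the worst of times
• Many abstractions, nested layer upon layer
• Hardware is overly complex
• Software builds on complex concepts such as operating systems
• Truly fast and cheap
## Emulating a CPU with Go
• How do you implement a von Neumann architecture CPU in Go?
• Simple: one loop + one big array: fetch the current instruction, execute it, move on to the next instruction
## Emulation target - MOS 6502
• Introduced in 1975
• The MOS 6502 was used in a wide range of applications
• Documentation is plentiful and easy to obtain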

0 points | 42 pages | 7.10 MB | 2 years ago
Bridging the Gap: Writing Portable Programs for CPU and GPU

Programs for CPU and GPU
## THOMAS MEJSTRIK
## DIMETOR
![Image](/uploads/documents/e/0/4/9/e04984c6d792732e1852981d08548d37/p2_2.jpg)
FWF
## Bridging the Gap: Writing Portable Programs for CPU and GPU
SYCL, ROCm, Vulkan, ...
- You can tell me about afterwards
## Why write programs for CPU and GPU
- Difference CPU/GPU
Algorithms are designed differently
- Latency/Throughput
- Memory bandwidth
- radar” - Problem
- Why it makes sense?
- Scope of the talk
## Why write programs for CPU and GPU
- Difference CPU/GPU
- Why it makes sense?
Library/Framework developers
- Embarrassingly parallel

0 points | 124 pages | 4.10 MB | 1 year ago
Designing an ultra low-overhead multithreading runtime for Nim

## Designing an ultra low-overhead multithreading runtime for Nim
Mamy Ratsimbazafy mamy@numforge.co
## Hello!
## I am Mamy Ratsimbazafy
During the day blockchain/Ethereum 2 developer (in Nim) During
Sources of overhead and runtime design
Minimum viable runtime plan in a weekend
## Understanding the design space
Concurrency vs parallelism, latency vs throughput
Cooperative vs preemptive, IO vs CPU
## Parallelism
- Atomics Transactional memory
- Message-passing
## IO-tasks vs CPU-tasks
## IO-tasks: Latency optimized - async/await
## CPU-tasks: Throughput optimized - spawn/sync
Doing both in the same

0 points | 37 pages | 556.64 KB | 2 years ago
Is std::mdspan a Zero-overhead Abstraction? - Oleksandr Bacherikov - CppCon

## Is std::mdspan a Zero-overhead Abstraction?
Oleksandr Bacherikov, Snap Inc
## What is std::mdspan?
It's a view
Wrong! std::layout_stride only supports the case where all strides are specified at run time. If we target zero overhead, we have to specify one of the strides as 1 at compile time. What does the Standard offer us instead

0 points | 75 pages | 1.04 MB | 1 year ago
Making Games Start Fast: A Story About Concurrency

2.7 (Old) Startup CPU Usage
![Image](/uploads/documents/7/9/3/f/793f1544c860110a4e5decd2064a1322/p13_1.jpg)
2.8 (New) Startup CPU Usage
## Startup Breakdown
Enumerate asset
## High CPU Time
Single threaded code
Inefficient algorithms
Branch misprediction, cache misses
Spin locks
[Profiler screenshot; columns: Function / Call Stack, CPU Time, Wait Time by Utilization, Wait Count, Module]

0 points | 76 pages | 2.22 MB | 1 year ago
Accelerate Istio-CNI with ebpf

Accelerate Istio-CNI with ebpf
Xu Yizhou & Guo Ruijing
## Agenda
• Istio-CNI
• TCP/IP stack overhead between sidecar and service
• Background knowledge of eBPF
• Acceleration for Inbound/Outbound/Envoy
![Image](/uploads/documents/5/a/b/b/5abb1b8f1b8f9d74adba9f84c56cea7a/p3_1.jpg)
## TCP/IP stack overhead between sidecar and service
Overhead of sidecar traffic from 3 scopes
• Inbound
• Outbound
• Envoy to Envoy (same host)

0 points | 15 pages | 658.90 KB | 2 years ago
Optimizing Away Virtual Functions May Be Pointless

technical details and surprising conclusions that virtual functions can actually be faster. Since CPU architectures are mentioned, I'd expect to see deep assembly profiling.
## Ok, some assembly is
But I have another computer
## Different CPUs
## Laptop:
Model name: Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
Thread(s) per core: 2
Core(s) per socket: 4
Stepping: 12
## Desktop:
Thread(s) per core:
## Conclusions
## Relevant factors
• CPU manufacturer
• CPU version
• Precise code path
• Temperature(?)
• OS interrupts(?)
• Compiler optimization

0 points | 20 pages | 1.19 MB | 1 year ago
TVM@AliOS

TVM@AliOS
## PRESENTATION AGENDA
- TVM @ AliOS Overview
- TVM @ AliOS ARM CPU
- TVM @ AliOS Hexagon DSP
- TVM @ AliOS Intel GPU
- Misc
## PART ONE TVM @ AliOS Overview
## AliOS Overview
• AliOS (www.alios
Driving intelligence in everything (驱动万物智能)
## PART TWO AliOS TVM @ ARM CPU
## AliOS TVM @ ARM CPU
• Support TFLite (Open Source and Upstream Master)
• Optimize on INT8 & FP32
## AliOS TVM @ ARM CPU INT8 Convolution
• NHWC layout
• AliOS TVM @ ARM CPU INT8 TVM / QNNPACK Speed Up @ Mobilenet V2 @ rasp 3b+ AARCH64
![Image](/uploads/documents/9/0/e/a/90eab7a9909eddc3e1f4b253cda18ef6/p10_1.jpg)
## AliOS TVM @ ARM CPU INT8 Depthwise

0 points | 27 pages | 4.86 MB | 1 year ago
