Remote Execution Caching Compiler (RECC)
CppCon 2024, September 19, 2024. Shivam Bairoliya, Software Engineer. © 2024 Bloomberg Finance L.P. All rights reserved.
What is RECC?
● Remote Execution Caching ... source build tool that wraps compiler commands and optionally forwards them to a remote build execution service
  ○ Encompasses the capabilities of both ccache and distcc
  ○ Supports remote linking and ... CC)
  ○ Supports multiple operating systems (Linux, macOS, Solaris)
● Compatible with any remote execution API server supported by Bazel
  ○ Single Host Server/Proxy: BuildBox-CASD
  ○ Distributed Server: ...
0 credits | 6 pages | 2.03 MB | 6 months ago
simd: How to Express Inherent Parallelism Efficiently Via Data-Parallel Types
... r = pixel.g = pixel.b = gray; } } or
void to_gray(Image& img) {
  std::for_each(std::execution::unseq, img.begin(), img.end(), [](Pixel& pixel) {
    const auto gray = (pixel.r * 11 + pixel.g ...
... over an image, producing vectorized access to pixels
2. Operations on vectorized pixels
• Ideally, the first part is generalized to: Iteration over a range, providing vectorized access to its elements.
0 credits | 160 pages | 8.82 MB | 6 months ago
Performance Engineering: Being Friendly to Your Hardware
[Pipeline diagram labels: ... ROM, Cache, L1I]
• ABI registers implement a SW contract
• They do not correspond to the actual execution
• Conversion of control flow to a variant of data flow
• Really complex
• Some operations end ... of operations are equal
Execution
[Pipeline diagram labels: Branching, Fetch, Decode, Queue, Allocation, Scheduling, Execution, ROM, Cache, L1I, LSQ]
• Multiple specialized functional units
• Performs the actual ... (eventually)
• Latency vs throughput
Retirement
[Pipeline diagram labels: Branching, Fetch, Decode, Queue, Allocation, Scheduling, Execution, ROM, Cache, L1I, L1D, LSQ, Retirement]
• All operations until now are speculative
• Except some ...
0 credits | 111 pages | 2.23 MB | 6 months ago
C++20: An (Almost) Complete Overview
... the caller, and suspends the coroutine; subsequently calling the coroutine again continues its execution
co_return: returns from a coroutine (just return is not allowed)
Coroutines: What are coroutines ...
... Can be used with condition_variable_any
std::stop_source: Used to request a thread to stop execution
Stop requests are visible to all associated stop_sources and stop_tokens
std::stop_callback ...
... numbers
lerp() to do linear interpolation
New unsequenced_policy (execution::unseq): algorithm is allowed to be vectorized
Agenda: Modules, Ranges, Coroutines, Concepts, Lambda Expression ...
0 credits | 85 pages | 512.18 KB | 6 months ago
Code Generation from Unified Robot Description Format for Accelerated Robotics
... neighbor lookup and forward kinematics, and collision detection
Motivation: Motions in Microseconds via Vectorized Sampling-Based Planning (Wil Thomason, Zachary Kingston, and Lydia E. Kavraki)
Performance improvements ...
... Description Format (URDF) files and generates optimized code
Set up data structure to optimize SIMD execution
Skip unneeded computations like combining fixed joints, unrolling loops
Motivation: Software written ...
0 credits | 93 pages | 9.29 MB | 6 months ago
Vectorizing a CFD Code With std::simd Supplemented by Transparent Loading and Storing
... deduction to load and store std::simd and scalar variables
▪ syntactically equalize scalar and vectorized code
The talk:
▪ share experience with vectorization using std::simd
▪ introduce the SIMD_ACCESS ...
... v : (0, size * size) for c : (i|j|k + 1, size) compute(point(...), point(...));
Loop to be vectorized
• invariant nested loop length
• sufficiently large loop length (size may be rather small)
Main version → Loaded types deduced by SIMD index type; deduction propagates forward to operators
Vectorized loop body with overloaded operator[](simd_index):
▪ Same code for scalar and vector version
▪ ...
0 credits | 58 pages | 2.68 MB | 6 months ago
Data Is All You Need for Fusion
... k_inc); macro_kernel_k18(a_buffer,b_buffer,m_inc,n_inc,k_inc,&C(m_count,n_count),LDC); } 2 Tiled Vectorized 4 int x = 4; callee(x); // do work } #include #include #include "benchmark ...
... k_inc); macro_kernel_k18(a_buffer,b_buffer,m_inc,n_inc,k_inc,&C(m_count,n_count),LDC); } 2 Tiled Vectorized Parallelized 4 int x = 4; callee(x); // do work } #include #include #include ...
... k_inc); macro_kernel_k18(a_buffer,b_buffer,m_inc,n_inc,k_inc,&C(m_count,n_count),LDC); } 2 Tiled Vectorized Parallelized Cache Blocked 4 int x = 4; callee(x); // do work } #include #include ...
0 credits | 151 pages | 9.90 MB | 6 months ago
The Shapes of Multidimensional Arrays
... shape[distributed(4)]()[3][5]> std::ndarray ... vectorized(5)]> std::ndarray ... vectorized(5)]> Operations on shapes std::ndarray ...
0 credits | 62 pages | 1.38 MB | 6 months ago
Building bridges: Leveraging C++ and ROS for simulators, sensor data and algorithms
... this talk, our primary discussion will be towards achieving data determinism via:
• Deterministic execution [2] will always run computations in the same order.
• Deterministic communication [2] is when, for a message ...
... runtime
• All operations are finite and bounded
• All potentially blocking calls have timeouts
• Execution is deterministic and monitored
• Memory usage
• Allocations during runtime
• STL constructs with std::exception
• Blocking calls, such as fprintf, fwrite
• Non-deterministic execution
• Controlling memory usage
• Controlling task execution
Deterministic resource usage and runtime is necessary for a safety-critical ...
0 credits | 38 pages | 2.17 MB | 6 months ago
Just-in-Time Compilation - J F Bastien - CppCon 2020
JiT compilation includes any translation performed dynamically, after a program has started execution. We examine the motivation behind JiT compilation and constraints imposed on JiT compilation systems ...
0 credits | 111 pages | 3.98 MB | 6 months ago
180 results in total













