Khronos APIs for Heterogeneous Compute and Safety: SYCL and SYCL SC
SYCL enables Khronos to influence ISO C++ to (eventually) support heterogeneous compute. SYCL, OpenCL and SPIR-V, as open industry standards, enable flexible integration and deployment ... ecosystem for all accelerators ▪ Unify the heterogeneous compute ecosystem around open standards ▪ Focus Areas: AI, HPC, Edge AI, Edge Compute ▪ Open source collaboration ... ROLE OF THE FOUNDATION • Host ...
0 credits | 82 pages | 3.35 MB | 6 months ago
Working with Asynchrony Generically: A Tour of C++ Executors
spawned work ... P2300: STD::EXECUTION Proposes: • A set of concepts that represent: • A handle to a compute resource (aka, scheduler) • A unit of lazy async work (aka, sender) • A completion handler (aka ... Launching concurrent work ... EXAMPLE: LAUNCHING CONCURRENT WORK namespace ex = std::execution; int compute_intensive(int); int main() { unifex::static_thread_pool pool{8}; ex::scheduler auto sched = ... ex::then(ex::schedule(sched), [] { return compute_intensive(0); }), ex::then(ex::schedule(sched), [] { return compute_intensive(1); }), ex::then(ex::schedule(sched), [] { return compute_intensive(2); }) );
0 credits | 121 pages | 7.73 MB | 6 months ago
Data Is All You Need for Fusion
%%r15; jb 3243431f;"\ "3243430:\n\t"\ COMPUTE_m24n4 "subq $24,%%r15; cmpq $24,%%r15; jnb 3243430b;"\ "3243431:\n\t"\ "cmpq $8,%%r15; jb 3243433f;"\ "3243432:\n\t"\ COMPUTE_m8n4 "subq $8,%%r15; cmpq $8,%%r15; ... %%r15; jnb 3243432b;"\ "3243433:\n\t"\ "cmpq $2,%%r15; jb 3243435f;"\ "3243434:\n\t"\ COMPUTE_m2n4 "subq $2,%%r15; cmpq $2,%%r15; jnb 3243434b;"\ "3243435:\n\t"\ "movq %%r14,%1;"\ :"+r"(a_ptr),"+r"(b_ptr) ... %%r15; jb 3243231f;"\ "3243230:\n\t"\ COMPUTE_m24n2 "subq $24,%%r15; cmpq $24,%%r15; jnb 3243230b;"\ "3243231:\n\t"\ "cmpq $8,%%r15; jb 3243233f;"\ "3243232:\n\t"\ COMPUTE_m8n2 "subq $8,%%r15; cmpq $8,%%r15;
0 credits | 151 pages | 9.90 MB | 6 months ago
Cache-Friendly Design in Robot Path Planning
graph to make global plans to get around the warehouse. Why not just pre-compute all paths? What do we do? Shortest-path search algorithms ... neighbor v of u with prev[v] is UNDEFINED: dist_v ← dist_u + Graph.Edge(u, v) # compute distance to (v) Q.add_with_priority((u, v, dist_v)) # enqueue (v,u,dist)
0 credits | 216 pages | 10.68 MB | 6 months ago
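The pseudocode fragment in this snippet resembles a priority-queue shortest-path search in which each queue entry carries its predecessor. A minimal C++ sketch of that idea follows. It is not the talk's implementation: the `Graph` alias, the `ShortestPaths` struct and the `dijkstra` function are hypothetical names, edge costs are assumed to be non-negative, and `-1` stands in for UNDEFINED.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical adjacency list: graph[u] holds (neighbor, edge cost) pairs.
using Graph = std::vector<std::vector<std::pair<int, double>>>;

struct ShortestPaths {
    std::vector<double> dist;  // infinity for unreachable nodes
    std::vector<int> prev;     // -1 (UNDEFINED) for unreachable nodes
};

// Assumes all edge costs are non-negative.
ShortestPaths dijkstra(const Graph& graph, int source) {
    assert(source >= 0 && static_cast<std::size_t>(source) < graph.size());
    constexpr int kUndefined = -1;
    const double inf = std::numeric_limits<double>::infinity();
    ShortestPaths out{std::vector<double>(graph.size(), inf),
                      std::vector<int>(graph.size(), kUndefined)};

    // Entries are (distance, node, predecessor); the smallest distance pops first.
    using Entry = std::tuple<double, int, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> queue;
    queue.emplace(0.0, source, source);  // the source is its own predecessor

    while (!queue.empty()) {
        const auto [dist_u, u, pred] = queue.top();
        queue.pop();
        if (out.prev[u] != kUndefined) continue;  // u was already settled
        out.prev[u] = pred;
        out.dist[u] = dist_u;
        for (const auto& [v, cost] : graph[u]) {
            if (out.prev[v] == kUndefined) {
                queue.emplace(dist_u + cost, v, u);  // enqueue (dist_v, v, u)
            }
        }
    }
    return out;
}
```

Setting `prev` only when an entry is popped means a node can sit in the queue several times, and the first pop wins. That avoids a decrease-key operation at the cost of a larger queue.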
Contracts for C++
Lippincott ... struct UnaryFunction { virtual Value compute(ArgList args) pre (args.size() == 1); }; struct BinaryFunction { virtual Value compute(ArgList args) pre (args.size() == 2); }; struct VariadicFunction : UnaryFunction, BinaryFunction { Value compute(ArgList args) override /* no preconditions */; }; ... Copyright (c) Timur Doumler | @timur_audio |
0 credits | 181 pages | 4.44 MB | 6 months ago
Back to Basics Unit Testing
1: Testing Precision Completeness ... float compute_pi() { return std::acos(-1); } TEST_CASE("Compute Pi") { float correct_value = ??? CHECK( compute_pi() == correct_value ); } Test Correctly ... have a falsifiable test. float compute_pi() { return std::acos(-1); } TEST_CASE("Compute Pi") { float correct_value = what_I_got_by_running_it; CHECK( compute_pi() == correct_value ); } Test ... youtube.com/c/ContinuousDelivery float compute_pi() { return std::acos(-1); } TEST_CASE("Compute Pi") { float correct_value = std::acos(-1); CHECK( compute_pi() == correct_value ); } Don't do ...
0 credits | 109 pages | 4.13 MB | 6 months ago
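The snippet warns against tests whose expected value comes from the code under test. One common alternative, sketched here without any test framework, is to compare against an independently known constant within a tolerance. The helper `approx_equal` and the chosen tolerance are assumptions for illustration and do not come from the talk.

```cpp
#include <cassert>
#include <cmath>

// Function under test, as shown in the snippet.
float compute_pi() { return std::acos(-1); }

// Hypothetical helper: true when the two values differ by at most `tolerance`.
bool approx_equal(float actual, float expected, float tolerance) {
    assert(tolerance >= 0.0f);
    return std::fabs(actual - expected) <= tolerance;
}

// The expected value is a literal from an independent source, not a second
// call to std::acos, so the check can actually fail if compute_pi() is wrong.
bool compute_pi_is_plausible() {
    const float reference_pi = 3.14159265f;
    return approx_equal(compute_pi(), reference_pi, 1e-6f);
}
```

In a framework with `TEST_CASE` and `CHECK`, as in the snippet, the same comparison would go inside the `CHECK` expression.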
Vectorizing a CFD Code With std::simd Supplemented by Transparent Loading and Storing
library ... Olaf Krzikalla, DLR SP, 2024-09-17 ... Background: Vectorization. Nowadays, all your CPUs can compute four times faster. One CPU instruction adds/multiplies/… multiple sets of operands at once → Single ... in CODA: Compute flows in a cube ... for dir : (0, 3) for i : (0, size) for j : (0, size) for k : (0, size) for c : (i|j|k + 1, size) compute_flow(point(...), point(...)); Compute-bound → worthwhile vectorization target • cascading calls ending up partly in external libraries • creating, passing and returning data structs → explicit vectorization
0 credits | 58 pages | 2.68 MB | 6 months ago
The Roles of Symmetry And Orthogonality In Design
component) ... Implementation Hygiene • Unnecessarily complex processing • Example: Eager-compute or Lazy-compute that introduces stochastic time-shifting of computation and resource contention, or which ... std::smart_ptr<>, std::unique_ptr<>) • Time-shifted computation (i.e., lazy-compute, eager-compute) • Double-compute (e.g., in iterators, or when using std::range) • Synchronization of async calls
0 credits | 151 pages | 3.20 MB | 6 months ago
The Beman Project: Bringing Standard Libraries to the Next Level
eye_color() const; }; vector people = /* ... */; // Compute eye colors of 'people'. vector eye_colors = people | views::transform(&Person::eye_color)
0 credits | 53 pages | 7.38 MB | 6 months ago
Application of C++ in Computational Cancer Modeling
increment_type(weights); // 3 population(incre_type)++; if (decre_type != -1) population(decre_type)--; } 1. Compute the weights matrix. 2. Determine the arrival time by sampling an exponential distribution. 3. Normalize ... get the probabilities for possible events. Sample a random variable to decide which event happens. ... Compute the weights matrix ... void TumorGenerator::evolve_step() { Eigen::ArrayXXd weights = transition_rates ... Policy call .get() to obtain population arrays. ... Parallel STL algorithm (average population) ... // compute average population Eigen::ArrayXXd initial_array = Eigen::ArrayXXd::Zero(population_result[0].rows()
0 credits | 47 pages | 1.14 MB | 6 months ago
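The three numbered steps in this snippet can be sketched with the standard `<random>` facilities. This is an illustration under stated assumptions, not the talk's `TumorGenerator` code: the `Event` struct and `sample_next_event` function are hypothetical, a flat `std::vector<double>` replaces the Eigen weights matrix, and the rate of the exponential distribution is assumed to be the sum of all weights. `std::discrete_distribution` performs the normalization of weights into probabilities internally.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical result type: when the next event happens and which one it is.
struct Event {
    double waiting_time;  // time until the event, sampled from an exponential
    std::size_t index;    // position of the chosen event in the weights vector
};

// weights[i] is the (non-negative) rate of event i; at least one must be positive.
template <class Rng>
Event sample_next_event(const std::vector<double>& weights, Rng& rng) {
    // Assumption: the arrival rate is the total of all event weights.
    const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    assert(total > 0.0);

    // Step 2: sample the arrival time from an exponential distribution.
    std::exponential_distribution<double> arrival(total);

    // Step 3: discrete_distribution normalizes the weights into probabilities
    // and samples an index accordingly.
    std::discrete_distribution<std::size_t> which(weights.begin(), weights.end());

    return Event{arrival(rng), which(rng)};
}
```

A caller would pass the freshly computed weights and a seeded engine such as `std::mt19937`, then update the population counts for the returned `index`.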













