Performance Engineering: Being Friendly to Your Hardware
## Example - memcpy • Problem space • Performance requirements • Scalar, various vectors, specialty instructions • Alignment: source and destination • Size • Direction • Linearity ## Example – memcpy: scalar naive void *memcpy_scalar(char *dst, const char *src, size_t n) { if(n) { while (n--) { ... no vectorization 0000000000001270 <_Z13memcpy_scalarPcPKcm>: 1270: 48 89 f8 mov rax, rdi 1273: 48 85 d2 test rdx, rdx 1276: 74 1b je 1293 <_Z13memcpy_scalarPcPKcm+0x23> 1278: 31 c9 xor ecx ...
111 pages | 2.23 MB | 1 year ago

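The excerpt above cuts off inside the loop of the naive scalar copy. As a rough illustration only, and not the slide's actual code, a byte-at-a-time copy could look like the sketch below. The name `memcpy_scalar_sketch` is hypothetical, and like `memcpy` it assumes the source and destination ranges do not overlap.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of a naive scalar copy: one byte per loop iteration.
// Assumes the two ranges do not overlap, which is the same contract as memcpy.
void* memcpy_scalar_sketch(void* dst, const void* src, std::size_t n) {
    // Null pointers are only acceptable when there is nothing to copy.
    assert((dst != nullptr && src != nullptr) || n == 0);

    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);

    // Copy forward, byte by byte, until n bytes have been written.
    while (n--) {
        *d++ = *s++;
    }
    return dst;  // memcpy returns the destination pointer
}
```

Whether a compiler turns such a loop into vector instructions depends on the optimization flags and on what it can prove about aliasing, which is the kind of question the listed talk examines.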
When Nanoseconds Matter: Ultrafast Trading Systems in C++
mQ->mWriteCounter.store(mLocalCounter, std::memory_order_release); std::memcpy(mNextElement, &size, sizeof(int32_t)); std::memcpy(mNextElement + sizeof(int32_t), buffer.data(), buffer.size()); ... mQ->mReadCounter.load(std::memory_order_acquire)) return 0; int32_t size; std::memcpy(&size, mNextElement, sizeof(int32_t)); int32_t writeCounter = mQ->mWriteCounter.load ... overflow"); EXPECT(size <= buffer.size(), "buffer space isn't large enough"); std::memcpy(buffer.data(), mNextElement + sizeof(size), size); const int32_t payloadSize = sizeof(size) ...
123 pages | 5.89 MB | 1 year ago

PGAS in C++: A Portable Abstraction for Distributed Data Structures
remote_ptr = ...; // Calculate data auto values = algorithm(1.0f, 3, data); // Send data to proc.1 BCL::memcpy(remote_ptr, values.data(), values.size() * sizeof(float)); BCL::flush(); // Data is copied. ## How ... <typename T> struct GlobalPtr { private: size_t rank_; size_t offset_; }; void memcpy(void* dest, GlobalPtr<T> src, size_t n) { // Issue remote backend::remote_get(dest, src, n, ...); } }; ## Remote Pointer Types - Can build memcpy to support reading/writing from/to remote memory - Can write fetch_and_op, compare_and_swap, etc.
128 pages | 2.03 MB | 1 year ago

Distributed Ranges: A Model for Building Distributed Data Structures, Algorithms, and Views
remote_ptr = ...; // Calculate data auto values = algorithm(1.0f, 3, data); // Send data to proc.1 memcpy(remote_ptr, values.data(), values.size() * sizeof(float)); flush(); // Data is copied ... <typename T> struct remote_ptr { private: size_t rank_; size_t offset_; }; void memcpy(void* dest, remote_ptr<T> src, size_t count) { // Issue remote get operation to ... T& operator=(const T& value) { memcpy(ptr_, &value, sizeof(T)); return value; } operator T() { T value; memcpy(&value, ptr_, sizeof(T)); return ...
127 pages | 2.06 MB | 1 year ago

Hidden Overhead of a Function API
... are trivial only on MSVC. This is not a problem for the function calls. But a problem for std::memcpy and std::bit_cast. • std::tuple is never trivially move constructible. ## Can we do something about ... #include <...> void raw_copy(std::byte* dst, std::byte const* src, size_t size) { std::memcpy(dst, src, size); } void checked_copy(// imagine 2 std::spans here std::byte* dst, std::byte const* ... std::memcpy(dst, src, dst_size); ...
|armv8-a clang 18.1.0|x86-64 gcc 14.2|x64 msvc v19.40 VS17.10|139|
|---|---|---|---|
|b memcpy|jmp memcpy|jmp memcpy|RAW|
|b memcpy|jmp memcpy|jmp ...
158 pages | 2.46 MB | 1 year ago

Tracy: A Profiler You Don't Want to Miss
... usually a good thing! implicit ones too: e.g. future.get() please, just don't... TracyLockable memcpy(), malloc() et al. shader compilation sorting, searching, container manipulations, ... file, network ... implicit ones too: e.g. future.get() better yet, avoid sleep() in production code! TracyLockable memcpy(), malloc() et al. shader compilation sorting, searching, container manipulations, ... file, network ... don't call memcpy() directly... call some my_memcpy() which then calls memcpy() void* my_memcpy(void* dst, const void* src, size_t size) { ZoneScopedC(tracy::Color::Blue); return ::memcpy(dst, src, size); }
84 pages | 8.70 MB | 1 year ago

Tracy: A Profiler You Don't Want to Miss
... usually a good thing! implicit ones too: e.g. future.get() please, just don't... TracyLockable memcpy(), malloc() et al. shader compilation sorting, searching, container manipulations, ... file, network ... implicit ones too: e.g. future.get() better yet, avoid sleep() in production code! TracyLockable memcpy(), malloc() et al. shader compilation sorting, searching, container manipulations, ... file, network ... don't call memcpy() directly... call some my_memcpy() which then calls memcpy() void* my_memcpy(void* dst, const void* src, size_t size) { ZoneScopedC(tracy::Color::Blue); return ::memcpy(dst, src, size); }
85 pages | 6.51 MB | 1 year ago

High-Performance Parallel Programming and Optimization in C++ - Slides - 02 Introduction to Modern C++: RAII Memory Management
... &other) { m_size = other.m_size; m_data = (int *)malloc(m_size * sizeof(int)); memcpy(m_data, other.m_data, m_size * sizeof(int)); } ... { m_size = other.m_size; m_data = (int *)malloc(m_size * sizeof(int)); memcpy(m_data, other.m_data, m_size * sizeof(int)); } Vector &operator=(Vector const &other) ... &other) { m_size = other.m_size; m_data = (int *)realloc(m_data, m_size * sizeof(int)); memcpy(m_data, other.m_data, m_size * sizeof(int)); return *this; } ## C++11: Why distinguish between copy and move? - Sometimes we need to take an object ...
96 pages | 16.28 MB | 2 years ago

Back to Basics: Move Semantics
... > 0) { // if not empty data = new char[len+1]; // -new memory memcpy(data, s.data, // -copy chars len+1); } } }; }; C++ ... > 0) { // if not empty data = new char[len+1]; // -new memory memcpy(data, s.data, // -copy chars len+1); } } }; }; public: ... > 0) { // if not empty data = new char[len+1]; // -new memory memcpy(data, s.data, // -copy chars len+1); } } } };
23 pages | 1020.10 KB | 1 year ago

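High-Performance Parallel Programming and Optimization in C++ - Slides - 12 C Pointers from the Perspective of Computer Organization
... the versions in the namespace come with multiple overloads. - Avoid the global functions (the original C ones) and always use the std:: prefix (the C++-improved ones). • C++ even has std::printf, std::memcpy, std::size_t, although for these there is actually no difference at all..... main.cpp #include <...> #include <...> int main() { float ... ## A C hallmark: strings end with 0 - This is why strcpy(dst, src) takes only two pointers as arguments, while memcpy(dst, src, n) needs an extra length argument: strcpy targets 0-terminated character arrays (a C hallmark), whereas memcpy has to handle arrays of any type. • strlen(s) simply starts at pointer s and finds the index of the first element that is 0, which is the length of the string.
128 pages | 2.95 MB | 2 years ago
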
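The point made in the excerpt above, that 0-terminated strings carry their own end marker while arbitrary arrays do not, can be sketched as follows. This is a minimal illustration, not code from the slides; `my_strlen` and `copy_floats` are hypothetical helper names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: conceptually what strlen does. Starting at s, find the
// index of the first element equal to 0; that index is the string length.
std::size_t my_strlen(const char* s) {
    assert(s != nullptr);
    std::size_t i = 0;
    while (s[i] != '\0') {
        ++i;
    }
    return i;
}

// Hypothetical helper: copying an array of floats. No value can act as an end
// marker here (0.0f is ordinary data), so the caller must supply the element
// count, which is converted to the byte count that std::memcpy expects.
void copy_floats(float* dst, const float* src, std::size_t count) {
    std::memcpy(dst, src, count * sizeof(float));
}
```

A zero in the middle of the float array is copied like any other element, which is why the length has to be passed explicitly instead of being discovered by scanning.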













