Trends Artificial Intelligence
Richard Hirsh; John McCallum; OpenAI Details on Page 138 0 Years 72 Years Electric Power Computer Memory AI Inference … AI Monetization Threats = Rising Competition + Open-Source Momentum + China’s Rise … – 2024 AI Development Trending = Unprecedented … AI Performance = Increasingly Realistic Audio Translation / Generation… 46 Note: China data may be subject to informational limitations due to government … in that language. Each speaker’s original voice is isolated and cloned before generating the translation to make sure they sound the same in every language. - ElevenLabs Press Release, 1/24 Global
0 credits | 340 pages | 12.14 MB | 5 months ago
TVM: Where Are We Going
Specialized Accelerators Tensor Compute Primitives Unified Buffer Acc FIFO Explicitly Managed Memory Subsystem TPUs … Tensorization Challenge Compute primitives scalar vector tensor Challenge: Build product ready … Interpolate with Other Compilers MLIR-TF Function relay::Function TorchScript IR Translation Custom Packaging runtime::Module ExternModule DSOModule Function in Other IR ExternFunc
0 credits | 31 pages | 22.64 MB | 6 months ago
OpenAI 《A practical guide to building agents》
from import "manager_agent" "You are a translation agent. You use the tools given to you to translate." "translate_to_spanish" "Translate the user's {message.content}") async def for in print "Translate 'hello' to Spanish, French and Italian for me!" Translation step: … Declarative vs non-declarative graphs Some frameworks are declarative, requiring developers
0 credits | 34 pages | 7.00 MB | 6 months ago
OpenAI - AI in the Enterprise
offer more and better insights to clients. They started with three model evals: 01 Language translation: Measuring the accuracy and quality of translations produced by a model. 02 Summarization: Evaluating
0 credits | 25 pages | 9.48 MB | 6 months ago
Google 《Prompt Engineering v7》
summarization, information extraction, question and answering, text classification, language or code translation, code generation, and code documentation or reasoning. Please feel free to refer to Google’s prompting
0 credits | 68 pages | 6.50 MB | 6 months ago
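Tsinghua University: DeepSeek + DeepResearch, Making Research as Simple as Chatting
provide the output, where the first column is the original text and the second column is the translated sentence, with only one sentence per row … The language of the provided paragraph is Chinese; below is the English translation in the requested table format: Original (Chinese) | Translation (English) … 捕食是一个基本的生态过程,捕食的定义为:一种生物(捕食者)捕食了另一种生物(猎物)(Begon等,1997)。 | Predation is a fundamental
0 credits | 85 pages | 8.31 MB | 8 months ago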
OctoML OSS 2019 11 8
part of the system • Haichen and I will discuss more details at TVMConf. … VM Memory Planning • Recently shipped a first version of dynamic memory planning, storage coalescing, memory re-use for loops, and offloading dynamic allocation to devices. fn main() -> Tensor[(k,), f32] { let t2 …; let s = alloc_storage(40, 64, f32); let out1 = alloc_tensor(s, (19,), f32); invoke_… (…, t2), (out1,)); out1 } … VM Memory Abstractions Old New t1: Tensor t1: Tensor
0 credits | 16 pages | 1.77 MB | 6 months ago
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
the KV joint compression in MLA reduces the KV cache. Moreover, in order to reduce the activation memory during training, we also perform low-rank compression for the queries, even if it cannot reduce … relatively few activated parameters, and a portion of the operators are recomputed to save activation memory, it can be trained without the necessity of tensor parallelism, thereby decreasing the communication … demands on the training framework. It requires careful engineering optimization to manage the GPU memory and RAM pressure, and meanwhile maintain a fast training speed. For this goal, we implement the following
0 credits | 52 pages | 1.23 MB | 1 year ago
Deploy VTA on Intel FPGA
©2019 HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED … Software - CMA Contiguous Memory Allocation – Linux Kernel … https://pynq.readthedocs.io/en/v2.0/pynq_package/pynq … 08.02_pr.tar.gz … Software - CMA Contiguous Memory Allocation – Linux Kernel Module … Setup Environment Variables Navigate … Software - Driver Cyclone V & Arria V SoC HPS Physical Memory Map … Hardware Configure
0 credits | 12 pages | 1.35 MB | 6 months ago
PAI & TVM Meetup - Shanghai 20191116
TensorCore Intrinsics • Authored by @Hzfengsy • Intrinsics: tvm_load_matrix_sync tvm_mma_sync … • New Memory Scopes: wmma.matrix_a/b, accumulator • Tensorization on warp level schedule … Motivation … load/store for higher bandwidth utilization • Double buffer to hide memory load latency • Storage align to reduce bank conflicts of shared memory • Virtual threads for data reuse (ongoing) … Performance on V100
0 credits | 26 pages | 5.82 MB | 6 months ago
12 results in total













