LLM inference - IT文库_程序员IT互联网编程电子书和文档免费下载，助您码力十足！

首页文库资料文章资讯上传文档发布文章登录账户

夏歌-使用Rust构建LLM应用

## RUST CHINA CONF 2023 第三届中国 Rust 开发者大会 6.17-6.18 @Shanghai ## 使用 Rust 构建 LLM 应用夏歌 ## 😍 ![Image](/uploads/documents/b/4/b/2/b4b20fa17af007f415a446d15b2b9803/p3_1.jpg) ## Bojan Tunguz ![Image 培养更广泛的 Rust 开发围绕 LLM 生态封装相应的 Rust 框架，让开发者能够使用简单的 Rust 写应用如何用 Rust 实现的构建和部署 AI 相关工作流的 serverless 平台 - 上传 Rust function，平台负责将 Rust 编译成 Wasm，并运行在 WasmEdge 安全容器中 - 平台封装了一些常用 LLM 和 SaaS 的 API，并发布成了 crate，比如

0 码力 | 36 页 | 38.31 MB | 2 年前
3
开源中国 2023 大模型(LLM)技术报告

## COSCHINA gitee ## 2023 中国开源开发者报告 China Open Source 2023 Annual Report ## LLM 技术报告出品：OSCHINA & Gitee 编委会：雨多田光，OSCHINA总编局长，OSCHINA主编王茜，OSCHINA主编叶子，OSCHINA新媒体运营鱼仔，OSCHINA新媒体运营诺墨，Gitee开源社区产品负责人设计：张琪 ## LLM 技术报告大语言模型（LLM）技术作为人工智能领域的一项重要创新在今年引起了广泛的关注。 LLM 是利用深度学习和大数据训练的人工智能系统，专门设计来理解、生成和回应自然语言。这些模型通过分析大量的文本数据来学习语言的结构和用法，从而能够执行各种语言相关任务。以 GPT 系列为代表，LLM 以其在自然语言处理领域的卓越表现，成为推动语言理解、生成和应用的引擎。 LLM 在多个在多个领域都取得了令人瞩目的成就。在自然语言处理领域，GPT 系列模型在文本生成、问答系统和对话生成等任务中展现出色的性能。在知识图谱构建、智能助手开发等方面，LLM 技术也发挥了关键作用。此外，它还在代码生成、文本摘要、翻译等任务中展现了强大的通用性。本报告从技术人视角出发，将深入探讨 LLM 技术的背景、基础设施、应用现状，以及相关的工具和平台。 ![Image](/uploads/documents/f/4/8/5

0 码力 | 32 页 | 13.09 MB | 2 年前
3
vLLM v0.4.0.post1 Documentation

Documentation 3 2 Indices and tables 59 Python Module Index 61 Index 63 ## LLM vLLM is a fast and easy-to-use library for LLM inference and serving. vLLM is fast with: - State-of-the-art serving throughput including parallel sampling, beam search, and more - Tensor parallelism support for distributed inference - Streaming outputs - OpenAI-compatible API server - Support NVIDIA GPUs and AMD GPUs - (Experimental) PagedAttention) - vLLM paper (SOSP 2023) - How continuous batching enables 23x throughput in LLM inference while reducing p50 latency by Cade Daniel et al. ## DOCUMENTATION ## 1.1 Installation vLLM is

0 码力 | 68 页 | 810.15 KB | 5 月前
3
vLLM v0.5.2 Documentation

Documentation 3 2 Indices and tables 157 Python Module Index 159 Index 161 ## LLM vLLM is a fast and easy-to-use library for LLM inference and serving. vLLM is fast with: - State-of-the-art serving throughput sampling, beam search, and more - Tensor parallelism and pipeline parallelism support for distributed inference - Streaming outputs - OpenAI-compatible API server - Support NVIDIA GPUs and AMD GPUs - (Experimental) PagedAttention) - vLLM paper (SOSP 2023) - How continuous batching enables 23x throughput in LLM inference while reducing p50 latency by Cade Daniel et al. - vLLM Meetups. ## DOCUMENTATION ## 1.1 Installation

0 码力 | 166 页 | 1.15 MB | 5 月前
3
OpenAI 《A practical guide to building agents》

multi-step tasks. Advances in reasoning, multimodality, and tool use have unlocked a new category of LLM-powered systems known as agents. This guide is designed for product and engineering teams exploring characteristics that allow it to act reliably and consistently on behalf of a user: 01 It leverages an LLM to manage workflow execution and make decisions. It recognizes when a workflow is complete and can rules engine works like a checklist, flagging transactions based on preset criteria. In contrast, an LLM agent functions more like a seasoned investigator, evaluating context, considering subtle patterns

0 码力 | 34 页 | 7.00 MB | 1 年前
3
vLLM v0.5.3 Documentation

Documentation 3 2 Indices and tables 135 Python Module Index 137 Index 139 ## LLM vLLM is a fast and easy-to-use library for LLM inference and serving. vLLM is fast with: - State-of-the-art serving throughput sampling, beam search, and more - Tensor parallelism and pipeline parallelism support for distributed inference - Streaming outputs - OpenAI-compatible API server - Support NVIDIA GPUs and AMD GPUs - (Experimental) PagedAttention) - vLLM paper (SOSP 2023) - How continuous batching enables 23x throughput in LLM inference while reducing p50 latency by Cade Daniel et al. - vLLM Meetups. ## DOCUMENTATION ## 1.1 Installation

0 码力 | 143 页 | 1.07 MB | 5 月前
3
vLLM v0.6.1.post2 Documentation

Documentation 3 2 Indices and tables 205 Python Module Index 207 Index 209 ## LLM vLLM is a fast and easy-to-use library for LLM inference and serving. vLLM is fast with: - State-of-the-art serving throughput sampling, beam search, and more - Tensor parallelism and pipeline parallelism support for distributed inference - Streaming outputs - OpenAI-compatible API server - Support NVIDIA GPUs, AMD CPUs and GPUs, PagedAttention) - vLLM paper (SOSP 2023) - How continuous batching enables 23x throughput in LLM inference while reducing p50 latency by Cade Daniel et al. - vLLM Meetups. ## DOCUMENTATION ## 1.1 Installation

0 码力 | 215 页 | 1.29 MB | 5 月前
3
vLLM v0.6.2 Documentation

Documentation 3 2 Indices and tables 217 Python Module Index 219 Index 221 ## LLM vLLM is a fast and easy-to-use library for LLM inference and serving. vLLM is fast with: - State-of-the-art serving throughput sampling, beam search, and more - Tensor parallelism and pipeline parallelism support for distributed inference - Streaming outputs - OpenAI-compatible API server - Support NVIDIA GPUs, AMD CPUs and GPUs, PagedAttention) - vLLM paper (SOSP 2023) - How continuous batching enables 23x throughput in LLM inference while reducing p50 latency by Cade Daniel et al. - vLLM Meetups. ## DOCUMENTATION ## 1.1 Installation

0 码力 | 227 页 | 1.33 MB | 5 月前
3
vLLM v0.6.1.post1 Documentation

Documentation 3 2 Indices and tables 205 Python Module Index 207 Index 209 ## LLM vLLM is a fast and easy-to-use library for LLM inference and serving. vLLM is fast with: - State-of-the-art serving throughput sampling, beam search, and more - Tensor parallelism and pipeline parallelism support for distributed inference - Streaming outputs - OpenAI-compatible API server - Support NVIDIA GPUs, AMD CPUs and GPUs, PagedAttention) - vLLM paper (SOSP 2023) - How continuous batching enables 23x throughput in LLM inference while reducing p50 latency by Cade Daniel et al. - vLLM Meetups. ## DOCUMENTATION ## 1.1 Installation

0 码力 | 215 页 | 1.28 MB | 5 月前
3
vLLM v0.6.1 Documentation

Documentation 3 2 Indices and tables 205 Python Module Index 207 Index 209 ## LLM vLLM is a fast and easy-to-use library for LLM inference and serving. vLLM is fast with: - State-of-the-art serving throughput sampling, beam search, and more - Tensor parallelism and pipeline parallelism support for distributed inference - Streaming outputs - OpenAI-compatible API server - Support NVIDIA GPUs, AMD CPUs and GPUs, PagedAttention) - vLLM paper (SOSP 2023) - How continuous batching enables 23x throughput in LLM inference while reducing p50 latency by Cade Daniel et al. - vLLM Meetups. ## DOCUMENTATION ## 1.1 Installation

0 码力 | 215 页 | 1.29 MB | 5 月前
3

共 451 条前往

页

分类

语言

格式

夏歌-使用Rust构建LLM应用

开源中国 2023 大模型(LLM)技术报告

vLLM v0.4.0.post1 Documentation

vLLM v0.5.2 Documentation

OpenAI 《A practical guide to building agents》

vLLM v0.5.3 Documentation

vLLM v0.6.1.post2 Documentation

vLLM v0.6.2 Documentation

vLLM v0.6.1.post1 Documentation

vLLM v0.6.1 Documentation

搜索

分类

语言

格式