DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
[Figure: "Needle In A Haystack" pressure test of DeepSeek-V2 Base at 128K context; axes: Context Length (#Tokens, up to 128K) vs. Document Depth Percent (%).]
…Lin, K. Zhu, Z. Ye, L. Chen, S. Zheng, L. Ceze, A. Krishnamurthy, T. Chen, and B. Kasikci. Atom: Low-bit quantization for efficient and accurate LLM serving. CoRR, abs/2310.19102, 2023. URL https://doi.org/10…
52 pages | 1.23 MB | 1 year ago

Trends Artificial Intelligence
[Chart: "AI Technology Compounding = Numbers Behind The Momentum". Performance (16-bit FLOP/s) grows at roughly +150% per year, enabled by 1.6x annual growth in chips per cluster and 1.6x annual growth … Source: Epoch AI (4/25).]
…platforms will push breadth, stitching together knowledge across functions; specialists will push depth, delivering AI that speaks the language of compliance, contracts, and customer intent. The question…
340 pages | 12.14 MB | 5 months ago

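The +150%/year headline follows from compounding the two growth factors. A minimal back-of-envelope check in Python (a sketch only; it assumes the truncated second 1.6x factor refers to per-chip performance, which the snippet cuts off):

# Back-of-envelope check of the compounded growth rate quoted above.
# Assumption: the truncated second 1.6x factor is per-chip performance growth.
chips_per_cluster_growth = 1.6        # annual multiplier (from the snippet)
per_chip_performance_growth = 1.6     # annual multiplier (assumed)
total_growth = chips_per_cluster_growth * per_chip_performance_growth
print(f"+{(total_growth - 1) * 100:.0f}% per year")  # ~ +156%, i.e. roughly +150%/year
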
清华大学 DeepSeek+DeepResearch 让科研像聊天一样简单 (Tsinghua University: DeepSeek + DeepResearch, making research as easy as chatting)
…stable cyclic performance, while providing high capacity and high voltage curves, has sparked in-depth research and discussion. As a promising candidate for anode materials, alloy-based anodes such as…
…Among various options, alloy-based anodes, especially silicon (Si, 4200 mA h g⁻¹), have sparked in-depth research and discussion. This is primarily due to their extremely high theoretical capacity, which…
85 pages | 8.31 MB | 8 months ago

TVM@Alibaba AI Labs
[Chart: comparison of TF Lite 8-bit, NCNN 8-bit, QNNPACK 8-bit, MNN 8-bit, TVM Overflow-aware, and Overflow-aware (Assembly). Alibaba AI Labs.]
12 pages | 1.94 MB | 6 months ago

Google 《Prompt Engineering v7》
…years between my partner and me and add those up (20 + (9 - 3)). Let's help the model to think a little bit more like me. … Table 12 is an example of 'zero-shot' Chain of…
…paths by branching out from different nodes in the tree. There's a great notebook, which goes into a bit more detail, showing the Tree of Thought (ToT), which is based on the paper 'Large Language Model Guided…
Please refer to the notebook hosted in the GoogleCloudPlatform GitHub repository, which goes into a bit more detail, showing the actual LLM inputs and outputs with a more elaborate example.
68 pages | 6.50 MB | 7 months ago

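The (20 + (9 - 3)) in this snippet is the kind of intermediate step a chain-of-thought prompt asks the model to spell out. A minimal zero-shot sketch (illustrative, not copied from the whitepaper; the exact question wording and the trailing instruction are assumptions):

# Hypothetical zero-shot chain-of-thought prompt; wording is illustrative.
question = (
    "When I was 3 years old, my partner was 3 times my age. "
    "Now I am 20 years old. How old is my partner?"
)
# Zero-shot CoT: no worked examples, just an instruction to reason step by step.
prompt = question + "\nLet's think step by step."
# Expected reasoning: the age gap is 9 - 3 = 6 years, so the answer is 20 + (9 - 3) = 26.
print(prompt)
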
亿联TVM部署 (Yealink TVM deployment)
…step 1 on Windows to generate the .dll for deployment. 3. For applications on 32-bit, there is no support for 32-bit TensorFlow; a workaround from FrozenGene: a. python/tvm/contrib/ndk.py: options = options if options…
6 pages | 1.96 MB | 6 months ago

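For context on the .dll step mentioned here, a minimal sketch of compiling a trivial operator with TVM and exporting it as a Windows shared library (not the document's actual script; it assumes an older TVM release where the classic te.create_schedule / tvm.build(schedule, ...) API is available, plus a local C++ toolchain for export_library):

# Minimal sketch: build a one-op module and export it as a DLL for deployment.
import tvm
from tvm import te

n = te.var("n")
A = te.placeholder((n,), name="A", dtype="float32")
B = te.compute(A.shape, lambda i: A[i] + 1.0, name="B")

s = te.create_schedule(B.op)                      # classic TE API (older TVM releases)
mod = tvm.build(s, [A, B], target="llvm", name="add_one")

# On Windows this invokes the local toolchain and writes a .dll that the
# application can load through the TVM runtime.
mod.export_library("add_one.dll")
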
Deepseek R1 本地部署完全手册 (DeepSeek R1 Complete Local Deployment Manual)
Hardware-requirements table fragment (translated; column layout reconstructed from the fragment): model size | PC (RAM / GPU / storage) | Mac (RAM / storage) | typical use
… | storage: 5 GB | 8 GB RAM (M1/M2/M3), 5 GB storage | simple text generation, basic code completion
7B | RAM: 8-10 GB, GPU: GTX 1680 (4-bit quantized), 8 GB storage | 16 GB RAM (M2 Pro/M3), 8 GB storage | moderately complex Q&A, code debugging
14B | RAM: 24 GB, GPU: RTX 3090 (24 GB…
7 pages | 932.77 KB | 8 months ago

TVM Meetup: Quantization
…represented with a scale and a zero point (http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf): real_value = scale * (quantized_value - zero_point)
19 pages | 489.50 KB | 6 months ago

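The mapping above, real_value = scale * (quantized_value - zero_point), is standard affine quantization. A small illustrative sketch (not from the slides) that quantizes a float array to uint8 and dequantizes it back:

# Affine (asymmetric) quantization with a scale and a zero point, uint8 range.
import numpy as np

def quantize(real, scale, zero_point, qmin=0, qmax=255):
    q = np.round(real / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    # real_value = scale * (quantized_value - zero_point)
    return scale * (q.astype(np.int32) - zero_point)

real = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
scale = (2.0 - (-1.0)) / 255.0   # map the range [-1.0, 2.0] onto [0, 255]
zero_point = 85                  # the quantized value that represents real 0.0
q = quantize(real, scale, zero_point)
print(q, dequantize(q, scale, zero_point))  # dequantized values approximate `real`
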
8 results in total













