DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
… disambiguation datasets include WinoGrande (Sakaguchi et al., 2019) and CLUEWSC (Xu et al., 2020). Language modeling datasets include Pile (Gao et al., 2020). Chinese understanding and culture datasets include CHID … HumanEval, MBPP, CRUXEval, BBH, AGIEval, CLUEWSC, CMRC, and CMath. In addition, we perform language-modeling-based evaluation for Pile-test and use Bits-Per-Byte (BPB) as the metric to guarantee fair comparison … Phang, H. He, A. Thite, N. Nabeshima, et al. The Pile: An 800GB dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2020. Google. Introducing gemini: our largest and most capable …
0 credits | 52 pages | 1.23 MB | 1 year ago
Google, "Prompt Engineering v7"
… such as interacting with external APIs to retrieve information, which is a first step towards agent modeling. ReAct mimics how humans operate in the real world, as we reason verbally and can take actions …
0 credits | 68 pages | 6.50 MB | 6 months ago
Trends Artificial Intelligence
… model serving, compute management, vector search & databases. Model development = frameworks for modeling & training, inference optimization, dataset engineering, & model evaluation. Application development …
0 credits | 340 pages | 12.14 MB | 5 months ago
3 results in total