动手学深度学习 v2.0 — 8.2 Text Preprocessing (The Time Machine, gutenberg.org/ebooks/35)

    tokens = tokenize(lines)
    for i in range(11):
        print(tokens[i])

    ['the', 'time', 'machine', 'by', 'h', 'g', 'wells']
    []
    []
    []
    []

    # excerpt from the constructor of the Vocab class
    def __init__(self, tokens=None, min_freq=0, reserved_tokens=None):
        if tokens is None:
            tokens = []
        if reserved_tokens is None:
            reserved_tokens = []
        # Sort tokens by frequency of occurrence
        counter = count_corpus(tokens)
        self._token_freqs = sorted(counter.items(), key=lambda x: x[1],
                                   reverse=True)
        # The index of the unknown token <unk> is 0
        self.idx_to_token = ['<unk>'] + reserved_tokens
        self.token_to_idx = {token: idx
                             for idx, token in enumerate(self.idx_to_token)}
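The excerpt calls a count_corpus helper that is not shown. A minimal sketch of such a helper, assuming tokens may be either a flat list of tokens or a list of token lists (the behavior is inferred from the excerpt, not copied verbatim from the book):

    import collections

    def count_corpus(tokens):
        """Count token frequencies over a 1D or 2D list of tokens."""
        # Flatten a list of token lists into a single list of tokens.
        if len(tokens) == 0 or isinstance(tokens[0], list):
            tokens = [token for line in tokens for token in line]
        return collections.Counter(tokens)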
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures — … the word2vec family of algorithms [6] (apart from others like GloVe [7]), which can learn embeddings for word tokens for NLP tasks. The embedding table generation process is done without any ground-truth labels. We would learn embeddings of a fixed number of dimensions each (where we can also view …). [Footnote 10: We are dealing with word tokens as an example here, hence you would see the mention of words and their embeddings. In practice, we …] … pairs of input context (neighboring words) and the label (the masked word to be predicted). The word tokens are vectorized by replacing the actual words by their indices in our vocabulary. If a word doesn't …
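As an illustration of the context/label pairing and index-based vectorization described above, here is a minimal sketch (the sentence, window size, and vocabulary construction are illustrative assumptions, not code from the book):

    # Build (context, target) training pairs for a word2vec-style model and
    # vectorize words by their indices in the vocabulary.
    sentence = "the quick brown fox jumps over the lazy dog".split()
    vocab = {word: idx for idx, word in enumerate(sorted(set(sentence)))}

    window = 2  # assumed context window size
    pairs = []
    for i, target in enumerate(sentence):
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if j != i:
                # (index of context word, index of target word)
                pairs.append((vocab[sentence[j]], vocab[target]))

    print(pairs[:5])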
AI大模型千问 qwen 中文文档 — … decode() to get the output.

    # Use `max_new_tokens` to control the maximum output length.
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=512
    )
    # generate() returns prompt + completion, so slice off the prompt tokens.
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

Previously we used model.chat() (for more details, see modeling_qwen.py in earlier Qwen models). Now we follow the standard transformers usage:

    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=512,
        streamer=streamer,
    )

1.2.2 Deploying with vLLM — To deploy Qwen1.5, we recommend using …
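For context, the fragments above fit into an end-to-end flow roughly like the following minimal sketch, assuming the Hugging Face transformers API (with accelerate installed for device_map) and an illustrative Qwen1.5 chat checkpoint name:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen1.5-7B-Chat"  # illustrative checkpoint name
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512)
    # Keep only the newly generated tokens before decoding.
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(response)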
PyTorch Release Notes — … that were introduced in Transformer-XL help capture better long-term dependencies by attending to tokens from multiple previous segments. Our implementation is based on the codebase that was published by …
《Efficient Deep Learning Book》[EDL] Chapter 6 - Advanced Learning Techniques - Technical Review — … pretext task. This works well for domains like natural language, where your data is a sequence of tokens. You can extend the analogy to the input being a tensor of any rank: hide part of the input and train the model to predict the hidden part. For BERT (figure 6-3), the pretext tasks are as follows:
1. Masked Language Model (MLM): 15% of the tokens in the given sentence are masked, and the model needs to predict the masked tokens (see the masking sketch below).
2. Next Sentence Prediction (NSP): …
GPT-3 is a transformer model that only has the decoder (the input is a sequence of tokens, and the output is a sequence of tokens too). It excels in natural language generation and hence has been …
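To make the MLM masking step concrete, here is a minimal sketch (the 15% rate follows the text; the mask token id, the ignore index of -100, and the use of plain PyTorch are assumptions for illustration):

    import torch

    def mask_tokens(input_ids, mask_token_id, mask_prob=0.15):
        """Randomly mask tokens for a BERT-style MLM pretext task."""
        labels = input_ids.clone()
        masked_input = input_ids.clone()
        # Choose roughly 15% of the positions to mask.
        mask = torch.rand(input_ids.shape) < mask_prob
        # Only masked positions contribute to the loss; -100 is the index
        # ignored by PyTorch's cross-entropy loss.
        labels[~mask] = -100
        masked_input[mask] = mask_token_id
        return masked_input, labels

    ids = torch.randint(5, 1000, (1, 12))            # fake token ids
    masked_ids, labels = mask_tokens(ids, mask_token_id=103)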
《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction — … a mathematical representation that our models can use. The quality of these models scales with the number of tokens we learn an embedding for (the size of our vocabulary), and the size of the embedding (known as the embedding dimension). (Figure: an embedding table on the left with an embedding for each token; the Hashing Trick on the right, where multiple tokens map to the same slot and share embeddings, which helps save space.) To remedy this problem, … the model. With the Hashing Trick, instead of learning one embedding vector for each token, many tokens can share a single embedding vector. The sharing can be done by computing the hash of the token modulo the number of rows in a smaller embedding table.
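A minimal sketch of this sharing scheme, assuming Python's built-in hash and an illustrative bucket count (this is not the book's code; a stable hash such as one from hashlib would be preferable in practice, since Python's built-in hash is salted per process):

    import numpy as np

    vocab_size = 100_000      # tokens we would otherwise need one row each for
    num_buckets = 10_000      # rows actually stored: a 10x smaller table
    embedding_dim = 64

    # One shared table; many tokens map to the same row.
    embedding_table = np.random.randn(num_buckets, embedding_dim).astype(np.float32)

    def lookup(token: str) -> np.ndarray:
        # The hash of the token modulo the number of rows picks a shared slot.
        row = hash(token) % num_buckets
        return embedding_table[row]

    vec = lookup("efficient")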
机器学习课程-温州大学-14深度学习-Vision Transformer (ViT) — 5. Code implementation of the model (sketched below)
• A 224x224 image is split into 49 patches of size 32x32.
• Each patch is embedded, giving 49 vectors of dimension 128.
• A cls_tokens vector is then concatenated, giving 50 vectors of dimension 128.
• The pos_embedding is added; there are still 50 vectors of dimension 128.
• These vectors are fed into the transformer for self-attention feature extraction.
• The output is again 50 vectors of dimension 128, and then …
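A minimal PyTorch sketch of these steps (the unfold-based patching, the Linear patch projection, and the stand-in nn.TransformerEncoder are illustrative assumptions, not the course's code):

    import torch
    import torch.nn as nn

    B, C, H, W, P, D = 2, 3, 224, 224, 32, 128        # batch, channels, image, patch, embed dim
    num_patches = (H // P) * (W // P)                  # 7 * 7 = 49

    x = torch.randn(B, C, H, W)

    # 1. Split the image into 49 patches of 32x32 and flatten each patch.
    patches = x.unfold(2, P, P).unfold(3, P, P)        # (B, C, 7, 7, 32, 32)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, num_patches, C * P * P)

    # 2. Embed each patch into a 128-dimensional vector: 49 x 128.
    patch_embed = nn.Linear(C * P * P, D)
    tokens = patch_embed(patches)                      # (B, 49, 128)

    # 3. Prepend the class token: 50 x 128.
    cls_tokens = nn.Parameter(torch.zeros(1, 1, D)).expand(B, -1, -1)
    tokens = torch.cat([cls_tokens, tokens], dim=1)    # (B, 50, 128)

    # 4. Add positional embeddings: still 50 x 128.
    pos_embedding = nn.Parameter(torch.zeros(1, num_patches + 1, D))
    tokens = tokens + pos_embedding

    # 5. Self-attention feature extraction; the output is again (B, 50, 128).
    layer = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2)
    out = encoder(tokens)
    print(out.shape)                                   # torch.Size([2, 50, 128])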
【PyTorch深度学习-龙龙老师】-测试版202112 — …

    ….build_vocab(train_data)
    # print the vocabulary size: 10000+2
    print(f'Unique tokens in TEXT vocabulary: {len(TEXT.vocab)}')
    # print the number of labels: pos + neg
    print(f'Unique tokens in LABEL vocabulary: {len(LABEL.vocab)}')

    Out[46]:
    […, 'you', "'ve", 'seen', 'this', 'movie', …, '.']
    example label: pos
    Unique tokens in TEXT vocabulary: 10002
    Unique tokens in LABEL vocabulary: 2

We can see that both the training set and the test set have a length of 25000, i.e., 25,000 sentences each, and that the tokenized words are encoded as integer indices.
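For context, TEXT and LABEL above are torchtext Field objects from the IMDB sentiment example. A rough sketch of how they are typically set up before build_vocab is called, assuming the legacy torchtext API (exposed as torchtext.legacy in 0.9–0.11; older releases provide the same classes under torchtext.data) — the exact arguments in the book may differ:

    import torch
    from torchtext.legacy import data, datasets   # assumed legacy API location

    # Tokenized review text and a single pos/neg label per example.
    TEXT = data.Field(tokenize='spacy', tokenizer_language='en_core_web_sm')
    LABEL = data.LabelField(dtype=torch.float)

    # 25,000 training and 25,000 test reviews (IMDB).
    train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)

    # Keep the 10,000 most frequent words; <unk> and <pad> bring the total to 10,002.
    TEXT.build_vocab(train_data, max_size=10000)
    LABEL.build_vocab(train_data)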
共 8 条













