动手学深度学习 v2.0 — 8.2 Text Preprocessing (The Time Machine, gutenberg.org/ebooks/35)

    tokens = tokenize(lines)
    for i in range(11):
        print(tokens[i])

    ['the', 'time', 'machine', 'by', 'h', 'g', 'wells']
    []
    []
    []
    []

    # excerpt from the constructor of the Vocab class
    def __init__(self, tokens=None, min_freq=0, reserved_tokens=None):
        if tokens is None:
            tokens = []
        if reserved_tokens is None:
            reserved_tokens = []
        # Sort tokens by frequency of occurrence
        counter = count_corpus(tokens)
        self._token_freqs = sorted(counter.items(), key=lambda x: x[1],
                                   reverse=True)
        # The index of the unknown token <unk> is 0
        self.idx_to_token = ['<unk>'] + reserved_tokens
        self.token_to_idx = {token: idx
                             for idx, token in enumerate(self.idx_to_token)}
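The excerpt calls a count_corpus helper that is not shown. A minimal sketch of such a helper, assuming tokens may be either a flat list of tokens or a list of token lists (the behavior is inferred from the excerpt, not copied verbatim from the book):

    import collections

    def count_corpus(tokens):
        """Count token frequencies over a 1D or 2D list of tokens."""
        # Flatten a list of token lists into a single list of tokens.
        if len(tokens) == 0 or isinstance(tokens[0], list):
            tokens = [token for line in tokens for token in line]
        return collections.Counter(tokens)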
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures — … the word2vec family of algorithms [6] (apart from others like GloVe [7]), which can learn embeddings for word tokens for NLP tasks. The embedding table generation process is done without any ground-truth labels. We would learn embeddings of a fixed number of dimensions each (where we can also view …). [Footnote 10: We are dealing with word tokens as an example here, hence you would see the mention of words and their embeddings. In practice, we …] … pairs of input context (neighboring words) and the label (the masked word to be predicted). The word tokens are vectorized by replacing the actual words by their indices in our vocabulary. If a word doesn't …
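As an illustration of the context/label pairing and index-based vectorization described above, here is a minimal sketch (the sentence, window size, and vocabulary construction are illustrative assumptions, not code from the book):

    # Build (context, target) training pairs for a word2vec-style model and
    # vectorize words by their indices in the vocabulary.
    sentence = "the quick brown fox jumps over the lazy dog".split()
    vocab = {word: idx for idx, word in enumerate(sorted(set(sentence)))}

    window = 2  # assumed context window size
    pairs = []
    for i, target in enumerate(sentence):
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if j != i:
                # (index of context word, index of target word)
                pairs.append((vocab[sentence[j]], vocab[target]))

    print(pairs[:5])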
AI大模型千问 qwen 中文文档 — … decode() to get the output.

    # Use `max_new_tokens` to control the maximum output length.
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=512
    )
    # generate() returns prompt + completion, so slice off the prompt tokens.
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

Previously we used model.chat() (for more details, see modeling_qwen.py in earlier Qwen models). Now we follow the standard transformers usage:

    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=512,
        streamer=streamer,
    )

1.2.2 Deploying with vLLM — To deploy Qwen1.5, we recommend using …
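For context, the fragments above fit into an end-to-end flow roughly like the following minimal sketch, assuming the Hugging Face transformers API (with accelerate installed for device_map) and an illustrative Qwen1.5 chat checkpoint name:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen1.5-7B-Chat"  # illustrative checkpoint name
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512)
    # Keep only the newly generated tokens before decoding.
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(response)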
PyTorch Release Notes — … that were introduced in Transformer-XL help capture better long-term dependencies by attending to tokens from multiple previous segments. Our implementation is based on the codebase that was published by …
《Efficient Deep Learning Book》[EDL] Chapter 6 - Advanced Learning Techniques - Technical Review — … pretext task. This works well for domains like natural language, where your data is a sequence of tokens. You can extend the analogy to the input being a tensor of any rank: hide part of the input and train the model to predict the hidden part. For BERT (figure 6-3), the pretext tasks are as follows:
1. Masked Language Model (MLM): 15% of the tokens in the given sentence are masked, and the model needs to predict the masked tokens (see the masking sketch below).
2. Next Sentence Prediction (NSP): …
GPT-3 is a transformer model that only has the decoder (the input is a sequence of tokens, and the output is a sequence of tokens too). It excels in natural language generation and hence has been …
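To make the MLM masking step concrete, here is a minimal sketch (the 15% rate follows the text; the mask token id, the ignore index of -100, and the use of plain PyTorch are assumptions for illustration):

    import torch

    def mask_tokens(input_ids, mask_token_id, mask_prob=0.15):
        """Randomly mask tokens for a BERT-style MLM pretext task."""
        labels = input_ids.clone()
        masked_input = input_ids.clone()
        # Choose roughly 15% of the positions to mask.
        mask = torch.rand(input_ids.shape) < mask_prob
        # Only masked positions contribute to the loss; -100 is the index
        # ignored by PyTorch's cross-entropy loss.
        labels[~mask] = -100
        masked_input[mask] = mask_token_id
        return masked_input, labels

    ids = torch.randint(5, 1000, (1, 12))            # fake token ids
    masked_ids, labels = mask_tokens(ids, mask_token_id=103)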
《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction — … a mathematical representation that our models can use. The quality of these models scales with the number of tokens we learn an embedding for (the size of our vocabulary), and the size of the embedding (known as the embedding dimension). (Figure: an embedding table on the left with an embedding for each token; the Hashing Trick on the right, where multiple tokens map to the same slot and share embeddings, which helps save space.) To remedy this problem, … the model. With the Hashing Trick, instead of learning one embedding vector for each token, many tokens can share a single embedding vector. The sharing can be done by computing the hash of the token modulo the number of rows in a smaller embedding table.
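A minimal sketch of this sharing scheme, assuming Python's built-in hash and an illustrative bucket count (this is not the book's code; a stable hash such as one from hashlib would be preferable in practice, since Python's built-in hash is salted per process):

    import numpy as np

    vocab_size = 100_000      # tokens we would otherwise need one row each for
    num_buckets = 10_000      # rows actually stored: a 10x smaller table
    embedding_dim = 64

    # One shared table; many tokens map to the same row.
    embedding_table = np.random.randn(num_buckets, embedding_dim).astype(np.float32)

    def lookup(token: str) -> np.ndarray:
        # The hash of the token modulo the number of rows picks a shared slot.
        row = hash(token) % num_buckets
        return embedding_table[row]

    vec = lookup("efficient")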
机器学习课程-温州大学-14深度学习-Vision Transformer (ViT) — 5. Code implementation of the model (sketched below)
• A 224x224 image is split into 49 patches of size 32x32.
• Each patch is embedded, giving 49 vectors of dimension 128.
• A cls_tokens vector is then concatenated, giving 50 vectors of dimension 128.
• The pos_embedding is added; there are still 50 vectors of dimension 128.
• These vectors are fed into the transformer for self-attention feature extraction.
• The output is again 50 vectors of dimension 128, and then …
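A minimal PyTorch sketch of these steps (the unfold-based patching, the Linear patch projection, and the stand-in nn.TransformerEncoder are illustrative assumptions, not the course's code):

    import torch
    import torch.nn as nn

    B, C, H, W, P, D = 2, 3, 224, 224, 32, 128        # batch, channels, image, patch, embed dim
    num_patches = (H // P) * (W // P)                  # 7 * 7 = 49

    x = torch.randn(B, C, H, W)

    # 1. Split the image into 49 patches of 32x32 and flatten each patch.
    patches = x.unfold(2, P, P).unfold(3, P, P)        # (B, C, 7, 7, 32, 32)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, num_patches, C * P * P)

    # 2. Embed each patch into a 128-dimensional vector: 49 x 128.
    patch_embed = nn.Linear(C * P * P, D)
    tokens = patch_embed(patches)                      # (B, 49, 128)

    # 3. Prepend the class token: 50 x 128.
    cls_tokens = nn.Parameter(torch.zeros(1, 1, D)).expand(B, -1, -1)
    tokens = torch.cat([cls_tokens, tokens], dim=1)    # (B, 50, 128)

    # 4. Add positional embeddings: still 50 x 128.
    pos_embedding = nn.Parameter(torch.zeros(1, num_patches + 1, D))
    tokens = tokens + pos_embedding

    # 5. Self-attention feature extraction; the output is again (B, 50, 128).
    layer = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2)
    out = encoder(tokens)
    print(out.shape)                                   # torch.Size([2, 50, 128])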
【PyTorch深度学习-龙龙老师】-测试版202112 — …

    ….build_vocab(train_data)
    # print the vocabulary size: 10000+2
    print(f'Unique tokens in TEXT vocabulary: {len(TEXT.vocab)}')
    # print the number of labels: pos + neg
    print(f'Unique tokens in LABEL vocabulary: {len(LABEL.vocab)}')

    Out[46]:
    […, 'you', "'ve", 'seen', 'this', 'movie', …, '.']
    example label: pos
    Unique tokens in TEXT vocabulary: 10002
    Unique tokens in LABEL vocabulary: 2

We can see that both the training set and the test set have a length of 25000, i.e., 25,000 sentences each, and that the tokenized words are encoded as integer indices.
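For context, TEXT and LABEL above are torchtext Field objects from the IMDB sentiment example. A rough sketch of how they are typically set up before build_vocab is called, assuming the legacy torchtext API (exposed as torchtext.legacy in 0.9–0.11; older releases provide the same classes under torchtext.data) — the exact arguments in the book may differ:

    import torch
    from torchtext.legacy import data, datasets   # assumed legacy API location

    # Tokenized review text and a single pos/neg label per example.
    TEXT = data.Field(tokenize='spacy', tokenizer_language='en_core_web_sm')
    LABEL = data.LabelField(dtype=torch.float)

    # 25,000 training and 25,000 test reviews (IMDB).
    train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)

    # Keep the 10,000 most frequent words; <unk> and <pad> bring the total to 10,002.
    TEXT.build_vocab(train_data, max_size=10000)
    LABEL.build_vocab(train_data)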
共 8 条













