PyTorch Release Notes
‣ Mask R-CNN model: uses mixed-precision arithmetic with Tensor Cores on NVIDIA V100 GPUs for 1.3x faster training time while maintaining target accuracy. This model script is available on GitHub and NGC.
‣ Tacotron 2 and WaveGlow v1.1 model: uses mixed-precision arithmetic and Tensor Cores on V100 GPUs for faster training times while maintaining target accuracy. This model script is available on GitHub and NGC.
365 pages | 2.94 MB | 1 year ago
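The entries above describe models trained with mixed precision on Tensor Cores. As a minimal sketch of how that is typically enabled in PyTorch (using the torch.cuda.amp API; this code is not taken from the release notes):

import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()          # scales the loss to avoid FP16 underflow

for _ in range(10):
    x = torch.randn(32, 1024, device="cuda")
    target = torch.randn(32, 1024, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # forward pass runs in mixed precision
        loss = torch.nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()             # backward pass on the scaled loss
    scaler.step(optimizer)                    # unscales gradients, then steps
    scaler.update()

Mixed precision keeps the matrix multiplications in reduced precision so Tensor Cores can accelerate them, while FP32 master weights and loss scaling preserve the target accuracy.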
AI Large Model Qwen (千问) — Chinese Documentation
from threading import Thread

generation_kwargs = dict(model_inputs, streamer=streamer, max_new_tokens=512)
thread = Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()
generated_text = ""
for new_text in streamer: …

@dataclass
class LoraArguments:
    lora_r: int = 64
    lora_alpha: int = 16
    lora_dropout: float = 0.05
    lora_target_modules: List[str] = field(
        default_factory=lambda: ["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", …

• lora_alpha: the alpha value for LoRA;
• lora_dropout: the dropout rate for LoRA;
• lora_target_modules: the target modules for LoRA. By default we tune all linear layers;
• lora_weight_path: the path to …
56 pages | 835.78 KB | 1 year ago
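The excerpt lists the LoRA hyperparameters but cuts off before showing how they are applied. A minimal sketch of wiring those arguments into a LoRA fine-tuning setup (assuming the Hugging Face peft and transformers libraries and an example Qwen checkpoint name; not copied from the Qwen docs):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B")   # example checkpoint
lora_config = LoraConfig(
    r=64,                                                           # lora_r
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "up_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # only the LoRA adapter weights are trainable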
《Efficient Deep Learning Book》 [EDL] Chapter 7 - Automation
… and creating a random classification dataset with 20 samples, each one assigned to one of the five target classes.

import random
import tensorflow as tf
import numpy as np
from tensorflow.keras import layers

…
    read_config=tfds.ReadConfig(try_autocache=False),
)

Let's resize the dataset splits to the same size. The target size is identical to the project in chapter 3.

# Dataset image size
IMG_SIZE = 264
def resize_image(image, …

… model and the child networks are the players whose rewards are determined by their performance on the target dataset. The controller model learns to generate better architectures as the search game progresses …
33 pages | 2.48 MB | 1 year ago
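The resize helper is cut off in the excerpt. A minimal sketch (an assumption about its shape, not the book's exact code) of resizing every image in a tf.data split to the IMG_SIZE target:

import tensorflow as tf

IMG_SIZE = 264

def resize_image(image, label):
    # resize to the common target size shared by all splits
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label

# usage on a tensorflow-datasets split:
# train_ds = train_ds.map(resize_image, num_parallel_calls=tf.data.AUTOTUNE)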
Fully-Connected Neural Networks in Practice — PyTorch Edition
datasets is a torchvision object, and the data it returns is of PyTorch's Dataset type. The transform parameter specifies how the exported data should be converted; we can also use the target_transform parameter to specify how the exported labels should be converted. Note that we call squeeze() when displaying, because the original data is three-dimensional with shape (1, 28, 28); using .squeeze() …

    train=True,              # data used for training
    download=True,           # download if not present under the root directory
    transform=ToTensor(),
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.…

… is the transformation of the data: ToTensor() converts a PIL image or a NumPy ndarray into a FloatTensor and compresses every pixel value into [0.0, 1.0]. target_transform is the transformation of the labels: for classification we need to represent the labels as vectors, for example with three classes in total:

[1 0 0]    (1.2.1)
[0 1 0]    …
29 pages | 1.40 MB | 1 year ago
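The target_transform call is truncated above. A minimal self-contained sketch that completes the one-hot encoding of the integer label (FashionMNIST is used here only as an assumed example dataset — the excerpt does not name which dataset the book loads):

import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

training_data = datasets.FashionMNIST(
    root="./data",
    train=True,                              # training split
    download=True,                           # download if missing
    transform=ToTensor(),                    # image -> FloatTensor in [0.0, 1.0]
    target_transform=Lambda(
        lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1)
    ),                                       # integer label -> one-hot vector of length 10
)

image, label = training_data[0]
print(image.shape, label)                    # torch.Size([1, 28, 28]) and the one-hot label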
PyTorch Beginner Notes - 03 - Neural Networks
A loss function takes the pair (output, target) as input and computes a value that estimates how far the network's output is from the target value. (Translator's note: output is the network's output, target is the actual value.) The nn package contains many different loss functions. nn.MSELoss is a fairly simple one that computes the mean squared error between the output and the target, for example:

output = net(input)
target = torch.rand(10)
target = target.view(1, -1)
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
tensor(0.4526, grad_fn=<MseLossBackward>)

Now, if you follow loss in the backward direction using its .grad_fn attribute, you will see a computation graph like: input -> conv2d -> relu -> …

… parameters(), lr=0.01)

# training iteration
optimizer.zero_grad()                    # zero the gradient buffers
output = net(input)
loss = criterion(output, target)         # compute the loss
loss.backward()                          # backpropagate
optimizer.step()                         # update the parameters

Note: observe how optimizer.zero_grad() is used to manually set the gradient buffers to zero.
7 pages | 370.53 KB | 1 year ago
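The optimizer construction is cut off above ("… parameters(), lr=0.01)"). A minimal self-contained sketch of the full training step (the notes' actual net and input are defined earlier in the original document; the tiny stand-in network here is only an assumption for illustration):

import torch
import torch.nn as nn
import torch.optim as optim

net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))   # stand-in network
input = torch.randn(1, 16)
target = torch.rand(10).view(1, -1)

criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)   # completes the truncated call

optimizer.zero_grad()                    # zero the gradient buffers
loss = criterion(net(input), target)     # compute the loss
loss.backward()                          # backpropagate
optimizer.step()                         # update the parameters
print(loss.item())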
Lecture 1: Overview
Choose exactly what is to be learned, i.e., the target function. Choose how to represent the target function. Choose a learning algorithm to infer the target function from the experience. … a useful target function. Checker boards labeled with the correct move, e.g., extracted from records of expert play. Indirect experience: given feedback which is not direct I/O pairs for a useful target function … have some training cases for which its value is known. The thing we want to predict is called the target or the response variable. Usually, we need training data. — Feng Li (SDU), Overview, September 6, 2023
57 pages | 2.41 MB | 1 year ago
Keras: Deep Learning Library for Python
compile
compile(self, optimizer, loss, metrics=None, sample_weight_mode=None, weighted_metrics=None, target_tensors=None)
Configures the model for training. Arguments:
• optimizer: string (name of an optimizer) or an optimizer object. See optimizers.
• loss: string (name of an objective function) or an objective function. See …
• … list of metrics to be evaluated and weighted by sample_weight or class_weight.
• target_tensors: By default, Keras creates a placeholder for the model's target, which will be fed with the target data during training. If instead you would like to use your own target tensors (in turn, Keras will not expect external Numpy data for these targets at training time), you can specify them via the target_tensors argument. It should be a single tensor (for a single-output Sequential model) …
compile(self, optimizer, loss, metrics=None, loss_weights=None, sample_weight_mode=None, weighted_metrics=None, target_tensors=None)
Configures the model for training. Arguments:
• optimizer: string (name of an optimizer) or an optimizer object. See optimizers.
• loss: string (name of an objective function) or an objective function. See …
257 pages | 1.19 MB | 1 year ago
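A minimal sketch of calling compile with the documented arguments (standalone Keras 2.x style, matching the signature above; the toy model and the chosen argument values are assumptions, not taken from the docs):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation="relu", input_shape=(20,)),
    Dense(10, activation="softmax"),
])

model.compile(
    optimizer="rmsprop",               # string name or an optimizer object
    loss="categorical_crossentropy",   # string name or an objective function
    metrics=["accuracy"],              # evaluated (and possibly weighted) during training/testing
)
# target_tensors could additionally point Keras at your own target tensor
# instead of the default placeholder fed from Numpy data.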
《Efficient Deep Learning Book》 [EDL] Chapter 3 - Learning Techniques
… of handwritten digit that can potentially confuse the human labelers to choose a 1 or a 7 as the target label. Obtaining labels in many cases requires significant human involvement, and for that reason … In our example, only the 300 KB vanilla model is acceptable for deployment (it meets the 80% accuracy target). Whereas, among the models with the learning techniques, four models with the smallest being the … a cat! The label mixing transformations generate samples based on differently labeled inputs. The target label is a composite of the inputs that were combined. A combination of a dog with a hamster image …
56 pages | 18.93 MB | 1 year ago
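A minimal sketch of one common label-mixing transformation, mixup (named here as an assumption — the excerpt does not say which transformation the book uses), where the target label becomes a weighted composite of two differently labeled inputs:

import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    # sample a mixing coefficient from a Beta distribution
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2      # blended input image
    y = lam * y1 + (1.0 - lam) * y2      # composite target label (one-hot in, soft label out)
    return x, y

# e.g. mixing a "dog" image with a "hamster" image yields a soft target such as
# 0.7 * dog + 0.3 * hamster instead of a single hard class.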
《Efficient Deep Learning Book》 [EDL] Chapter 4 - Efficient Architectures
… demonstrates the Skipgram task. Figure 4-5: This figure depicts the sliding window of size 5, the hidden target word, model inputs, and the label for a given sample text in the CBOW task. [Footnote 7: GloVe - https://nlp…] … arXiv:1301.3781 (2013). Figure 4-6: This figure depicts the sliding window of size 5, the hidden target word, model inputs, and the label for a given sample text in the Skipgram task. Let's get to solving … self-attention. Encoder-decoder attention computes attention between the encoder output sequence and the target sequence. Self-attention is a special type of attention which operates over a single sequence to …
53 pages | 3.92 MB | 1 year ago
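An illustrative sketch (not the book's code) of how a size-5 sliding window turns raw text into Skipgram training pairs — the center word is the model input and each surrounding word is a context label to predict:

def skipgram_pairs(tokens, window=5):
    half = window // 2
    pairs = []
    for i, center in enumerate(tokens):
        # every other word inside the window becomes an (input, context) pair
        for j in range(max(0, i - half), min(len(tokens), i + half + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs("the quick brown fox jumps".split()))
# [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown'), ...]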
《Efficient Deep Learning Book》 [EDL] Chapter 6 - Advanced Learning Techniques - Technical Review
… domain that transfer well across specific tasks in that domain. They can be adapted to solve the target task by: 1. Adding a new prediction head to the pre-trained model which can translate the general … labeled examples otherwise). If we add a middle step of pre-training using unlabeled data from the same target dataset, the authors report needing fewer labeled examples. Refer to Figure 6-6 for a comparison … data, ULMFiT semi-supervised: pre-training with WikiText-103 as well as unlabeled data from the target dataset and fine-tuning with labeled data). Source: Howard et al. The pre-trained model can then …
31 pages | 4.03 MB | 1 year ago
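A minimal sketch of step 1 above — adding a new prediction head to a pre-trained model for the target task (a torchvision ResNet-18 is used as an assumed example backbone; this is not the book's code):

import torch.nn as nn
from torchvision import models

NUM_TARGET_CLASSES = 5                                  # hypothetical target task

backbone = models.resnet18(weights="IMAGENET1K_V1")     # general-purpose pre-trained features
for param in backbone.parameters():
    param.requires_grad = False                         # freeze the pre-trained weights

# the new head translates the general features into target-task predictions
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_TARGET_CLASSES)
# during fine-tuning, only backbone.fc's parameters receive gradients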
21 results in total.













