《Efficient Deep Learning Book》[EDL] Chapter 3 - Learning Techniques
…number of labeled training examples. Data Augmentation is a set of techniques which leverage the original training data to generate more training examples without having to label them. We’ll familiarize … labeling and training costs and call it a day! 2. We want the highest possible accuracy with the original training costs: we can let the model train with the new learning techniques. In many cases, this … which, when trained with the learning techniques, meets the quality threshold (80% accuracy) with the original training budget. In our example, suppose that a trimmed model is 150 KB in size and achieves an…
0 credits | 56 pages | 18.93 MB | 1 year ago
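
The snippet above describes data augmentation as a way to generate extra training examples from existing labeled data without any new labeling. As a minimal, hedged illustration (not taken from the book), the sketch below applies random flips, rotations, and zooms with Keras preprocessing layers; the layer choices, parameter values, and the train_ds name are illustrative assumptions.

```python
# Hedged sketch: augmenting labeled images with Keras preprocessing layers.
# Assumes TensorFlow 2.x; dataset names are placeholders, not from the book.
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),   # mirror images left-right
    tf.keras.layers.RandomRotation(0.1),        # rotate by up to +/-10% of a full turn
    tf.keras.layers.RandomZoom(0.1),            # zoom in/out by up to 10%
])

def augment_batch(images, labels):
    # Labels are reused as-is: augmentation creates new examples without re-labeling.
    return augment(images, training=True), labels

# Example usage in a tf.data pipeline (train_ds is an assumed (image, label) dataset):
# train_ds = train_ds.map(augment_batch, num_parallel_calls=tf.data.AUTOTUNE)
```
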
PyTorch Release Notes
…available on NGC. Contents of the PyTorch container: this container image contains the complete source of the version of PyTorch in /opt/pytorch. It is prebuilt and installed in the default Python environment. … Conda-specific packages, which might not be available on PyPI; we recommend building these packages from source. A workaround is to manually install a Conda package manager, and add the conda path to your PYTHONPATH. … is available on GitHub and NGC. ‣ ResNet50 v1.5 model: This model is a modified version of the original ResNet50 v1 model. This model script is available on GitHub and NGC. ‣ GNMT v2 model: This model…
0 credits | 365 pages | 2.94 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 7 - Automation
…function of resources allocated to each configuration. Promising configurations get more resources. Source: Hyperband [Li, Lisha, et al., "Hyperband: A novel bandit-based approach to hyperparameter optimization"]. … demonstration of configuration and resource allocation changes across multiple brackets in a Hyperband. Source: Hyperband. … In chapter 3, we trained a model to classify flowers in the oxford_flowers102 dataset … predicts a parameter of the convolution layer. The controller predicts these parameters layer by layer. Source: Neural Architecture Search with Reinforcement Learning. The generated child networks performed…
0 credits | 33 pages | 2.48 MB | 1 year ago
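
The Hyperband excerpt above describes allocating more of the training resource (epochs) to promising configurations across brackets. As a hedged sketch of how such a search is typically driven in practice with the KerasTuner library (not quoted from the book), the example below defines a tunable model and a Hyperband tuner; the hyperparameter ranges and the x_train/y_train names are illustrative assumptions.

```python
# Hedged sketch: Hyperband-style hyperparameter search with KerasTuner.
# Assumes keras_tuner and TensorFlow 2.x; data names are placeholders.
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    # The tuner samples these hyperparameters for each trial (configuration).
    units = hp.Int("units", min_value=32, max_value=256, step=32)
    lr = hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Hyperband allocates epochs in brackets and stops weak trials early.
tuner = kt.Hyperband(build_model, objective="val_accuracy",
                     max_epochs=30, factor=3,
                     directory="tuning", project_name="demo")
# tuner.search(x_train, y_train, validation_split=0.2)  # x_train/y_train are assumed placeholders
```
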
keras tutorial
About the Tutorial: Keras is an open source deep learning framework for Python. It has been developed by an artificial intelligence researcher … Theano, etc., for creating deep learning models. Overview of Keras: Keras runs on top of open source machine learning libraries like TensorFlow, Theano or Cognitive Toolkit (CNTK). Theano is a Python library … named "kerasvenv". Move to the folder and type the below command: $ cd kerasvenv, then kerasvenv $ source bin/activate. Windows users move inside the "kerasenv" folder and type the below command…
0 credits | 98 pages | 1.57 MB | 1 year ago
Lecture Notes on Support Vector Machine
…unconstrained (as opposed to the original constrained minimization problem); ii) G is an infimum of a set of affine functions and thus is a concave function regardless of the original problem; iii) G can be −∞ … Theorem 1 (Lower Bounds Property): If α ⪰ 0, then G(α, β) ≤ p∗, where p∗ is the optimal value of the (original) primal problem defined by (9)∼(11). Proof. If ω̃ is feasible, then we have gi(ω̃) ≤ 0 for ∀i … non-trivial lower bounds. The duality is said to be strong if d∗ = p∗. In this case, we can optimize the original problem by optimizing its dual problem.
0 credits | 18 pages | 509.37 KB | 1 year ago
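
The excerpt states the lower-bound (weak duality) property but the snippet cuts off the argument. A short worked version of the standard derivation is sketched below; the symbols f, g_i, h_j, and L are the usual primal objective, inequality/equality constraints, and Lagrangian assumed from the notes' notation.

```latex
% Worked sketch of the weak-duality argument behind Theorem 1 (standard derivation,
% not quoted from the notes). Let \tilde{\omega} be primal-feasible, i.e.
% g_i(\tilde{\omega}) \le 0 and h_j(\tilde{\omega}) = 0, and let \alpha \succeq 0.
\begin{aligned}
G(\alpha, \beta) = \inf_{\omega} L(\omega, \alpha, \beta)
  &\le L(\tilde{\omega}, \alpha, \beta) \\
  &= f(\tilde{\omega}) + \sum_i \alpha_i \, g_i(\tilde{\omega}) + \sum_j \beta_j \, h_j(\tilde{\omega})
   \;\le\; f(\tilde{\omega}).
\end{aligned}
% Minimizing the right-hand side over all feasible \tilde{\omega} gives
% G(\alpha, \beta) \le p^{*}, i.e. every dual value is a lower bound on the primal optimum.
```
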
Lecture 6: Support Vector Machine
…x1xn, · · · , xn−1xn}. It is an example of a quadratic mapping: each new feature uses a pair of the original features. … Feature Mapping (Contd.) Problem: Mapping … separator in the kernel-defined feature space F. This corresponds to a non-linear separator in the original space X. … Kernelized SVM Prediction … Define the decision … ⟨x(i), x(j)⟩, s.t. 0 ≤ αi ≤ C, ∀i = 1, · · · , m, and ∑_{i=1}^m αi y(i) = 0. Use existing QP solvers to address the above optimization problem. … Soft-Margin SVM (Contd.)…
0 credits | 82 pages | 773.97 KB | 1 year ago
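
The excerpt sketches the soft-margin dual and notes that off-the-shelf QP solvers handle it. As a hedged illustration (not from the slides), the snippet below fits a kernelized soft-margin SVM with scikit-learn: the degree-2 polynomial kernel plays the role of the quadratic feature mapping, and C is the soft-margin penalty bounding the dual variables; the toy dataset is an assumption.

```python
# Hedged sketch: kernelized soft-margin SVM via scikit-learn (illustrative data).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Toy data that is not linearly separable in the original space X.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# kernel="poly", degree=2 corresponds to a quadratic feature mapping;
# C bounds the dual variables (0 <= alpha_i <= C), i.e. the soft-margin trade-off.
clf = SVC(kernel="poly", degree=2, coef0=1.0, C=1.0)
clf.fit(X, y)

print("support vectors per class:", clf.n_support_)  # only support vectors get nonzero alpha_i
print("training accuracy:", clf.score(X, y))
```
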
《Efficient Deep Learning Book》[EDL] Chapter 6 - Advanced Learning Techniques - Technical Review
…loss function, and train the model with the labeled data for the task at hand. We can keep the original model weights frozen, or let them be trainable. Such models are referred to as pre-trained models. … task by simply masking the input as discussed, and the output is the part that we masked out or the original input. Once we have pre-trained our model on one or a combination of pretext tasks, the prediction … given two sentences, predict if the second follows the first. Figure 6-3: Pre-training and Fine-tuning steps for BERT. Source: Devlin et al. For BERT, the pre-training loss is the mean of the losses for the above two tasks.
0 credits | 31 pages | 4.03 MB | 1 year ago
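
The excerpt mentions fine-tuning a pre-trained model with its original weights either frozen or trainable. A minimal, hedged Keras sketch of that freeze-then-fine-tune pattern is shown below; the backbone here is an illustrative stand-in for any pre-trained encoder, not the BERT setup from the figure, and the dataset names are placeholders.

```python
# Hedged sketch: fine-tuning a pre-trained backbone with frozen weights plus a new task head.
import tensorflow as tf

backbone = tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                             input_shape=(224, 224, 3), pooling="avg")
backbone.trainable = False  # keep the original (pre-trained) weights frozen

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(5, activation="softmax"),  # new head for the downstream task
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # train_ds/val_ds are assumed placeholders

# To let the original weights adapt as well, unfreeze and re-compile with a small learning rate:
# backbone.trainable = True
# model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss="sparse_categorical_crossentropy")
```
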
《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniques
…mapping is transmitted along with the encoded data. Figure 2-1: Huffman Encoding & Huffman Tree (Source). When decoding the encoded data, we look up the code from the lookup table to retrieve the symbols … prefix of some other code, which eliminates ambiguity when decoding), we can easily construct the original sequence of symbols from the encoded sequence and the lookup table. Refer to the Wikipedia article … (people who like diced apples) where we don’t expect to recover the exact representation of the original data. It is okay to recover an approximation; however, we do expect a better compression ratio than…
0 credits | 33 pages | 1.96 MB | 1 year ago
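
The excerpt describes encoding symbols with a prefix-free Huffman code and decoding via a lookup table. The short Python sketch below (not taken from the book) builds such a code from symbol frequencies and round-trips a small message; the example string is an assumption.

```python
# Hedged sketch: Huffman coding with a prefix-free lookup table (illustrative, not the book's code).
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a symbol -> bit-string mapping built from symbol frequencies."""
    # Heap entries: [frequency, tiebreaker, [symbol, code], [symbol, code], ...]
    heap = [[freq, i, [sym, ""]] for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)        # least frequent subtree
        hi = heapq.heappop(heap)        # next least frequent subtree
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]     # codes in the lighter subtree get a leading 0
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]     # codes in the heavier subtree get a leading 1
        heapq.heappush(heap, [lo[0] + hi[0], next_id] + lo[2:] + hi[2:])
        next_id += 1
    return {sym: code for sym, code in heap[0][2:]}

def decode(bits, codes):
    """Recover the original symbols; unambiguous because no code is a prefix of another."""
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:              # first match is the only possible match
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

codes = huffman_codes("abracadabra")
encoded = "".join(codes[c] for c in "abracadabra")
assert decode(encoded, codes) == "abracadabra"
```
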
《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction
…Proceedings, 2011. Figure 1-2: Growth of parameters in Computer Vision and NLP models over time. (Data Source) We have seen a similar effect in the world of Natural Language Processing (NLP) (see Figure 1-2) … the number of mobile and IoT devices over time. The lighter blue bars represent forecasts. (Data Source: 1, 2) In this book, we will primarily focus on efficiency for both training and deploying efficient … an apple, when using soft labels. Hard labels would penalize both mistakes the same way. In the original paper which proposed distillation, Hinton et al. replicated the performance of an ensemble of 10 models…
0 credits | 21 pages | 3.17 MB | 1 year ago
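
The excerpt contrasts soft labels with hard labels in distillation. As a hedged illustration of how a soft-label distillation loss is commonly computed (not the book's exact recipe), the sketch below compares a student's predictions against a teacher's temperature-softened probabilities; the temperature value and the dummy logits are assumptions.

```python
# Hedged sketch: a distillation-style soft-label loss (illustrative).
import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    # Soften both distributions so near-misses (e.g. orange vs. apple)
    # cost less than unrelated mistakes, unlike one-hot hard labels.
    teacher_probs = tf.nn.softmax(teacher_logits / temperature)
    student_probs = tf.nn.softmax(student_logits / temperature)
    # KL divergence between the teacher's soft labels and the student's predictions,
    # scaled by T^2 as suggested in the original distillation paper.
    return tf.keras.losses.KLDivergence()(teacher_probs, student_probs) * temperature ** 2

teacher = tf.constant([[2.0, 1.5, -1.0]])  # dummy teacher logits for a 3-class problem
student = tf.constant([[1.0, 0.5, -0.5]])  # dummy student logits
print(float(distillation_loss(teacher, student)))
```
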
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures
…embeddings table, which can be reused in a new task downstream. The new task could be unrelated to the original training, similar to the toy example where we predicted if an animal was suitable for the petting … general task. Although transfer learning requires the model to have the exact same architecture as the original task, embeddings are agnostic to the model architecture of the downstream task. In essence, the … Once we have initialized the layer, we can invoke the adapt() method with the dataset to use as a source for building the vocabulary. # This step allows the vectorization layer to build the vocabulary…
0 credits | 53 pages | 3.92 MB | 1 year ago
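
The last fragment above refers to calling adapt() on a vectorization layer so it can build its vocabulary from a dataset. A minimal, hedged Keras example of that step is shown below; the tiny corpus and parameter values are illustrative, not the book's setup.

```python
# Hedged sketch: building a vocabulary with Keras TextVectorization and adapt().
import tensorflow as tf

corpus = tf.data.Dataset.from_tensor_slices([
    "the cat sat on the mat",
    "dogs are suitable for a petting zoo",
])

vectorize = tf.keras.layers.TextVectorization(max_tokens=1000, output_sequence_length=8)
vectorize.adapt(corpus)  # builds the vocabulary from the dataset

print(vectorize.get_vocabulary()[:10])      # most frequent tokens first
print(vectorize(["the cat and the dog"]))   # token ids, padded/truncated to length 8
```
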
25 results in total













