keras tutorial
features: Consistent, simple and extensible API. Minimal structure: easy to achieve the result without any frills. It supports multiple platforms and backends. It is a user-friendly framework. ... floatx represents the default data type, float32. You can also change it to float16 or float64 using the set_floatx() method. backend denotes the current backend. Suppose, if the file is not created then ... another neuron to which it is connected. Each neuron processes a small piece of information and then passes the result to another neuron, and this process continues. This is the basic method used by our human brain to ...
98 pages | 1.57 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniques
... sine wave is a low-precision representation which takes integer values in the range [0, 5]. As a result, the quantized wave requires low transmission bandwidth. Figure 2-3: Quantization of sine waves. ... the low-precision domain, because we lose precision when going to a b-bit integer; as a result, values which were close in the high-precision domain might end up being mapped to the same value. ... values, with the start and end points defined, along with a step value. This returns the following result: [-10. -7.5 -5. -2.5 0. 2.5 5. 7.5 10.] Now let’s quantize x. # Quantize the ...
33 pages | 1.96 MB | 1 year ago
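The Chapter 2 snippet describes linear quantization of a float range into b-bit integers. It also notes that values close together can collapse onto the same integer. The book's own quantization code is cut off above, so the following is only a minimal pure-Python sketch. It assumes an asymmetric min-max scheme with round-half-up. The names `quantize` and `dequantize` are hypothetical.

```python
import math

def quantize(values, x_min, x_max, bits):
    """Map floats in [x_min, x_max] linearly onto integers 0 .. 2**bits - 1."""
    levels = 2 ** bits - 1
    out = []
    for v in values:
        v = min(max(v, x_min), x_max)  # clamp out-of-range inputs
        scaled = (v - x_min) * levels / (x_max - x_min)
        out.append(int(math.floor(scaled + 0.5)))  # round half up
    return out

def dequantize(q_values, x_min, x_max, bits):
    """Approximately invert quantize(): map integers back to the float range."""
    step = (x_max - x_min) / (2 ** bits - 1)
    return [x_min + q * step for q in q_values]

# The same nine points as in the snippet: -10 to 10 in steps of 2.5.
x = [-10.0 + 2.5 * i for i in range(9)]
q = quantize(x, -10.0, 10.0, bits=3)
print(q)  # [0, 1, 2, 3, 4, 4, 5, 6, 7]
```

With 3 bits there are only 8 levels for 9 inputs. As a result, 0.0 and 2.5 both land on 4, which is the loss of precision the snippet describes.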
《Efficient Deep Learning Book》[EDL] Chapter 3 - Learning Techniques
... achieves a higher accuracy with the same number of labeled training examples. Data augmentation is a set of techniques which leverage the original training data to generate more training examples without ... workflows. We start with data augmentation in the next section. Data Augmentation: Data augmentation is a set of dataset manipulation techniques to improve the sample and label efficiency of deep learning models ... which translates Spanish to English. This model translates “Estoy muy bien” to “I am fine”. This result can be used to train our original English-to-Spanish translation model. Let’s dig deeper into each ...
56 pages | 18.93 MB | 1 year ago
Experiment 1: Linear Regression
... parameter which we need to optimize, and x is the (n + 1)-dimensional feature vector. Given a training set $\{x^{(i)}\}_{i=1,\dots,m}$, our goal is to find the optimal value of θ such that the objective function J(θ) ... In Matlab/Octave, you can load the training set using the commands x = load('ex1x.dat'); y = load('ex1y.dat'); This will be our training set for a supervised learning problem with n = 1, so $x \in \mathbb{R}^2$. If you’re using Matlab/Octave, run the following commands to plot your training set (and label the axes): figure % open a new figure window plot(x, y, 'o'); ylabel('...
7 pages | 428.11 KB | 1 year ago
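The snippet truncates J(θ). This sketch assumes the standard least-squares cost $J(\theta) = \frac{1}{2m}\sum_{i=1}^m (\theta^T x^{(i)} - y^{(i)})^2$ minimized by batch gradient descent. It is written in Python rather than Matlab/Octave. It uses a small hypothetical, noise-free dataset in place of ex1x.dat/ex1y.dat, which are not available here. The learning rate and iteration count are illustrative choices, not values from the exercise.

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Batch gradient descent for y ~ theta0 + theta1 * x.

    Each scalar input x is augmented with an intercept term 1, so the
    feature vector is (1, x) in R^2, matching n = 1 in the exercise.
    """
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Residuals h_theta(x) - y for every training example.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Gradient of J(theta) = 1/(2m) * sum(errors**2).
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Hypothetical data generated from y = 1 + 2x (not the exercise's data files).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]
theta = gradient_descent(xs, ys)
print(theta)  # approximately (1.0, 2.0)
```

The two parameters are updated simultaneously from the same residuals. Updating theta0 first and then reusing it for theta1's gradient would be a different, slightly inconsistent update.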
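[PyTorch Deep Learning - Teacher Longlong] - Beta Edition 202112
Preview Edition 202112, Chapter 3: Classification Problems ... set contains 70,000 images in total. Of these, 60,000 images form the training set D_train (Training Set), used to train the model, and the remaining 10,000 images form the test set D_test (Test Set), used for prediction or testing. Together, the training set and the test set make up the whole MNIST dataset. Since handwritten digit images carry relatively simple information, every image is scaled to 28 × 28, and at the same time ... into 10 parts, each of length 1: result = torch.split(x, split_size_or_sections=1, dim=0); len(result) # the returned list holds 10 tensors. Out[8]: 10. We can check the shape of one of the resulting tensors; it should hold all the grade-book data of one class, with shape [35,8]. For example: In [9]: result[0] # view the grade-book tensor of the first class ... In [10]: x = torch.randn([10,35,8]) # custom-length split into 4 parts, returning a list of 4 tensors. result = torch.split(x, [4,2,2,2], dim=0); len(result) Out[10]: 4. Check the shape of the first tensor: according to the split scheme above, it should contain the grade books of 4 classes, with shape ...
439 pages | 29.91 MB | 1 year ago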
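torch.split accepts either an integer chunk size or a list of section lengths. To show those two modes without a PyTorch dependency, the sketch below uses a hypothetical pure-Python helper, `split_dim0`, that splits a nested list along its first dimension in the same way. It is an illustration of the semantics only, not PyTorch's implementation. Nested lists of zeros stand in for torch.randn([10, 35, 8]).

```python
def split_dim0(data, split_size_or_sections):
    """Split a list along dim 0, mimicking torch.split's two modes."""
    if isinstance(split_size_or_sections, int):
        size = split_size_or_sections
        # Equal chunks of `size`; the last chunk may be smaller.
        return [data[i:i + size] for i in range(0, len(data), size)]
    sections = list(split_size_or_sections)
    if sum(sections) != len(data):
        raise ValueError("section lengths must sum to the size of dim 0")
    chunks, start = [], 0
    for length in sections:
        chunks.append(data[start:start + length])
        start += length
    return chunks

def shape(data):
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(data, list):
        dims.append(len(data))
        data = data[0] if data else None
    return dims

# 10 classes x 35 students x 8 subjects, all zeros as placeholder scores.
x = [[[0.0] * 8 for _ in range(35)] for _ in range(10)]
result = split_dim0(x, 1)
print(len(result), shape(result[0]))  # 10 [1, 35, 8]
result = split_dim0(x, [4, 2, 2, 2])
print(len(result), shape(result[0]))  # 4 [4, 35, 8]
```

As with torch.split, each chunk keeps dimension 0, so a chunk of size 1 has shape [1, 35, 8]. Dropping that leading dimension to get per-class [35, 8] tensors is what torch.unbind does.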
《Efficient Deep Learning Book》[EDL] Chapter 5 - Advanced Compression Techniques
... a certain percentage, say p, of the smallest absolute-valued weights in each training epoch. The result of such a training process is that p% of the weights have zero values. Sparse compressed models achieve higher ... # Compute the number of elements to zero. num_elements_to_zero = int(w_1d.shape[0] * sparsity_rate) # Set the respective indices to zero. w_1d[w_1d_sorted_indices[:num_elements_to_zero]] = 0.0 w = np.reshape(w_1d ... In each pruning round, the algorithm computes the saliency scores for all the weights and resets (sets to zero) the fraction of the weights with the smallest saliency scores. Then, it proceeds to fine-tune ...
34 pages | 3.18 MB | 1 year ago
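The code fragment in the Chapter 5 snippet follows a clear sequence. It flattens a weight tensor, sorts indices by absolute value, zeroes the smallest fraction, and reshapes the result. Below is a pure-Python sketch of that single magnitude-pruning step, using lists instead of NumPy arrays. The name `prune_by_magnitude` and the sample weights are hypothetical. The iterative prune-then-fine-tune loop described afterwards is not shown.

```python
def prune_by_magnitude(w, sparsity_rate):
    """Zero out the given fraction of smallest-magnitude weights in a 2-D list."""
    rows, cols = len(w), len(w[0])
    # Flatten into a new list (the input is left untouched), like w_1d.
    w_1d = [v for row in w for v in row]
    # Indices sorted by absolute value, smallest first.
    w_1d_sorted_indices = sorted(range(len(w_1d)), key=lambda i: abs(w_1d[i]))
    # Compute the number of elements to zero.
    num_elements_to_zero = int(len(w_1d) * sparsity_rate)
    # Set the respective indices to zero.
    for i in w_1d_sorted_indices[:num_elements_to_zero]:
        w_1d[i] = 0.0
    # Reshape back to the original 2-D layout.
    return [w_1d[r * cols:(r + 1) * cols] for r in range(rows)]

w = [[0.1, -0.5, 0.05],
     [2.0, -0.01, 0.3]]
print(prune_by_magnitude(w, 0.5))  # [[0.0, -0.5, 0.0], [2.0, 0.0, 0.3]]
```

Magnitude is used here as the saliency score. The snippet's iterative algorithm could plug in a different score at the sorting step without changing the rest.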
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures
... models into smaller and more efficient models capable of running on mobile and edge devices. We have also set up a couple of programming projects for a hands-on model optimization experience using these efficient ... because it controls the number of unique words for which we learn embeddings. A small value for would result in loss of information because most of the words would get mapped to the OOV token. However, if ... information of the words. The words are all averaged to compute , and we would have gotten the same result for any other permutation of the words in the context. Hence the name Bag of Words for this family ...
53 pages | 3.92 MB | 1 year ago
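The Chapter 4 snippet makes two points. First, a limited vocabulary maps unknown words to a shared OOV token. Second, averaging the context embeddings makes the result independent of word order. The sketch below shows both with a tiny hand-written embedding table. The vectors and the `<OOV>` key are hypothetical, not trained values from the book.

```python
OOV = "<OOV>"

# Hypothetical 2-D embeddings for a tiny vocabulary plus the OOV token.
embeddings = {
    OOV: [0.0, 0.0],
    "the": [1.0, 0.0],
    "cat": [0.0, 2.0],
    "sat": [2.0, 1.0],
}

def lookup(word):
    """Words outside the vocabulary all share the OOV embedding."""
    return embeddings.get(word, embeddings[OOV])

def average_context(words):
    """Average the context embeddings, as in a bag-of-words model."""
    vectors = [lookup(w) for w in words]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

print(average_context(["the", "cat", "sat"]))  # [1.0, 1.0]
print(average_context(["sat", "the", "cat"]))  # [1.0, 1.0], order does not matter
print(lookup("dog"))                           # [0.0, 0.0], mapped to OOV
```

Because every out-of-vocabulary word shares one vector, shrinking the vocabulary makes more distinct words indistinguishable. This is the information loss the snippet warns about.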
PyTorch Release Notes
... adding only three lines of Python to an existing FP32 (default) script. AMP will select an optimal set of operations to cast to FP16. FP16 operations require half the memory bandwidth (resulting in a ...
365 pages | 2.94 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction
... activation functions, which saturate at either 1.0 or -1.0 except within a very small range of inputs. As a result, changing the input variable leads to a very tiny gradient (if any), and when there are a large number ... train models that performed well on unseen data (in other words, the models generalized well). As a result of this trailblazing work, there has been a race to create deeper networks with an ever larger number ... Learning. Deep learning research has focused on improving the state of the art, and as a result we have seen progressive improvements on benchmarks like image classification, text classification ...
21 pages | 3.17 MB | 1 year ago
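The Chapter 1 snippet mentions activations that saturate at 1.0 or -1.0 and produce tiny gradients. tanh is one such function, and the sketch below assumes it is the kind meant. The code computes tanh's derivative and multiplies it across a chain of saturated units, ignoring weight terms for simplicity. That product shows how gradients can vanish with depth.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)**2, which approaches 0 as |x| grows."""
    t = math.tanh(x)
    return 1.0 - t * t

print(tanh_grad(0.0))  # 1.0, the steepest point
print(tanh_grad(5.0))  # about 1.8e-4, deep in the saturated region

# Backpropagating through a chain of saturated units multiplies these local
# derivatives (weights ignored here), so the gradient shrinks geometrically.
depth = 10
chain = 1.0
for _ in range(depth):
    chain *= tanh_grad(3.0)
print(chain)  # on the order of 1e-20
```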
Lecture 5: Gaussian Discriminant Analysis, Naive Bayes
... [Feng Li (SDU), GDA, NB and EM, September 27, 2023] Sample Space, Events and Probability: A sample space S is the set of all possible outcomes of a (conceptual or physical) random experiment. An event A is a subset of ... the data, but how? Warm Up (Contd.): Given a set of training data $D = \{x^{(i)}, y^{(i)}\}_{i=1,\dots,m}$, the training data are sampled in an i.i.d. manner. The ... $\mu_1 = \sum_{i=1}^m 1\{y^{(i)}=1\}\,x^{(i)} \big/ \sum_{i=1}^m 1\{y^{(i)}=1\}$, $\Sigma = \frac{1}{m}\sum_{i=1}^m (x^{(i)} - \mu_{y^{(i)}})(x^{(i)} - \mu_{y^{(i)}})^T$. Proof (see Problem Set 2). Gaussian Discriminant Analysis (Contd.) ...
122 pages | 1.35 MB | 1 year ago
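The lecture snippet gives the maximum-likelihood estimates for the class-1 mean $\mu_1$ and the shared covariance $\Sigma$ in Gaussian Discriminant Analysis. The sketch below computes those, plus $\phi$ and $\mu_0$. It assumes the lecture uses the standard parameterization, with $\phi = \frac{1}{m}\sum_i 1\{y^{(i)}=1\}$ and $\mu_0$ defined like $\mu_1$ for label 0, although only $\mu_1$ and $\Sigma$ appear above. The function name and data are hypothetical.

```python
def fit_gda(X, y):
    """Maximum-likelihood GDA parameters with a shared covariance matrix.

    X is a list of d-dimensional feature vectors, y a list of 0/1 labels.
    Returns (phi, mu0, mu1, Sigma).
    """
    m, d = len(X), len(X[0])
    n1 = sum(y)
    n0 = m - n1
    phi = n1 / m
    # Class means: average of the feature vectors carrying each label.
    mu0 = [sum(x[j] for x, t in zip(X, y) if t == 0) / n0 for j in range(d)]
    mu1 = [sum(x[j] for x, t in zip(X, y) if t == 1) / n1 for j in range(d)]
    # Shared covariance: (1/m) * sum of outer products of (x - mu_y).
    sigma = [[0.0] * d for _ in range(d)]
    for x, t in zip(X, y):
        mu = mu1 if t == 1 else mu0
        diff = [x[j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                sigma[a][b] += diff[a] * diff[b] / m
    return phi, mu0, mu1, sigma

X = [[1.0, 0.0], [3.0, 0.0], [5.0, 2.0], [7.0, 2.0]]
y = [0, 0, 1, 1]
print(fit_gda(X, y))
# (0.5, [2.0, 0.0], [6.0, 2.0], [[1.0, 0.0], [0.0, 0.0]])
```

Each example is centered on the mean of its own class ($\mu_{y^{(i)}}$) before the outer product is taken. That is why a single $\Sigma$ is shared across both classes.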













