《Efficient Deep Learning Book》[EDL] Chapter 6 - Advanced Learning Techniques - Technical Review
... let's assume that we have such a general model that works for natural language inputs. Then by definition the model should be able to encode the given text in a sequence of embeddings such that there is ... 6-12 shows multiple examples of pacing functions. The x-axis is the training iteration, i.e. the variable described above, and the y-axis is the fraction of data that is enabled from the sorted training ... recap, refer to the two plots in figure 6-13. Both are plots of functions in a single variable, with the variable on the x-axis and ... being the y-axis, and we are trying to find the minima for both. On ...
0 credits | 31 pages | 4.03 MB | 1 year ago
Lecture 2: Linear Regression
Feng Li (SDU), Linear Regression, September 13, 2023. Supervised Learning (Contd.) Features: input variables, $x$; Target: output variable, $y$; Training example: $(x^{(i)}, y^{(i)})$, $i = 1, 2, 3, \ldots, m$; Hypothesis: $h : X \to Y$. Training ... $\sum_{i=1} (h_\theta(x^{(i)}) - y^{(i)})^2$ ... Gradient. Definition. Directional Derivative: The directional derivative of function $f : \mathbb{R}^n \to \mathbb{R}$ in the direction $u \in$ ... complete the proof. ... Gradient (Contd.) Definition. Gradient: The gradient of $f$ is a vector function $\nabla f : \mathbb{R}^n \to \mathbb{R}^n$ defined by $\nabla f(x) = \sum_{i=1}^{n} \partial f \ldots$
0 credits | 31 pages | 608.38 KB | 1 year ago
Lecture 5: Gaussian Discriminant Analysis, Naive Bayes
... $P(\neg A) = 1 - P(A)$. Feng Li (SDU), GDA, NB and EM, September 27, 2023. Conditional Probability. Definition of conditional probability: fraction of worlds in which event A is true given event B is true ... Conditional Probability (Contd.) Real valued random variable is a function of the outcome of a randomized experiment, $X : S \to \mathbb{R}$. Examples: Discrete random variables ... For continuous random variable $X$: $P(a < X < b) = P(\{s \in S : a < X(s) < b\})$. For discrete random variable $X$: $P(X = \ldots$
0 credits | 122 pages | 1.35 MB | 1 year ago
Lecture 6: Support Vector Machine
... optimization problem $\min_{\omega} f(\omega)$ s.t. $g_i(\omega) \le 0$, $i = 1, \cdots, k$; $h_j(\omega) = 0$, $j = 1, \cdots, l$, with variable $\omega \in \mathbb{R}^n$, domain $D = \bigcap_{i=1}^{k} \mathrm{dom}\, g_i \cap \bigcap_{j=1}^{l} \mathrm{dom}\, h_j$, optimal value $p^*$. Objective function $f(\omega)$; $k$ inequality ... mapped features remain efficient. Feng Li (SDU), SVM, December 28, 2021. Kernels: Formal Definition. Each kernel $K$ has an associated feature mapping $\phi$. $\phi$ takes input $x \in X$ (input space) and maps ... non-separable case, we relax the above constraints as: $y^{(i)}(\omega^T x^{(i)} + b) \ge 1 - \xi_i$ for $\forall i$; $\xi_i$ is called slack variable. Non-separable case: we will allow misclassified training samples, but we want the number of such ...
0 credits | 82 pages | 773.97 KB | 1 year ago
Lecture Notes on Support Vector Machine
... (9) s.t. $g_i(\omega) \le 0$, $i = 1, \cdots, k$ (10); $h_j(\omega) = 0$, $j = 1, \cdots, l$ (11), where $\omega \in D$ is the variable with $D = \bigcap_{i=1}^{k} \mathrm{dom}\, g_i \cap \bigcap_{j=1}^{l} \mathrm{dom}\, h_j$ representing the feasible domain defined by the constraints ... $f(\omega^*)$. The first equality is due to the strong duality, and we have the second one according to the definition of the dual function. The third inequality follows because the infimum of the Lagrangian over $\omega$ ... above constraints as: $y^{(i)}(\omega^T x^{(i)} + b) \ge 1 - \xi_i$ for $\forall i = 1, \cdots, m$, where $\xi_i$ is called slack variable. In the non-separable case, we allow misclassified training examples, but we would like the number ...
0 credits | 18 pages | 509.37 KB | 1 year ago
PyTorch Release Notes
... container (defaults to all GPUs, but can be specified by using the NVIDIA_VISIBLE_DEVICES environment variable). For more information, refer to the nvidia-docker documentation. Note: Starting in Docker 19 ... hang or as an "illegal instruction" exception. A workaround for this case is to set the environment variable NCCL_PROTO=^LL128. This issue will be addressed in an upcoming release. PyTorch RN-08516-001_v23 ... `CUDA_MODULE_LOADING`. Refer to the CUDA C++ Programming Guide for more information about this environment variable. Announcements ‣ NVIDIA Deep Learning Profiler (DLProf) v1.8, which was included in the 21.12 ...
0 credits | 365 pages | 2.94 MB | 1 year ago
Keras: A Python-Based Deep Learning Library
... super(MyLayer, self).__init__(**kwargs) def build(self, input_shape): # Create a trainable weight variable for this layer. self.kernel = self.add_weight(name='kernel', shape=(input_shape[1], self.output_dim) ... inputs = K.placeholder(ndim=3) The following code instantiates a variable. It is equivalent to tf.Variable() or th.shared(). import numpy as np val = np.random.random((3, 4, 5)) var = K.variable(value=val) # All-zeros variable: var = K.zeros(shape=(3, 4, 5)) ... Initializing tensors with random numbers: b = K.random_uniform_variable(shape=(3, 4), low=0, high=1) # uniform distribution c = K.random_normal_variable(shape=(3, 4), mean=0, scale=1) # Gaussian distribution d = K.random_normal_variable(shape=(3, 4), mean=0, scale=1) ...
0 credits | 257 pages | 1.19 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 5 - Advanced Compression Techniques
... a sample 2D weight matrix with randomly initialized float values. We also define a sparsity_rate variable initialized with the value 0.4 to sparsify 40% of the total number of weights. Finally, we compute ... that you are convinced that sparsity helps with improving compression. Increasing the sparsity_rate variable's value will further reduce the sparsified and compressed size. To take a step back ... 5-2 uses a fixed pruning rate $$p$$. However, we could use variable pruning rates across the pruning rounds. The motivation behind using variable sparsity is that a pre-trained model's weights will get disrupted ...
0 credits | 34 pages | 3.18 MB | 1 year ago
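The snippet above only names the ingredients (a 2D weight matrix of random floats and a `sparsity_rate` of 0.4), so the following is a minimal sketch of what such an experiment might look like, not the book's code. The helper names `sparsify` and `compressed_size`, the choice to zero the smallest-magnitude weights, the 100 x 100 matrix size, and the use of `zlib` on float32 bytes are all assumptions made for illustration.

```python
import random
import struct
import zlib


def sparsify(weights, sparsity_rate):
    """Zero out a fraction of the weights (hypothetical helper).

    Assumption: the smallest-magnitude weights are the ones removed.
    """
    cols = len(weights[0])
    flat = [w for row in weights for w in row]
    num_to_zero = int(len(flat) * sparsity_rate)
    # Flat (row-major) indices of the num_to_zero smallest-magnitude weights.
    order = sorted(range(len(flat)), key=lambda i: abs(flat[i]))
    to_zero = set(order[:num_to_zero])
    return [
        [0.0 if r * cols + c in to_zero else w for c, w in enumerate(row)]
        for r, row in enumerate(weights)
    ]


def compressed_size(weights):
    """Size in bytes of the zlib-compressed float32 encoding (hypothetical helper)."""
    flat = [w for row in weights for w in row]
    raw = struct.pack(f"{len(flat)}f", *flat)
    return len(zlib.compress(raw))


random.seed(0)
# A sample 2D weight matrix with randomly initialized float values.
weights = [[random.uniform(-1.0, 1.0) for _ in range(100)] for _ in range(100)]
sparsity_rate = 0.4  # sparsify 40% of the total number of weights

sparse_weights = sparsify(weights, sparsity_rate)
dense_size = compressed_size(weights)
sparse_size = compressed_size(sparse_weights)
print(f"dense: {dense_size} bytes, sparse: {sparse_size} bytes")
```

Under these assumptions, raising `sparsity_rate` leaves more zero bytes for the compressor to exploit, which is the effect the snippet describes.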
Fully Connected Neural Networks in Practice, PyTorch Edition
... 4.1 Custom Variable data and network training; 4.2 Visualizing accuracy; 4.3 Visualizing classification results; 4.4 Custom Dataset; 4.5 Summary; Literature ... chapter3-3.py. 4. Building your own dataset. In this chapter, our goal is to build our own dataset, then test and visualize it. 4.1 Custom Variable data and network training. Suppose we have no image data; we create some data ourselves and use it for classification. import torch import numpy as np # Generate data def dataGenerate(data, label): ... for idata in data: if idata[0] < 0.5: # compress values below 0.5 into the range [0, 1] ...
0 credits | 29 pages | 1.40 MB | 1 year ago
Lecture Notes on Gaussian Discriminant Analysis, Naive
... image. We assume $X = [X_1, X_2, \cdots, X_n]^T$ is a random variable representing the feature vector of the given image, and $Y \in \{0, 1\}$ is a random variable representing whether there is a cat in the given image. Now ... $= \psi^{y}(1 - \psi)^{1-y}$ (5) • A2: $X \mid Y = 0 \sim \mathcal{N}(\mu_0, \Sigma)$: The conditional probability of continuous random variable $X$ given $Y = 0$ is a Gaussian distribution parameterized by $\mu_0$ and $\Sigma$, such that the corresponding ... $\Sigma^{-1}(x - \mu_0)$ ... (6) • A3: $X \mid Y = 1 \sim \mathcal{N}(\mu_1, \Sigma)$: The conditional probability of continuous random variable $X$ given $Y = 1$ is a Gaussian distribution parameterized by $\mu_1$ and $\Sigma$, such that the corresponding ...
0 credits | 19 pages | 238.80 KB | 1 year ago
26 results in total