《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniques
…especially in signal processing. It is the process of converting high-precision continuous values to low-precision discrete values. Take a look at figure 2-3: it shows a sine wave overlapped with its quantized representation. The quantized sine wave is a low-precision representation that takes integer values in the range [0, 5]. As a result, the quantized wave requires less transmission bandwidth. … we work out a scheme for going from this higher-precision domain (32-bit values) to a quantized domain (b-bit values). This process is nothing but (cue drum roll!) …Quantization! Before we get our hands dirty, let…
33 pages | 1.96 MB | 1 year ago
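The passage sketches the core idea of going from a 32-bit floating-point domain to b-bit integer codes. Below is a minimal uniform-quantization sketch in NumPy; it only illustrates the idea from the snippet (the 3-bit width, the sine-wave input, and the quantize helper are my own choices, not the book's code).

```python
import numpy as np

def quantize(x, bits=3):
    # Uniform (affine) quantization: map 32-bit floats onto b-bit integer codes.
    levels = 2 ** bits - 1                    # e.g. 3 bits -> codes in [0, 7]
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / levels          # width of one quantization bin
    codes = np.round((x - x_min) / scale).astype(np.uint8)
    dequantized = codes * scale + x_min       # lossy reconstruction for comparison
    return codes, dequantized

t = np.linspace(0, 2 * np.pi, 100, dtype=np.float32)
codes, approx = quantize(np.sin(t), bits=3)
print(codes.min(), codes.max())                 # integer codes, here 0 and 7
print(float(np.abs(np.sin(t) - approx).max()))  # maximum quantization error
```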
《Efficient Deep Learning Book》[EDL] Chapter 7 - Automation
…that a point in such a region is a set of well-defined values for each of those parameters. The parameters can take discrete or continuous values. It is called a "search" space because we are searching … technique is turned on and a False value means it is turned off. This search space has four possible points: (True, True), (True, False), (False, True), and (False, False). Let's take another example of a search space with two parameters. … However, this search space has infinitely many points because the second parameter can take infinitely many values. In the context of deep learning, the parameters that influence the process of learning are called…
33 pages | 2.48 MB | 1 year ago
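To make the notion of a search space concrete, here is a small sketch that samples points from a space with one discrete and one continuous parameter; the parameter names (use_quantization, learning_rate) and their ranges are hypothetical, not taken from the chapter.

```python
import random

# Hypothetical search space: one discrete (boolean) parameter and one
# continuous parameter, mirroring the snippet's second example.
search_space = {
    "use_quantization": [True, False],   # discrete: 2 possible values
    "learning_rate": (1e-4, 1e-1),       # continuous: infinitely many values
}

def sample_point(space):
    # A "point" assigns a concrete value to every parameter in the space.
    return {
        "use_quantization": random.choice(space["use_quantization"]),
        "learning_rate": random.uniform(*space["learning_rate"]),
    }

for _ in range(3):
    print(sample_point(search_space))
```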
《Efficient Deep Learning Book》[EDL] Chapter 5 - Advanced Compression Techniques
…the data that we are quantizing is not uniformly distributed, i.e. the data is more likely to take values in a certain range than in another equally sized range. It creates equal-sized quantization ranges (bins)… … valued weights in each training epoch. The result of such a training process is p% of the weights having zero values. Sparse compressed models achieve a higher compression ratio, which results in lower transmission and… … retained nodes have fewer connections. Let's do an exercise to convince ourselves that setting parameter values to zero indeed results in a higher compression ratio. Figure 5-1: An illustration of pruning weights…
34 pages | 3.18 MB | 1 year ago
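A simple way to see the pruning idea in code: the sketch below zeroes out the smallest-magnitude fraction of a weight tensor (magnitude pruning). It is an illustration under my own assumptions, not the book's implementation.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    # Zero out the `sparsity` fraction of weights with the smallest magnitude,
    # so that fraction of the parameters becomes exactly zero.
    threshold = np.quantile(np.abs(weights), sparsity)  # magnitude cutoff
    mask = np.abs(weights) >= threshold                 # keep only large weights
    return weights * mask, mask

w = np.random.randn(1000).astype(np.float32)
pruned, mask = magnitude_prune(w, sparsity=0.75)
print((pruned == 0).mean())   # roughly 0.75 of the weights are now zero
```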
《Efficient Deep Learning Book》[EDL] Chapter 3 - Learning Techniques
…dataset Nx the size? What are the constraining factors? An image transformation recomputes the pixel values. The rotation of a 100x100 RGB image requires at least 100x100x3 (3 channels) computations. Two… …is resized to 224x224 px prior to the transformations. A value transformation operates on the pixel values. Let's take the brightness transformation as an example. Figure 3-6 shows an image 2x as bright (bottom-right)… … value range [0, 255]. Any channel values that exceed 255 after the 2x brightness transformation are clipped back to 255; in other words, channel values above 127 saturate after the 2x transformation. We…
56 pages | 18.93 MB | 1 year ago
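As a concrete example of a value transformation with clipping, here is a brightness sketch in NumPy; real pipelines would normally use a library routine, and the 2x factor and image size are just assumptions matching the snippet.

```python
import numpy as np

def adjust_brightness(image, factor=2.0):
    # Scale pixel values by `factor` and clip back to the valid [0, 255] range.
    bright = image.astype(np.float32) * factor
    return np.clip(bright, 0, 255).astype(np.uint8)   # values above 255 saturate

img = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)
out = adjust_brightness(img, factor=2.0)
print(out.max())   # at most 255: original channel values above 127 were clipped
```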
Experiment 1: Linear Regression
…of heights for various boys between the ages of two and eight. The y-values are the heights measured in meters, and the x-values are the ages of the boys corresponding to the heights. Each height and… … x = [ones(m, 1), x]; % Add a column of ones to x … From this point on, you will need to remember that the age values from your training data are actually in the second column of x. This will be important when plotting… … converges (this will take a total of about 1500 iterations). After convergence, record the final values of θ0 and θ1 that you get, and plot the straight-line fit from your algorithm on the same graph as…
7 pages | 428.11 KB | 1 year ago
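The exercise itself is written for MATLAB/Octave; the following NumPy sketch shows the same kind of batch gradient-descent loop with made-up ages and heights and an assumed learning rate, purely to illustrate fitting θ0 and θ1 over about 1500 iterations.

```python
import numpy as np

ages = np.array([2.0, 3.0, 4.5, 6.0, 8.0])          # x-values (years), made up
heights = np.array([0.85, 0.95, 1.05, 1.15, 1.28])  # y-values (meters), made up

m = len(ages)
X = np.column_stack([np.ones(m), ages])  # add a column of ones (intercept term)
theta = np.zeros(2)                      # [theta0, theta1]
alpha = 0.07                             # learning rate chosen for this sketch

for _ in range(1500):                    # roughly the iteration count mentioned
    gradient = X.T @ (X @ theta - heights) / m   # batch gradient of squared error
    theta -= alpha * gradient

print(theta)   # final theta0 (intercept) and theta1 (slope)
```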
keras tutorial
…sparse=True) >>> print(a) SparseTensor(indices=Tensor("Placeholder_8:0", shape=(?, 2), dtype=int64), values=Tensor("Placeholder_7:0", shape=(?,), dtype=float32), dense_shape=Tensor("Const:0", shape=(2,)… … mean represents the mean of the random values to generate, stddev represents the standard deviation of the random values to generate, and seed represents the seed used to generate the random numbers. RandomUniform… …) where minval represents the lower bound of the random values to generate and maxval represents the upper bound of the random values to generate. TruncatedNormal generates values using a truncated…
98 pages | 1.57 MB | 1 year ago
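The initializers mentioned in the snippet are standard tf.keras classes; the sketch below wires them into a toy model (layer sizes, seeds, and bounds are arbitrary choices for illustration).

```python
from tensorflow import keras
from tensorflow.keras import initializers

# The three initializers described in the snippet; the specific means,
# standard deviations, bounds, and seeds here are arbitrary.
normal_init = initializers.RandomNormal(mean=0.0, stddev=0.05, seed=42)
uniform_init = initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=42)
truncated_init = initializers.TruncatedNormal(mean=0.0, stddev=0.05, seed=42)

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(8,),
                       kernel_initializer=normal_init),
    keras.layers.Dense(8, activation="relu", kernel_initializer=uniform_init),
    keras.layers.Dense(1, kernel_initializer=truncated_init),
])
model.summary()
```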
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures
…dangerous animals, and represent each animal using two features, say cute and dangerous. We can assign values between 0.0 and 1.0 to these two features for different animals. The higher the value, the more that particular feature represents the given animal. In Table 4-1 we manually assigned values for the cute and dangerous features for six animals, and we are calling the tuple of these two features an… … between 0.0 and 1.0. We manually picked these values for illustration. Going through Table 4-1, cat and dog have high values for the 'cute' feature, and low values for the 'dangerous' feature. On the other…
53 pages | 3.92 MB | 1 year ago
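A toy version of the animal feature table in code: the values below are invented stand-ins for Table 4-1, and cosine similarity is used to show that animals with similar feature profiles end up close together.

```python
import numpy as np

# Hand-assigned 2-dimensional feature vectors (cute, dangerous); the exact
# numbers are made up for this sketch, not copied from the book's table.
animal_features = {
    "cat":   np.array([0.9, 0.1]),
    "dog":   np.array([0.8, 0.2]),
    "tiger": np.array([0.3, 0.9]),
    "shark": np.array([0.1, 0.9]),
}

def similarity(a, b):
    # Cosine similarity between two animal feature vectors.
    va, vb = animal_features[a], animal_features[b]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(similarity("cat", "dog"))     # high: both cute, not dangerous
print(similarity("cat", "shark"))   # low: very different feature profiles
```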
Lecture 1: Overview
…are primarily interested in prediction. We are interested in predicting only one thing. The possible values of what we want to predict are specified, and we have some training cases for which its value is… …that is, variables whose values are unknown, such that the corresponding design matrix will then have "holes" in it. The goal of matrix completion is to infer plausible values for the missing entries. … Optimization and Integration: these usually involve finding the best values for some parameters (an optimization problem), or averaging over many plausible values (an integration problem). How can we do this efficiently…
57 pages | 2.41 MB | 1 year ago
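As a small illustration of the matrix-completion idea, the sketch below fills the "holes" of a partially observed low-rank matrix by repeatedly imputing a rank-3 SVD approximation; this is just one simple approach, run on synthetic data, not the lecture's method.

```python
import numpy as np

rng = np.random.default_rng(0)
true = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 15))   # rank-3 matrix
mask = rng.random(true.shape) < 0.7                          # observed entries
X = np.where(mask, true, np.nan)                             # "holes" = NaN

filled = np.where(mask, X, 0.0)          # start by filling the holes with zeros
for _ in range(50):
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    low_rank = (U[:, :3] * s[:3]) @ Vt[:3]    # best rank-3 approximation
    filled = np.where(mask, X, low_rank)      # keep observed, impute the missing

print(np.abs(filled - true)[~mask].mean())    # small error on the missing entries
```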
Lecture Notes on Gaussian Discriminant Analysis, Naive Bayes…
…$\psi^{y^{(i)}}(1-\psi)^{1-y^{(i)}}$. We then maximize the log-likelihood function $\ell(\psi, \mu_0, \mu_1, \Sigma)$ so as to get the optimal values for $\psi$, $\mu_0$, $\mu_1$, and $\Sigma$, such that the resulting GDA model can best fit the given training data. In particular… …the finite training data. Apparently, this is quite unreasonable! Similarly, when some label value (e.g., $\bar{y}$) does not appear in the given training data, we have $p(\bar{y}) = \frac{\sum_{i=1}^m \mathbb{1}(y^{(i)} = \bar{y})}{m} = 0$. … $\frac{\sum_{i=1}^m \mathbb{1}(y^{(i)} = y \wedge x^{(i)}_j = x) + 1}{\sum_{i=1}^m \mathbb{1}(y^{(i)} = y) + v_j}$, where $v_j$ is the number of possible values of the j-th feature. In our case, where $x_j \in \{0, 1\}$ for $\forall j \in [n]$, we have $v_j = 2$ for $\forall j$. Note that…
19 pages | 238.80 KB | 1 year ago
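A tiny numeric illustration of the add-one (Laplace) smoothing formula above, with made-up counts for a single binary feature:

```python
import numpy as np

y = np.array([1, 1, 1, 0, 0])     # labels of m = 5 training examples (made up)
xj = np.array([1, 0, 1, 0, 0])    # j-th (binary) feature of each example
vj = 2                            # number of possible values of feature j

def smoothed_p(x_value, y_value):
    # P(x_j = x_value | y = y_value) with add-one (Laplace) smoothing.
    num = np.sum((y == y_value) & (xj == x_value)) + 1
    den = np.sum(y == y_value) + vj
    return num / den

# Even a (feature value, label) combination never seen in training gets a
# non-zero probability instead of 0:
print(smoothed_p(1, 0))   # (0 + 1) / (2 + 2) = 0.25
```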
Lecture 5: Gaussian Discriminant Analysis, Naive Bayes
…$\frac{\cdots + 1}{\sum_{i=1}^m \mathbb{1}(y^{(i)} = y) + v_j}$, where k is the number of possible values of y (k = 2 in our case), and $v_j$ is the number of possible values of the j-th feature ($v_j = 2$ for $\forall j = 1, \cdots, n$ in our case). … length $x^{(i)} = [x^{(i)}_1, x^{(i)}_2, \cdots, x^{(i)}_{n_i}]^T$. The j-th feature of $x^{(i)}$ takes values from a finite set, $x^{(i)}_j \in \{1, 2, \cdots, v\}$, for $\forall j = 1, \cdots, n_i$. For example, $x^{(i)}_j$ indicates the j-th word… … For each training example, the features are i.i.d.…
122 pages | 1.35 MB | 1 year ago
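The event model for text can be sketched as a small multinomial Naive Bayes with add-one smoothing; the vocabulary, documents, and labels below are made up for illustration and are not from the lecture.

```python
import numpy as np
from collections import Counter

v = 6                                   # vocabulary size (assumed)
docs = [[0, 2, 2, 5], [1, 3], [0, 2]]   # each document = list of word indices
labels = [1, 0, 1]

def fit(docs, labels, v):
    # Estimate class priors and add-one-smoothed per-class word probabilities.
    classes = sorted(set(labels))
    priors, word_probs = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values())
        priors[c] = len(class_docs) / len(docs)
        # Laplace smoothing: every word gets count + 1, denominator gets + v.
        word_probs[c] = np.array([(counts[w] + 1) / (total + v) for w in range(v)])
    return priors, word_probs

def predict(doc, priors, word_probs):
    # Pick the class with the highest log-posterior for a new document.
    scores = {c: np.log(priors[c]) + sum(np.log(word_probs[c][w]) for w in doc)
              for c in priors}
    return max(scores, key=scores.get)

priors, word_probs = fit(docs, labels, v)
print(predict([0, 2, 2], priors, word_probs))   # predicts class 1 here
```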
27 documents in total













