《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniques
"I have made this longer than usual because I have not had time to make it shorter." — Blaise Pascal. In the last chapter, we discussed a few ideas to improve deep learning efficiency. Now, we will elaborate on one of those ideas, compression techniques. Compression techniques aim to reduce the model footprint (size, latency, memory, etc.). We can reduce the ... In this chapter, we introduce Quantization, a model compression technique that addresses both these issues. We'll start with a gentle introduction to the idea of compression. Details of quantization and its applications ...
0 credits | 33 pages | 1.96 MB | 1 year ago

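The chapter excerpted above introduces quantization as a way to shrink the model footprint. As a companion illustration (not taken from the book), here is a minimal NumPy sketch of affine (scale / zero-point) 8-bit quantization of a weight tensor; the function names and the int8 range handling are assumptions made for the example.

import numpy as np

def quantize_int8(w):
    # Map the float range [w.min(), w.max()] onto the int8 range [-128, 127].
    qmin, qmax = -128, 127
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = np.round(qmin - w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate float weights from the 8-bit codes.
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.randn(1000).astype(np.float32)       # stand-in for model weights
q, s, z = quantize_int8(w)
w_hat = dequantize(q, s, z)
print("max abs error:", np.abs(w - w_hat).max())   # small error, ~4x less storage

Each int8 weight occupies one byte instead of four, which is the kind of size reduction the chapter refers to.
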
《Efficient Deep Learning Book》[EDL] Chapter 5 - Advanced Compression Techniques
"The problem is that we attempt to solve the simplest questions cleverly, thereby rendering them unusually complex. One should seek the simple solution." — Anton Pavlovich Chekhov. In this chapter, we will discuss two advanced compression techniques. By 'advanced' we mean that these techniques are slightly more involved than quantization (as discussed in the second ... of our models. Did we get you excited yet? Let's learn about these techniques together! Model Compression Using Sparsity: Sparsity or Pruning refers to the technique of removing (pruning) weights during ...
0 credits | 34 pages | 3.18 MB | 1 year ago

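The excerpt names sparsity (pruning) as one of the two techniques. As a hedged illustration that is not from the book, the sketch below prunes the smallest-magnitude weights of a NumPy matrix; the function name, the per-tensor threshold, and the 80% sparsity target are assumptions chosen for the example.

import numpy as np

def prune_by_magnitude(w, sparsity=0.5):
    # Zero out the `sparsity` fraction of weights with the smallest magnitude.
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w).ravel())[k - 1]
    mask = np.abs(w) > threshold
    return w * mask

w = np.random.randn(256, 256).astype(np.float32)
w_sparse = prune_by_magnitude(w, sparsity=0.8)
print("fraction of zeros:", np.mean(w_sparse == 0))   # roughly 0.8

The zeroed weights can then be stored in a sparse format or skipped at inference time, which is where the footprint savings come from.
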
《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction
... efficiency in deep learning models. We will also introduce core areas of efficiency techniques (compression techniques, learning techniques, automation, efficient models & layers, infrastructure). Our hope ... where there might not be a single algorithm that works perfectly, and there is a large amount of unseen data that the algorithm needs to process. Unlike traditional algorithm problems where we expect exact ... leeway in model quality, we can trade off some of it for a smaller footprint by using lossy model compression techniques[7]. For example, when compressing a model naively we might reduce the model size, RAM ...
0 credits | 21 pages | 3.17 MB | 1 year ago

《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures
... chapter 2. We could also incorporate compression techniques such as sparsity, k-means clustering, etc., which will be discussed in the later chapters. 2. Even after compression, the vocabulary itself is large ... Luong[23]-style and Bahdanau[24]-style attention. In this book, we have chosen to discuss the Luong algorithm because it is used in Tensorflow's attention layers. However, we encourage the interested readers ... embedding model's quality and footprint metrics as discussed. We can combine other ideas from compression techniques and learning techniques on top of efficient architectures. As an example, we can train ...
0 credits | 53 pages | 3.92 MB | 1 year ago

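The excerpt mentions Luong-style attention. As a side illustration that is not from the chapter, the sketch below computes Luong "dot" attention (score = query · key, softmax over source positions, weighted sum of values) in NumPy; the shapes and names are assumptions chosen for brevity.

import numpy as np

def dot_attention(query, keys, values):
    # query: (d,); keys, values: (seq_len, d)
    scores = keys @ query                      # one score per source position
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()          # softmax over the sequence
    context = weights @ values                 # attention-weighted sum of values
    return context, weights

d, seq_len = 8, 5
query = np.random.randn(d)
keys = values = np.random.randn(seq_len, d)
context, weights = dot_attention(query, keys, values)
print(weights.round(3), context.shape)
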
深度学习与PyTorch入门实战 - 54. AutoEncoder 自编码器 (Deep Learning and PyTorch Hands-On Introduction, Lesson 54: AutoEncoder)
Visualization: https://projector.tensorflow.org/ ▪ Taking advantage of unsupervised data ▪ Compression, denoising, super-resolution ... Auto-Encoders: https://towardsdatascience.com/applied-deep-le ...
0 credits | 29 pages | 3.49 MB | 1 year ago

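These slides list autoencoders for compression and denoising. A minimal PyTorch sketch of such a model (not taken from the course material) is below; the layer sizes, the 784-dimensional input, and the 32-dimensional code are illustrative assumptions.

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        code = self.encoder(x)           # compressed representation
        return self.decoder(code)        # reconstruction

model = AutoEncoder()
x = torch.rand(16, 784)                  # a batch of flattened 28x28 images
loss = nn.functional.mse_loss(model(x), x)
loss.backward()                          # one reconstruction-loss backward pass
print(loss.item())
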
《TensorFlow 快速入门与实战》7 - 实战TensorFlow人脸识别 (TensorFlow Quick Start and Practice, Part 7: Hands-On TensorFlow Face Recognition)
Xudong Cao, Fang Wen, Jian Sun. Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification. 2013, Computer Vision and Pattern Recognition.
0 credits | 81 pages | 12.64 MB | 1 year ago

从推荐模型的基础特点看大规模推荐类深度学习系统的设计 - 袁镱 (Designing Large-Scale Deep Learning Systems for Recommendation Based on the Basic Characteristics of Recommendation Models)
Communication for Distributed Deep Learning: Survey and Quantitative Evaluation. [ICLR 2018] Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. Dense parameters: used on every step, converge quickly. Sparse parameters: ...
0 credits | 22 pages | 6.76 MB | 1 year ago

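The talk cites Deep Gradient Compression, whose core idea is to communicate only the largest-magnitude gradient entries and accumulate the rest locally. The sketch below is not from the talk and simplifies the paper (which also uses momentum correction and clipping); it shows only the top-k sparsification step in NumPy, and the names and 1% ratio are assumptions.

import numpy as np

def sparsify_gradient(grad, residual, ratio=0.01):
    acc = grad + residual                         # add locally accumulated error
    k = max(1, int(ratio * acc.size))
    threshold = np.sort(np.abs(acc).ravel())[-k]
    mask = np.abs(acc) >= threshold
    to_send = acc * mask                          # sparse update to communicate
    new_residual = acc * ~mask                    # keep the rest for a later step
    return to_send, new_residual

grad = np.random.randn(10000)
residual = np.zeros_like(grad)
to_send, residual = sparsify_gradient(grad, residual)
print("fraction of entries sent:", np.mean(to_send != 0))
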
Lecture 5: Gaussian Discriminant Analysis, Naive Bayes and EM Algorithm
Feng Li, Shandong University, fli@sdu.edu.cn, September 27, 2023. Outline: Warm-Up Case; Gaussian Discriminant Analysis; Naive Bayes; Expectation-Maximization (EM) Algorithm. ... Probability Theory Review: sample space ... E[log(X)] ... The Expectation-Maximization (EM) Algorithm: a training set {x^(1), x^(2), ..., x^(m)} (without labels); the log-likelihood function ℓ(θ) = ...
0 credits | 122 pages | 1.35 MB | 1 year ago

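The log-likelihood in the last fragment is cut off at the snippet boundary. For reference only (the standard form for a latent-variable model with unlabeled data, not copied from these slides), the objective and the Jensen lower bound that EM works with are:

\ell(\theta) = \sum_{i=1}^{m} \log p(x^{(i)}; \theta)
             = \sum_{i=1}^{m} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta)
             \ge \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}

where the E-step sets Q_i(z^{(i)}) = p(z^{(i)} | x^{(i)}; \theta) and the M-step maximizes the lower bound over \theta.
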
Lecture 6: Support Vector Machine
Outline (partial): ... Problem of SVM; SVM with Kernels; Soft-Margin SVM; Sequential Minimal Optimization (SMO) Algorithm. Hyperplane: separates an n-dimensional space into two half-spaces. ... product in some high-dimensional feature space F: K(x, z) = (x^T z)^2 or (1 + x^T z)^2. Any learning algorithm in which examples only appear as dot products (x^(i)T x^(j)) can be kernelized (i.e., non-linearized). ... − 1 = 0, ∴ y^(i)(ω*^T x^(i) + b*) = 1. Coordinate Ascent Algorithm: consider the following unconstrained optimization problem max_α J(α_1, α_2, ..., α_m). Repeat ...
0 credits | 82 pages | 773.97 KB | 1 year ago

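The kernel quoted in the excerpt, K(x, z) = (1 + x^T z)^2, is indeed a dot product in a higher-dimensional feature space. The small check below is not from the slides; the explicit feature map phi is written out only for 2-dimensional inputs, purely to verify the identity numerically.

import numpy as np

def K(x, z):
    # The polynomial kernel quoted in the lecture snippet.
    return (1.0 + x @ z) ** 2

def phi(x):
    # Explicit feature map for 2-D inputs such that K(x, z) = phi(x) . phi(z).
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([1.0, r2 * x1, r2 * x2, x1 * x1, r2 * x1 * x2, x2 * x2])

x = np.array([0.5, -1.0])
z = np.array([2.0, 0.3])
print(K(x, z), phi(x) @ phi(z))   # both print 2.89: the kernel equals the dot product in F

This is the point of the "kernelized" remark: the learning algorithm never has to build phi(x) explicitly.
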
Lecture Notes on Linear Regression
... y^(i)|. 2 Gradient Descent: The Gradient Descent (GD) method is a first-order iterative optimization algorithm for finding the minimum of a function. If the multi-variable function J(θ) is differentiable in a ... ∂J(θ)/∂θ_j = Σ_{i=1}^{m} (θ^T x^(i) − y^(i)) x_j^(i)   (5). We summarize the GD method in Algorithm 1 (Gradient Descent: given a starting point θ ∈ dom J, repeat: 1. Calculate ...). The algorithm usually starts with a randomly initialized θ. In each iteration, we update θ such that the objective function is decreased monotonically. The algorithm is said to be converged when ... (Figure 2: The convergence of the GD algorithm.)
0 credits | 6 pages | 455.98 KB | 1 year ago

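The notes summarize batch gradient descent built on the per-coordinate gradient of Eq. (5). Below is a minimal NumPy sketch (not from the notes) of that update for least-squares linear regression; the learning rate, iteration count, and synthetic data are assumptions chosen so the run converges.

import numpy as np

def gradient_descent(X, y, lr=0.005, iters=2000):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ theta - y)    # Eq. (5) for every coordinate j at once
        theta -= lr * grad              # step against the gradient
    return theta

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 1))]   # bias column + one feature
true_theta = np.array([2.0, -3.0])
y = X @ true_theta + 0.1 * rng.normal(size=100)
print(gradient_descent(X, y))                        # close to [2.0, -3.0]
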
34 results in total