《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures
… world, we must automate the embedding table generation because of the high costs associated with manual embeddings. One example of an automated embedding generation technique is the word2vec family of models. …
Kharameh … Mahmudabad (Persian: محمودآباد, also Romanized as Maḩmūdābād; also known as Maḩbūdābād-e Pā’īn, Mahmood Abad Hoomeh, Maḩmūdābād-e Ḩūmeh, and Maḩmūdābād-e Pā’īn) is a village in Korbal Rural District in …
… E., & Johnson, M. (2021). Distilling Large Language Models into Tiny and Effective Students using pQRNN. arXiv preprint arXiv:2101.08890.
15. Chung, H. W., Fevry, T., Tsai, H., Johnson, M., & Ruder, S. …
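As an aside to the snippet above, a minimal sketch of automated embedding generation with a word2vec-style model (assumes gensim 4.x is installed; the toy corpus and hyper-parameters are illustrative, not from the book):

```python
from gensim.models import Word2Vec

# Toy corpus: each "document" is a list of tokens.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# Train skip-gram word2vec embeddings (sg=1); vector_size is the embedding width.
model = Word2Vec(sentences=corpus, vector_size=32, window=2,
                 min_count=1, sg=1, epochs=50)

cat_vector = model.wv["cat"]                  # learned 32-d embedding for "cat"
print(model.wv.most_similar("cat", topn=3))   # nearest neighbours in embedding space
```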
机器学习课程-温州大学-03深度学习-PyTorch入门
NumPy-to-PyTorch equivalents (excerpt):
    … / np.greater → x.le / x.gt
    np.greater_equal / np.equal / np.not_equal → x.ge / x.eq / x.ne
    Random seed: np.random.seed → torch.manual_seed
1. Tensors: the concept of tensors
… Comparison of Python, PyTorch 1.x and TensorFlow 2.x (table columns: Category | Python | PyTorch 1+ | TensorFlow) …
References:
1. Ian Goodfellow et al., Deep Learning (《深度学习》), Posts & Telecom Press, 2017
2. Andrew Ng, http://www.deeplearning.ai
3. Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag, 2006
4. 李宏毅 (Hung-yi Lee), 《一天搞懂深度学习》 (Understand Deep Learning in One Day)
5. 吴茂 …
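A minimal sketch of the NumPy-to-PyTorch correspondence listed above (assumes both libraries are installed; the tensors are illustrative):

```python
import numpy as np
import torch

np.random.seed(0)      # NumPy global seed
torch.manual_seed(0)   # PyTorch counterpart

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([3.0, 2.0, 1.0])

print(a.gt(b))   # elementwise >,  like np.greater(a.numpy(), b.numpy())
print(a.ge(b))   # elementwise >=, like np.greater_equal
print(a.eq(b))   # elementwise ==, like np.equal
print(a.ne(b))   # elementwise !=, like np.not_equal
print(a.le(b))   # elementwise <=, like np.less_equal
```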
PyTorch Release Notes
… performance drop for GNMT training
‣ On Volta:
  ‣ Up to 20% performance drop for Tacotron training.
‣ Manual synchronization is required in CUDA graphs workloads between graph replays.
‣ The PyTorch container …
… 21.04:
‣ On NVIDIA Ampere Architecture GPUs:
  ‣ Up to 17% performance drop for VGG16 training
‣ Manual synchronization is required in CUDA graphs workloads between graph replays.
‣ The DLProf TensorBoard …
… ‣ Up to 20% performance drop in MaskRCNN training
‣ Up to 15% performance drop in VGG16 training
‣ Manual synchronization is required in CUDA graphs workloads between graph replays.
‣ The DLProf TensorBoard …
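A minimal sketch of the workaround named in the recurring known issue above, i.e. synchronizing manually between CUDA graph replays (assumes PyTorch 1.10+ on a CUDA device; the captured ReLU workload is illustrative, not taken from the release notes):

```python
import torch

device = torch.device("cuda")
static_input = torch.randn(64, 64, device=device)
static_output = torch.empty_like(static_input)

# Warm up on a side stream before capture (recommended practice).
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    static_output.copy_(static_input.relu())
torch.cuda.current_stream().wait_stream(s)

# Capture the workload into a CUDA graph.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output.copy_(static_input.relu())

for step in range(3):
    static_input.copy_(torch.randn(64, 64, device=device))  # refresh static inputs
    g.replay()
    torch.cuda.synchronize()  # manual synchronization between replays (the workaround)
```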
Keras: 基于 Python 的深度学习库
clear_session
keras.backend.clear_session()
Destroys the current TF graph and creates a new one. Useful for avoiding clutter from old models/layers.
manual_variable_initialization
keras.backend.manual_variable_initialization(value)
Sets the flag for manual variable initialization. This boolean flag determines whether variables should be initialized when they are instantiated (the default), or whether the user should handle the initialization themselves. …
… Use the PEP8 linter:
• Install the PEP8 packages: pip install pep8 pytest-pep8 autopep8
• Run a standalone PEP8 check: py.test --pep8 -m pep8
• You can automatically fix some PEP8 errors by running: autopep8 -i --select … for example: autopep8 -i --select …
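A minimal sketch of clear_session() in practice (assumes TensorFlow 2.x's bundled Keras, which exposes the same keras.backend.clear_session function; the toy model is illustrative):

```python
from tensorflow import keras

def build_model():
    # Start from a clean state so layers/graphs from earlier models do not
    # accumulate across repeated builds (e.g., inside a tuning loop).
    keras.backend.clear_session()
    return keras.Sequential([
        keras.Input(shape=(8,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1),
    ])

for _ in range(3):
    model = build_model()
    model.compile(optimizer="adam", loss="mse")
```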
《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction
… practitioners have to do. Apart from saving humans time, it also helps by reducing the bias that manual decisions might introduce when designing efficient networks. Automation techniques can help improve … Automated Hyper-Param Optimization (HPO) is one such technique that can be used to replace / supplement manual tweaking of hyper-parameters like learning rate, regularization, dropout, etc. This relies on search …
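A minimal sketch of what automated HPO can look like: a random search over learning rate, dropout, and weight decay. The train_and_evaluate function is a hypothetical stand-in, not code from the book.

```python
import random

def train_and_evaluate(learning_rate, dropout, weight_decay):
    # Hypothetical stand-in: a real implementation would train a model with
    # these settings and return a validation metric.
    return 1.0 - abs(learning_rate - 1e-3) - 0.1 * dropout - weight_decay

search_space = {
    "learning_rate": lambda: 10 ** random.uniform(-5, -1),
    "dropout": lambda: random.uniform(0.0, 0.5),
    "weight_decay": lambda: 10 ** random.uniform(-6, -2),
}

best_score, best_params = float("-inf"), None
for trial in range(20):
    params = {name: sample() for name, sample in search_space.items()}
    score = train_and_evaluate(**params)
    if score > best_score:
        best_score, best_params = score, params

print("best score:", best_score, "with params:", best_params)
```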
【PyTorch深度学习-龙龙老师】-测试版20211223
1.8 References
[1] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, …

    # Gradient-descent step for linear regression y = wx + b (the function name,
    # the y = points[i, 1] line and the closing update are reconstructed to
    # complete the truncated snippet).
    def step_gradient(b_current, w_current, points, lr):
        # Compute the derivatives of the error function over all points and update w, b.
        b_gradient = 0
        w_gradient = 0
        M = float(len(points))  # total number of samples
        for i in range(0, len(points)):
            x = points[i, 0]
            y = points[i, 1]
            # derivative of the error w.r.t. b: grad_b = 2(wx+b-y), see Eq. (2.3)
            b_gradient += (2/M) * ((w_current * x + b_current) - y)
            # derivative of the error w.r.t. w: grad_w = 2(wx+b-y)*x, see Eq. (2.2)
            w_gradient += (2/M) * x * ((w_current * x + b_current) - y)
        new_b = b_current - lr * b_gradient
        new_w = w_current - lr * w_gradient
        return new_b, new_w
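A minimal usage sketch for the routine above (the synthetic data and driver loop are illustrative assumptions, not the book's training script):

```python
import numpy as np

# Synthetic samples from y = 2x + 1, so the recovered parameters are easy to check.
xs = np.linspace(0, 10, 100)
points = np.column_stack([xs, 2.0 * xs + 1.0])

b, w = 0.0, 0.0
for _ in range(5000):
    b, w = step_gradient(b, w, points, lr=0.01)
print(f"w ~ {w:.2f}, b ~ {b:.2f}")  # approaches w = 2, b = 1
```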
PyTorch Tutorial
• Zero out gradients after each update
• t.grad.zero_()  (*Assume 't' is a tensor)
Autograd (continued)
• Manual Weight Update - example
Optimizer
• Optimizers (optim package)
• Adam, Adagrad, Adadelta, SGD etc.
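A minimal sketch of the two approaches named in the snippet: a manual weight update with explicit gradient zeroing, then the same loop driven by torch.optim (the toy regression problem is illustrative):

```python
import torch

torch.manual_seed(0)
x = torch.randn(32, 3)
y = torch.randn(32, 1)
lr = 0.1

# Manual weight update with autograd.
w = torch.randn(3, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
for _ in range(100):
    loss = ((x @ w + b - y) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()  # zero out gradients after each update
    b.grad.zero_()

# Equivalent loop using an optimizer from the optim package.
w2 = torch.randn(3, 1, requires_grad=True)
b2 = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([w2, b2], lr=lr)
for _ in range(100):
    opt.zero_grad()
    loss = ((x @ w2 + b2 - y) ** 2).mean()
    loss.backward()
    opt.step()
```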
Lecture 5: Gaussian Discriminant Analysis, Naive Bayes
Warm Up (Contd.)
Given a set of training data $\mathcal{D} = \{(x^{(i)}, y^{(i)})\}_{i=1,\cdots,m}$, where the training data are sampled in an i.i.d. manner. The probability of the $i$-th training data $(x^{(i)}, y^{(i)})$ is $p_{X|Y}(x^{(i)} \mid y^{(i)})\, p_Y(y^{(i)})$, so that
$$P(\mathcal{D}) = \prod_{i=1}^{m} p_{X|Y}(x^{(i)} \mid y^{(i)})\, p_Y(y^{(i)})$$
Log-likelihood function:
$$\ell(\theta) = \log \prod_{i=1}^{m} p_{X,Y}(x^{(i)}, y^{(i)}) = \log \prod_{i=1}^{m} p_{X|Y}(x^{(i)} \mid y^{(i)})\, p_Y(y^{(i)}) = \sum_{i=1}^{m} \left[ \log p_{X|Y}(x^{(i)} \mid y^{(i)}) + \log p_Y(y^{(i)}) \right]$$
where $\theta = \{p_{X|Y}(x \mid y), p_Y(y)\}_{x,y}$.
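A worked consequence of this decomposition (a standard result stated for completeness; the derivation is not in the snippet): maximizing the second sum alone, subject to the prior summing to one, gives the empirical class frequency:
$$\hat{p}_Y(y) = \arg\max_{p_Y} \sum_{i=1}^{m} \log p_Y(y^{(i)}) \;\; \text{s.t.} \; \sum_{y'} p_Y(y') = 1 \qquad\Longrightarrow\qquad \hat{p}_Y(y) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\{y^{(i)} = y\}$$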
Lecture Notes on Gaussian Discriminant Analysis, Naive Bayes …
$$\cdots \exp\left(-\frac{1}{2}(x-\mu_1)^T \Sigma^{-1}(x-\mu_1)\right) \tag{7}$$
Given $m$ sample data $\{(x^{(i)}, y^{(i)})\}_{i=1,\cdots,m}$, the log-likelihood is defined as
$$\ell(\psi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^{m} p_{X,Y}(x^{(i)}, y^{(i)}; \psi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^{m} p_{X|Y}(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma)\, p_Y(y^{(i)}; \psi) = \sum_{i=1}^{m} \log p_{X|Y}(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma) + \sum_{i=1}^{m} \log p_Y(y^{(i)}; \psi) \tag{8}$$
where $\psi$, $\mu_0$, $\mu_1$, and $\Sigma$ are parameters. Substituting Eq. (5)∼(7) into Eq. (8) gives us a full expression of $\ell(\psi, \mu_0, \mu_1, \Sigma)$:
$$\ell(\psi, \mu_0, \mu_1, \Sigma) = \sum_{i=1}^{m} \log p_{X|Y}(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma) + \sum_{i=1}^{m} \log p_Y(y^{(i)}; \psi) = \sum_{i: y^{(i)}=0} \log \left( \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left( -\frac{1}{2}(x^{(i)}-\mu_0)^T \Sigma^{-1}(x^{(i)}-\mu_0) \right) \right) + \sum_{i: y^{(i)}=1} \cdots$$
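Maximizing this log-likelihood yields the familiar closed-form estimates (a standard result quoted here for context; the full derivation continues in the source notes):
$$\psi = \frac{1}{m}\sum_{i=1}^{m} \mathbb{1}\{y^{(i)} = 1\}, \qquad \mu_k = \frac{\sum_{i=1}^{m} \mathbb{1}\{y^{(i)} = k\}\, x^{(i)}}{\sum_{i=1}^{m} \mathbb{1}\{y^{(i)} = k\}}, \; k \in \{0, 1\}, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m} \big(x^{(i)} - \mu_{y^{(i)}}\big)\big(x^{(i)} - \mu_{y^{(i)}}\big)^T$$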
Lecture Notes on Support Vector Machine
… margin of $x_0$ (with respect to the hyperplane $\omega^T x + b = 0$). Now, given a set of $m$ training data $\{(x^{(i)}, y^{(i)})\}_{i=1,\cdots,m}$, we first assume that they are linearly separable. Specifically, there exists a … with $y^{(i)} = 1$, while $\omega^T x^{(i)} + b \leq 0$ for $\forall i$ with $y^{(i)} = -1$. As shown in Fig. 1, for $\forall i = 1, \cdots, m$, we can calculate its margin as
$$\gamma^{(i)} = y^{(i)} \left( \left( \frac{\omega}{\|\omega\|} \right)^T x^{(i)} + \frac{b}{\|\omega\|} \right) \tag{5}$$
With respect to the …
… $\cdots, k$, $A\omega - b = 0$ … if it is strictly feasible, i.e., $\exists\, \omega \in \operatorname{relint}\mathcal{D}: g_i(\omega) < 0,\ i = 1, \cdots, m,\ A\omega = b$. A detailed proof of the above theorem can be found in Prof. Boyd and Prof. Vandenberghe's Convex Optimization …
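For context, the margin in Eq. (5) feeds the standard max-margin formulation (a well-known equivalence, not part of the snippet): maximize the worst-case geometric margin, which after fixing the functional margin to 1 becomes the usual quadratic program:
$$\max_{\gamma,\,\omega,\,b} \; \gamma \quad \text{s.t.} \; y^{(i)}\,\frac{\omega^T x^{(i)} + b}{\|\omega\|} \geq \gamma, \; i = 1, \cdots, m \qquad\Longleftrightarrow\qquad \min_{\omega,\,b} \; \frac{1}{2}\|\omega\|^2 \quad \text{s.t.} \; y^{(i)}\big(\omega^T x^{(i)} + b\big) \geq 1, \; i = 1, \cdots, m$$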