Gradient Descent - IT文库 (IT Library): free downloads of programming, IT and Internet e-books and documents for programmers. Boost your coding power!


Experiment 2: Logistic Regression and Newton's Method

… is the gradient of $L$ and can be defined as $$ \nabla_{\theta}L=\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)} $$ One approach to minimizing the above objective function is gradient descent … $$ |L^{+}(\theta)-L(\theta)|\leq\epsilon $$ Try to solve the logistic regression problem using the gradient descent method with the initialization $ \theta = 0 $, and answer the following questions: 1. Assume $ \epsilon = 10^{-6} $. How many iterations are required to achieve convergence? Note that the gradient descent method converges very slowly and may take a long time to reach the minimum. 2. …

0 码力 (credits) | 4 pages | 196.41 KB | 2 years ago

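For the Experiment 2 exercise above, a minimal Python/NumPy sketch of gradient descent with the stopping test $|L^{+}(\theta)-L(\theta)|\leq\epsilon$ and the initialization $\theta = 0$. It is not the experiment's reference solution: $L(\theta)$ is assumed to be the average cross-entropy for labels $y \in \{0, 1\}$ (whose gradient is the formula quoted in the snippet), and the learning rate `alpha` and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y):
    # Assumed objective: average cross-entropy L(theta) for labels y in {0, 1}
    h = sigmoid(X @ theta)
    eps = 1e-12  # guards against log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def gradient(theta, X, y):
    # (1/m) * sum_i (h_theta(x_i) - y_i) * x_i
    return X.T @ (sigmoid(X @ theta) - y) / X.shape[0]

def logistic_gd(X, y, alpha=0.1, epsilon=1e-6, max_iter=1_000_000):
    theta = np.zeros(X.shape[1])            # initialization theta = 0
    current = loss(theta, X, y)
    for it in range(1, max_iter + 1):
        theta = theta - alpha * gradient(theta, X, y)
        new = loss(theta, X, y)             # L^+(theta) after the step
        if abs(new - current) <= epsilon:   # convergence test from the exercise
            return theta, it
        current = new
    return theta, max_iter

# Hypothetical toy data: the first column is the intercept feature x_0 = 1.
X = np.array([[1, 0.5], [1, 1.0], [1, 1.5], [1, 2.0], [1, 3.0], [1, 3.5]])
y = np.array([0, 1, 0, 1, 0, 1])
theta, iterations = logistic_gd(X, y)
```

The returned `iterations` is the count the first question asks about; it depends strongly on `alpha` and on the data.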
Deep Learning and PyTorch: A Hands-On Introduction - 35. Early-stopping-Dropout

PyTorch: Early Stop, Dropout. Lecturer: 龙良曲. Tricks: Early Stopping; Dropout; Stochastic Gradient Descent. Early Stopping: Regularization … `correct = 0` … `for data, target in test_loader:` … Stochastic Gradient Descent: Stochastic, not random! Deterministic … Gradient Descent: $$ \frac{\partial}{\partial\theta_{j}}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}(\hat{y}^{i}-y^{i})x_{j}^{i} $$ Gradient Descent ③ Stochastic G.D.: `for i in range(M):` $$ \theta_{j} := \theta_{j} - \alpha \cdot \ldots $$ (only …

0 码力 (credits) | 16 pages | 1.15 MB | 2 years ago

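The slides list early stopping as a regularization trick. A minimal sketch of the usual bookkeeping, assuming a "patience" rule (stop once the validation loss has not improved for `patience` consecutive epochs); the class `EarlyStopping`, its parameters and the loss curve are hypothetical, not a PyTorch API.

```python
class EarlyStopping:
    """Tracks validation loss and signals when training should stop."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch   # in real training, checkpoint the model here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation curve: it improves, then starts to overfit.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53, 0.56, 0.60]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, v in enumerate(val_losses):
    if stopper.step(epoch, v):
        stopped_at = epoch
        break
```

In a training loop, `val_loss` would be computed after each epoch on held-out data (for example by iterating over a loader as in the slide's `for data, target in test_loader:`), and the weights saved at `best_epoch` are the ones kept.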
Lecture Notes on Linear Regression

… data to the hyperplane is denoted by $ |\theta^{T} x^{(i)} - y^{(i)}| $. 2 Gradient Descent: The Gradient Descent (GD) method is a first-order iterative optimization algorithm for finding the minimum … $ J(\theta) $ decreases fastest if one goes from $ \theta $ in the direction of the negative gradient of $J$ at $ \theta $. Let $$ \nabla J(\theta)=\left[\frac{\partial J}{\partial\theta_{0}},\frac{\partial J}{\partial\theta_{1}},\cdots,\frac{\partial J}{\partial\theta_{n}}\right]^{T} $$ denote the gradient of $ J(\theta) $. In each iteration, we update $ \theta $ according to the following rule: …

0 码力 (credits) | 6 pages | 455.98 KB | 2 years ago

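The update rule these notes introduce is the standard step $\theta := \theta - \alpha\nabla J(\theta)$. A minimal generic sketch, assuming the caller supplies the gradient as a function; the step size `alpha`, the tolerance `tol` and the quadratic test function are illustrative choices, not taken from the notes.

```python
import numpy as np

def gradient_descent(grad_J, theta0, alpha=0.1, tol=1e-8, max_iter=10_000):
    """Repeat theta := theta - alpha * grad_J(theta) until the step is tiny."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = alpha * grad_J(theta)
        theta = theta - step
        if np.linalg.norm(step) < tol:   # stop once the update barely moves theta
            break
    return theta

# Example: J(theta) = (theta_0 - 1)^2 + 2 * (theta_1 + 3)^2, minimum at (1, -3).
grad = lambda t: np.array([2 * (t[0] - 1), 4 * (t[1] + 3)])
theta_star = gradient_descent(grad, [0.0, 0.0])
```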
Lecture 2: Linear Regression

… 3 Gradient Descent Algorithm … 4 Stochastic Gradient Descent … of $u$ can be represented as $$ \nabla_{u}f(x)=\sum_{i=1}^{n}f_{i}^{\prime}(x)\cdot u_{i} $$ Gradient (Contd.) Proof: Letting $ g(h) = f(x + hu) $, we have $$ g^{\prime}(0)=\lim_{h\to0}\ldots $$ … $ f_{i}'(x) u_{i} $, and substituting this into (1) completes the proof. Definition (Gradient): The gradient of $f$ is a vector function $ \nabla f : \mathbb{R}^{n} \rightarrow \mathbb{R}^{n} $ defined by …

0 码力 (credits) | 31 pages | 608.38 KB | 2 years ago

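The directional-derivative formula $\nabla_{u}f(x)=\sum_{i} f_{i}^{\prime}(x)\,u_{i}$ (with $f_i'$ the $i$-th partial derivative) can be checked numerically: the proof's $g(h) = f(x + hu)$ has derivative $g'(0)$, which a central difference approximates. A small sketch; the function `f`, the point `x` and the unit direction `u` are arbitrary illustrative choices.

```python
import numpy as np

def f(x):
    # Example function f(x) = x_0^2 * x_1 + sin(x_1)
    return x[0] ** 2 * x[1] + np.sin(x[1])

def grad_f(x):
    # Analytic partial derivatives f_0'(x) and f_1'(x)
    return np.array([2 * x[0] * x[1], x[0] ** 2 + np.cos(x[1])])

x = np.array([1.5, -0.5])
u = np.array([0.6, 0.8])                # unit direction vector

formula = grad_f(x) @ u                  # sum_i f_i'(x) * u_i
h = 1e-6
g = lambda t: f(x + t * u)               # g(h) = f(x + h u)
numeric = (g(h) - g(-h)) / (2 * h)       # central-difference estimate of g'(0)
```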
Experiment 1: Linear Regression

… $$ J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})^{2} $$ One optimization approach is the gradient descent algorithm. The algorithm runs iteratively, and in each iteration we update the parameter … $ \alpha $ is the so-called "learning rate", with which we can tune the convergence of gradient descent. 3 2D Linear Regression: We start with a very simple case where $n = 1$. Download data1.zip … there are $m = 50$ training examples, which you will use to develop a linear regression model with the gradient descent algorithm; with this model we can predict the height given a new age value. In Matlab/Octave …

0 码力 (credits) | 7 pages | 428.11 KB | 2 years ago

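A vectorized sketch of batch gradient descent on $J(\theta)=\frac{1}{2m}\sum_{i}(h_{\theta}(x^{(i)})-y^{(i)})^{2}$ for the $n = 1$ case. The experiment itself uses Matlab/Octave and the data in data1.zip; here Python/NumPy and a synthetic age/height set stand in for both, and `alpha = 0.05` and `iters = 5000` are assumed values chosen so that this synthetic set converges.

```python
import numpy as np

def cost(theta, X, y):
    # J(theta) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2
    r = X @ theta - y
    return r @ r / (2 * len(y))

def batch_gd(X, y, alpha=0.05, iters=5000):
    m = len(y)
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m    # gradient over all m examples
        theta = theta - alpha * grad         # every theta_j updated together
        history.append(cost(theta, X, y))
    return theta, history

# Hypothetical stand-in for data1.zip: m = 50 ages and heights.
rng = np.random.default_rng(0)
age = rng.uniform(2, 8, size=50)
height = 0.75 + 0.06 * age + rng.normal(0, 0.01, size=50)
X = np.column_stack([np.ones_like(age), age])   # x_0 = 1 intercept column
theta, history = batch_gd(X, height)
prediction = np.array([1.0, 3.5]) @ theta       # predicted height at age 3.5
```

Plotting `history` against the iteration number is a quick way to see whether the chosen learning rate converges; a value that is too large makes $J(\theta)$ grow instead.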
Machine Learning Course, Wenzhou University - 02 Deep Learning - Programming Fundamentals of Neural Networks

Batch Gradient Descent (BGD): every step of gradient descent uses all of the training samples. Stochastic Gradient Descent (SGD): every step uses a single sample, and the parameters are updated after each computation, without first summing over the entire training set. Mini-Batch Gradient Descent (MBGD): every step uses a batch of training samples. Three forms of gradient descent: Batch Gradient Descent: every step uses all of the training samples. Learning rate; parameter update; gradient (update all $ w_{j} $ simultaneously, $ j=0,1,\ldots,n $). Three forms of gradient descent: Stochastic Gradient Descent: derivation … $ \ldots\big(x^{(i)}\big)-y^{(i)}\big)x_{j}^{(i)} $ … Parameter update: $$ w_{j}:=w_{j}-\alpha\ldots $$

0 码力 (credits) | 27 pages | 1.54 MB | 2 years ago

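The three forms differ only in how many samples feed each update. A minimal sketch for linear regression $h(x) = w^{T}x$, assuming a hypothetical `batch_size` parameter: `batch_size = m` gives BGD, `1` gives SGD, and anything in between gives MBGD. The function name, `alpha = 0.5`, `epochs = 500` and the per-epoch reshuffling are illustrative choices, not something the slides specify.

```python
import numpy as np

def minibatch_gd(X, y, batch_size, alpha=0.5, epochs=500, seed=0):
    """batch_size = m -> BGD, batch_size = 1 -> SGD, otherwise MBGD."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(epochs):
        order = rng.permutation(m)              # reshuffle the samples every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ w - yb) / len(idx)   # average over the batch
            w = w - alpha * grad                # all w_j updated together
    return w

# Noise-free data y = 2 + 3x, so every variant should approach w = (2, 3).
x = np.linspace(0, 1, 40)
X = np.column_stack([np.ones_like(x), x])       # x_0 = 1
y = 2 + 3 * x
w_bgd = minibatch_gd(X, y, batch_size=40)
w_sgd = minibatch_gd(X, y, batch_size=1)
w_mbgd = minibatch_gd(X, y, batch_size=8)
```

On noisy data, SGD and MBGD with a fixed `alpha` keep fluctuating around the minimum rather than settling exactly; a decaying learning rate is the usual remedy.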
Machine Learning

… Gradient Descent (GD) Algorithm: • If the multi-variable cost (or loss) function $ \mathcal{L}(\theta) $ is … from $ \theta $ in the direction of the negative gradient of $ \mathcal{L} $ at $ \theta $ • Find a local minimum of a differentiable function using gradient descent: $$ \theta_{j}\leftarrow\theta_{j}-\alpha\ldots $$ … $ \alpha $ is the so-called learning rate • Variations: gradient ascent algorithm; stochastic gradient descent/ascent; mini-batch gradient descent/ascent. Back-Propagation: Warm Up • $ w_{jk}^{[l]} $ …

0 码力 (credits) | 19 pages | 944.40 KB | 2 years ago

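Of the variations listed above, gradient ascent is the same update with the sign flipped, $\theta \leftarrow \theta + \alpha\,\partial\ell/\partial\theta$, used when maximizing an objective such as a log-likelihood $\ell$. A minimal sketch that estimates a Gaussian mean (unit variance assumed) by ascending its log-likelihood; the data and step size are hypothetical.

```python
import numpy as np

def gradient_ascent(grad, theta0, alpha, iters):
    """Repeat theta := theta + alpha * grad(theta) (note the plus sign)."""
    theta = theta0
    for _ in range(iters):
        theta = theta + alpha * grad(theta)
    return theta

# Maximize l(mu) = -sum_i (x_i - mu)^2 / 2, the unit-variance Gaussian log-likelihood
# up to a constant.
data = np.array([2.1, 1.7, 2.6, 2.2, 1.9])
grad_l = lambda mu: np.sum(data - mu)        # dl/dmu
mu_hat = gradient_ascent(grad_l, 0.0, alpha=0.1, iters=200)   # converges to the sample mean, 2.1
```

Equivalently, this is gradient descent on $-\ell$.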
Machine Learning Course, Wenzhou University - 02 Machine Learning - Regression

Batch Gradient Descent (BGD): every step of gradient descent uses all of the training samples. Stochastic Gradient Descent (SGD): every step uses a single sample, and the parameters are updated after each computation, without first summing over the entire training set. Mini-Batch Gradient Descent (MBGD): every step uses a batch of training samples. Three forms of gradient descent: Batch Gradient Descent: every step uses all of the training samples. Learning rate; parameter update; gradient (update all $ w_{j} $ simultaneously, $ j=0,1,\ldots,n $). Three forms of gradient descent: Stochastic Gradient Descent: derivation $$ w=w-\alpha\cdot\frac{\partial J(w)}{\partial w},\quad h(x)=w^{\mathrm{T}}X=w_{0}x_{0}+w_{1}x\ldots $$

0 码力 (credits) | 33 pages | 1.50 MB | 2 years ago

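The slides stress updating all $w_j$ simultaneously (同步更新). The sketch below, for $h(x) = w^{T}x$ with $x_0 = 1$, contrasts a simultaneous step, where every partial derivative is computed from the same old $w$, with a sequential loop that reuses components already changed in the same pass. The sequential loop is a different (Gauss-Seidel-style) update, not the gradient-descent step the slides describe. The data and `alpha` are hypothetical.

```python
import numpy as np

def grad_j(w, X, y, j):
    # dJ/dw_j = (1/m) * sum_i (h(x_i) - y_i) * x_ij, with h(x) = w^T x
    return np.mean((X @ w - y) * X[:, j])

def simultaneous_step(w, X, y, alpha):
    # Compute every partial derivative from the same old w, then update all at once.
    grads = np.array([grad_j(w, X, y, j) for j in range(len(w))])
    return w - alpha * grads

def sequential_step(w, X, y, alpha):
    # Not the GD step: later w_j see components already changed in this pass.
    w = w.copy()
    for j in range(len(w)):
        w[j] = w[j] - alpha * grad_j(w, X, y, j)
    return w

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # x_0 = 1 bias column
y = np.array([2.0, 3.0, 5.0])
w0 = np.zeros(2)
w_sim = simultaneous_step(w0, X, y, alpha=0.1)
w_seq = sequential_step(w0, X, y, alpha=0.1)
vectorized = w0 - 0.1 * X.T @ (X @ w0 - y) / len(y)  # matrix form of the same GD step
```

The vectorized line is the simplest way to guarantee a simultaneous update: the whole gradient is evaluated before $w$ changes.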
Lecture 4: Regularization and Bayesian Statistics

… parameters as well as the magnitude of $ \lambda $. Regularized Linear Regression (Contd.): • Gradient descent • Repeat { $$ \theta_{0}:=\theta_{0}-\alpha\frac{1}{m}\sum_{i=1}^{m}\big(h_{\theta}\big(\ldots $$ … $$ \ldots\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2} $$ Regularized Logistic Regression (Contd.): • Gradient descent: Repeat { $$ \theta_{0}:=\theta_{0}-\alpha\frac{1}{m}\sum_{i=1}^{m}\big(h_{\theta}\ldots $$ … $$ \ldots\log[1+\exp(-y^{(i)}\theta^{T}x^{(i)})] $$ • No closed-form solution exists, but we can do gradient descent on $ \theta $. Logistic Regression: MAP Solution • Again, assume $ \theta $ follows a Gaussian distribution …

0 码力 (credits) | 25 pages | 185.30 KB | 2 years ago

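In the standard form of the regularized linear regression "Repeat {" loop, $\theta_0$ is updated without the penalty while every other parameter is also shrunk: $\theta_j := \theta_j(1 - \alpha\frac{\lambda}{m}) - \alpha\frac{1}{m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$. A NumPy sketch, assuming $J(\theta)=\frac{1}{2m}\big[\sum_i (h_\theta(x^{(i)})-y^{(i)})^2 + \lambda\sum_{j=1}^{n}\theta_j^2\big]$; the data, `lam` and `alpha` are hypothetical. As a check, the result is compared with the closed-form minimizer $(X^{T}X + \lambda L)^{-1}X^{T}y$, where $L$ is the identity with its top-left entry set to 0.

```python
import numpy as np

def regularized_gd(X, y, lam, alpha=0.1, iters=5000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m     # data term for every theta_j
        grad[1:] += (lam / m) * theta[1:]    # penalty term skips theta_0
        theta = theta - alpha * grad
    return theta

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(30, 2))
X = np.column_stack([np.ones(30), x])        # x_0 = 1
y = 1.0 + x @ np.array([2.0, -1.0]) + rng.normal(0, 0.1, size=30)

lam = 5.0
theta_gd = regularized_gd(X, y, lam)
L = np.eye(3)
L[0, 0] = 0.0                                # do not penalize the intercept
theta_closed = np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```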
TensorFlow Quick Start and Practice, Part 3: TensorFlow Basic Concepts Explained

…

| Optimizer | Source file |
|---|---|
| Adam | tensorflow/python/training/adam.py |
| Ftrl | tensorflow/python/training/ftrl.py |
| Gradient Descent | tensorflow/python/training/gradient_descent.py |
| Momentum | tensorflow/python/training/momentum.py |
| Proximal Adagrad | tensorflow/python/training/proximal_adagrad.py |
| Proximal Gradient Descent | tensorflow/python/training/proximal_gradient_descent.py |
| Rmsprop | tensorflow/python/training/rmsprop.py |
| Synchronize R… | … |

0 码力 (credits) | 50 pages | 25.17 MB | 2 years ago

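The table only names the optimizers and their source files. To illustrate how the Momentum entry differs from plain Gradient Descent, here is a NumPy sketch of the heavy-ball update in accumulator form, $a := \mu a + \nabla J(\theta)$, $\theta := \theta - \alpha a$. This is a textbook formulation written for illustration, not a transcription of momentum.py; the ill-conditioned quadratic, `alpha` and `mu` are assumed values.

```python
import numpy as np

def momentum_descent(grad_J, theta0, alpha=0.05, mu=0.9, iters=500):
    theta = np.asarray(theta0, dtype=float)
    accum = np.zeros_like(theta)             # running "velocity" of past gradients
    for _ in range(iters):
        accum = mu * accum + grad_J(theta)
        theta = theta - alpha * accum
    return theta

def plain_descent(grad_J, theta0, alpha=0.05, iters=500):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        theta = theta - alpha * grad_J(theta)
    return theta

# Ill-conditioned quadratic J = 0.5 * (theta_0^2 + 0.01 * theta_1^2), minimum at 0.
curv = np.array([1.0, 0.01])
grad = lambda t: curv * t
start = [1.0, 1.0]
err_momentum = np.linalg.norm(momentum_descent(grad, start))
err_plain = np.linalg.norm(plain_descent(grad, start))
```

With the same learning rate, the accumulated velocity lets momentum make much faster progress along the low-curvature direction $\theta_1$, which is where plain gradient descent crawls.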

340 results in total.
