Lecture Notes on Support Vector Machine
Feng Li, fli@sdu.edu.cn, Shandong University, China

1 Hyperplane and Margin
In an n-dimensional space, a hyperplane is defined by ω^T x + b = 0 (1), where ω ∈ R^n is the outward-pointing normal vector and b is the bias term. The n-dimensional space is separated into two half-spaces H+ = {x ∈ R^n | ω^T x + b ≥ 0} and H− = {x ∈ R^n | ω^T x + b < 0} by the hyperplane. … The margin is defined as γ = min_i γ^(i) (6).
Figure 1: Margin and hyperplane.

2 Support Vector Machine
2.1 Formulation
The hyperplane actually serves as a decision boundary to differentiate …
18 pages | 509.37 KB | 1 year ago
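The excerpt cuts off before defining the per-sample margin γ^(i). A standard definition consistent with this notation (an assumption here, since the full notes are not shown) is:

% Geometric margin of a labeled sample (x^(i), y^(i)), y^(i) in {-1, +1},
% with respect to the hyperplane w^T x + b = 0; the overall margin is the minimum.
\[
  \gamma^{(i)} \;=\; y^{(i)}\,\frac{\omega^{T} x^{(i)} + b}{\lVert \omega \rVert},
  \qquad
  \gamma \;=\; \min_{i}\ \gamma^{(i)}.
\]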
Lecture 6: Support Vector Machine
Feng Li, Shandong University, fli@sdu.edu.cn, December 28, 2021

Outline: 1. SVM: A Primal Form; 2. Convex Optimization Review; …

Hyperplane
Separates an n-dimensional space into two half-spaces. Defined by an outward-pointing normal vector ω ∈ R^n. Assumption: the hyperplane passes through the origin; if not, add a bias term b (we will …). … along ω (b < 0 means in the opposite direction).

Support Vector Machine
A hyperplane-based linear classifier defined by ω and b. Prediction rule: y = sign(ω^T x + b).
82 pages | 773.97 KB | 1 year ago
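A small NumPy sketch of this prediction rule, together with the signed distance of a point from the hyperplane measured along ω (the parameter values are illustrative, not from the slides):

import numpy as np

omega = np.array([2.0, -1.0])   # outward-pointing normal vector (illustrative)
b = 0.5                         # bias term (illustrative)

def predict(x):
    # Hyperplane-based linear classifier: y = sign(omega^T x + b).
    return np.sign(omega @ x + b)

def signed_distance(x):
    # Signed distance of x from the hyperplane, measured along omega;
    # a negative value means x lies in the half-space opposite to omega.
    return (omega @ x + b) / np.linalg.norm(omega)

x = np.array([1.0, 3.0])
print(predict(x), signed_distance(x))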
Efficient Deep Learning Book [EDL], Chapter 2 - Compression Techniques
… takes a 32-bit floating point value in the range [-10.0, 10.0]. We need to transmit a collection (vector) of these variables over an expensive communication channel. Can we use quantization to reduce transmission …? … learnings from the previous exercise into practice. We will code a method `quantize` that quantizes a vector x, given xmin, xmax, and b. It should return the quantized values for a given x. Logistics: we just look at how to solve this exercise. We use NumPy for this solution. It supports vector operations which operate on a vector (or a batch) of x variables (vectorized execution) instead of one variable at a time.
33 pages | 1.96 MB | 1 year ago
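A minimal NumPy sketch of such a quantize method, assuming b is the bit width and a uniform (affine) mapping from [xmin, xmax] to integer levels — the book's own solution may differ in scheme or signature:

import numpy as np

def quantize(x, x_min, x_max, b):
    # Uniformly quantize x into 2**b integer levels over [x_min, x_max].
    levels = 2 ** b - 1                    # number of steps between x_min and x_max
    scale = (x_max - x_min) / levels       # width of one quantization bucket
    x = np.clip(x, x_min, x_max)           # keep values inside the representable range
    return np.round((x - x_min) / scale).astype(np.int32)

# Example: quantize a small vector of 32-bit floats to 4 bits.
x = np.array([-10.0, -3.7, 0.0, 5.2, 10.0], dtype=np.float32)
print(quantize(x, x_min=-10.0, x_max=10.0, b=4))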
Lecture 5: Gaussian Discriminant Analysis, Naive Bayes (2023)

Prediction Based on Bayes' Theorem
X is a random variable indicating the feature vector; Y is a random variable indicating the label. We perform a trial to obtain a sample x for test, and … An image is represented by a vector of features. The feature vectors are random, since the images are randomly given: random variable X represents the feature vector (and thus the image); the labels … (deterministic) hypothesis function y = h_θ(x). How to model the (probabilistic) relationship between feature vector X and label Y?
P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x)
122 pages | 1.35 MB | 1 year ago
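A toy numerical illustration of this Bayes-rule prediction (the probabilities below are made up, not taken from the lecture):

# Assumed class prior and class-conditional likelihoods for a single test point x.
p_y1 = 0.3                 # P(Y = 1)
p_x_given_y1 = 0.8         # P(X = x | Y = 1)
p_x_given_y0 = 0.1         # P(X = x | Y = 0)

# Marginal P(X = x) by the law of total probability.
p_x = p_x_given_y1 * p_y1 + p_x_given_y0 * (1 - p_y1)

# Posterior P(Y = 1 | X = x) via Bayes' theorem.
p_y1_given_x = p_x_given_y1 * p_y1 / p_x
print(p_y1_given_x)        # ~0.774, so we would predict label 1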
Efficient Deep Learning Book [EDL], Chapter 3 - Learning Techniques
… below to describe the learning rate, the length of the text, the size of the word vector (each word is translated to a vector), and the locations of initial weights and training checkpoints. A sample text is … sentence to a word vector sequence later on.

LEARNING_RATE = 0.001
MAX_SEQ_LEN = 500     # The sentences are truncated to this word count.
WORD2VEC_LEN = 300    # The size of the word vector
CHKPT_DIR = Path('chkpt')

… represents the number of representative words for a sample text (500 words) and the size of the embedding vector to represent each word (an array of 300 float values), respectively.

def create_model():
    model = …
56 pages | 18.93 MB | 1 year ago
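A minimal Keras sketch of what a create_model built around these constants might look like; this is a guess at the general shape, not the book's actual code (the vocabulary size, layer choices, and loss are assumptions):

import tensorflow as tf

VOCAB_SIZE = 20000        # assumed vocabulary size; not given in the excerpt
MAX_SEQ_LEN = 500         # sequences are padded/truncated to this length elsewhere
WORD2VEC_LEN = 300
LEARNING_RATE = 0.001

def create_model():
    # Word indices -> 300-dim word vectors -> pooled sentence vector -> binary label.
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB_SIZE, WORD2VEC_LEN),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        loss='binary_crossentropy',
        metrics=['accuracy'],
    )
    return model

model = create_model()  # weights are created lazily on the first batch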
Efficient Deep Learning Book [EDL], Chapter 4 - Efficient Architectures
… inputs have similar representations. We will call this representation an Embedding. An embedding is a vector of features that represent aspects of an input numerically. It must fulfill the following goals: a) … such as text, image, audio, video, etc. to a low-dimensional representation such as a fixed-length vector of floating point numbers, thus performing dimensionality reduction. b) The low-dimensional representation … two features? In those cases, we could use classical machine learning algorithms like the Support Vector Machine (SVM) to learn classifiers that would do this for us. We could rely on deep learning models …
53 pages | 3.92 MB | 1 year ago
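A sketch of the idea in the last sentence of the excerpt: train a classical SVM on top of precomputed embeddings. The encoder, data, and labels here are stand-ins, not the book's example:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Pretend these are 64-dimensional embeddings produced by some encoder,
# with binary labels (e.g. topic A vs topic B).
embeddings = rng.normal(size=(200, 64))
labels = (embeddings[:, 0] + embeddings[:, 1] > 0).astype(int)

clf = SVC(kernel='rbf', C=1.0)
clf.fit(embeddings[:150], labels[:150])           # train on embedding vectors
print(clf.score(embeddings[150:], labels[150:]))  # evaluate on held-out embeddings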
Lecture Notes on Gaussian Discriminant Analysis, Naive Bayes
… a given image. We assume X = [X_1, X_2, · · · , X_n]^T is a random variable representing the feature vector of the given image, and Y ∈ {0, 1} is a random variable representing whether there is a cat in the given image. … labeled by y given that the image can be represented by feature vector x; P(X = x | Y = y) is the probability that the image has its feature vector being x given that it is labeled by y; P(Y = y) is the probability … In logistic regression, we use the hypothesis function y = h_θ(x) to model the relationship between feature vector x and label y, while we now rely on Bayes' theorem to characterize the relationship through parameters …
19 pages | 238.80 KB | 1 year ago
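A sketch of the generative approach these notes describe — model P(X | Y) and P(Y), then predict through Bayes' theorem — using scikit-learn's Gaussian Naive Bayes on synthetic "image feature" vectors (the data and shapes are illustrative only):

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
n = 8                                          # number of features per image
X_cat = rng.normal(loc=1.0, size=(100, n))     # feature vectors of cat images
X_other = rng.normal(loc=-1.0, size=(100, n))  # feature vectors of non-cat images
X = np.vstack([X_cat, X_other])
y = np.array([1] * 100 + [0] * 100)            # Y = 1 means "there is a cat"

model = GaussianNB().fit(X, y)                 # estimates P(X | Y) and P(Y)
print(model.predict_proba(X[:2]))              # posteriors P(Y | X) via Bayes' rule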
Experiment 1: Linear Regression
… (1), where θ is the parameter which we need to optimize and x is the (n + 1)-dimensional feature vector.¹ Given a training set {x^(i)}_{i=1,···,m}, our goal is to find the optimal value of θ such that …
(¹ For each training datum, we have an extra intercept item x_0 = 1. Therefore, the resulting feature vector is (n + 1)-dimensional.)

3 2D Linear Regression
We start with a very simple case where n = 1. Download … contours in the contour function, by introducing differently spaced vectors, e.g., a linearly spaced vector (linspace) and a logarithmically spaced vector (logspace). Try both in this exercise and select the better …
7 pages | 428.11 KB | 1 year ago
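A short NumPy sketch of the two points the excerpt touches on: prepending the intercept feature x_0 = 1, and building linearly vs logarithmically spaced level vectors for a contour plot of the cost. The data here is synthetic, not the experiment's data file:

import numpy as np

X_raw = np.linspace(0.0, 10.0, 50).reshape(-1, 1)            # n = 1 feature
y = 2.0 * X_raw[:, 0] + 1.0 + np.random.randn(50) * 0.5      # noisy line
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])         # prepend x_0 = 1 -> (n+1)-dim

def cost(theta):
    # Ordinary least-squares cost J(theta) for the hypothesis theta^T x.
    r = X @ theta - y
    return 0.5 / len(y) * (r @ r)

# Two candidate level vectors one could pass to matplotlib's contour():
lin_levels = np.linspace(0.1, 50.0, 20)   # linearly spaced levels
log_levels = np.logspace(-1, 2, 20)       # logarithmically spaced levels
print(cost(np.array([1.0, 2.0])), lin_levels[:3], log_levels[:3])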
Lecture 2: Linear Regression
… (x) represents the rate at which f increases in direction u. When u is the i-th standard unit vector e_i, ∇_u f(x) = f'_i(x), where f'_i(x) = ∂f(x)/∂x_i is the partial derivative of f(x) w.r.t. …

Gradient (Contd.)
Theorem: For any n-dimensional vector u, the directional derivative of f in the direction of u can be represented as ∇_u f(x) = Σ_{i=1}^{n} …
Definition (Gradient): The gradient of f is a vector function ∇f : R^n → R^n defined by ∇f(x) = Σ_{i=1}^{n} (∂f/∂x_i) e_i, where e_i is the i-th standard unit vector. In another simple form, ∇f(x) = [∂f/∂x_1, …
31 pages | 608.38 KB | 1 year ago
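The two truncated formulas are standard results; written out in full (which presumably matches what the slides state), they read:

% Directional derivative expressed through partial derivatives, and the gradient
% written as a column vector (standard forms; the slide text is cut off above).
\[
  \nabla_{u} f(x) \;=\; \sum_{i=1}^{n} u_i\,\frac{\partial f(x)}{\partial x_i}
  \;=\; \nabla f(x)^{T} u,
  \qquad
  \nabla f(x) \;=\;
  \Bigl[\,\frac{\partial f}{\partial x_1},\ \dots,\ \frac{\partial f}{\partial x_n}\,\Bigr]^{T}.
\]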
Machine Learning Course, Wenzhou University - 09 Machine Learning: Support Vector Machine
Outline: 01 SVM Overview; 02 Linearly Separable SVM; 03 Linear SVM; 04 Linearly Non-separable SVM

1. SVM Overview
The Support Vector Machine (SVM) is a class of generalized linear classifiers that performs binary classification of data by supervised learning; its decision … A soft margin means allowing a certain number of samples to be misclassified. (Soft margin vs. hard margin; linearly separable vs. linearly non-separable.)

Support vectors — algorithm idea: find a number of data points on the edge of the set (called support vectors), and use these points to determine a plane (called the decision surface) such that the distance from the support vectors to that plane is maximized.

Background: any hyperplane can be described by the linear equation ω^T x + b = 0. … greater than 50,000, then using a support vector machine will be very slow; the solution is to create and add more features, and then use logistic regression or a support vector machine without a kernel.
29 pages | 1.51 MB | 1 year ago
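An illustrative scikit-learn sketch of the ideas in this excerpt — a hard-ish margin (large C), a soft margin (small C, tolerating some misclassified points), and a linear SVM "without a kernel" for large datasets. The data and parameter values are made up, not from the course slides:

import numpy as np
from sklearn.svm import SVC, LinearSVC

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

hard_margin = SVC(kernel='rbf', C=1e3).fit(X, y)   # little tolerance for misclassification
soft_margin = SVC(kernel='rbf', C=0.1).fit(X, y)   # allows some samples to be misclassified
linear_svm = LinearSVC(C=1.0).fit(X, y)            # no kernel; scales better to large sample counts

# Softer margins typically keep more support vectors.
print(len(hard_margin.support_), len(soft_margin.support_))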
29 results in total.













