Coursera 7 - Support Vector Machines   2017-10-13


From Logistic Regression to Support Vector Machines

1. Large Margin Classification

Alternative view of logistic regression

$ h_\theta (x) = g({\theta^T x}) = \dfrac{1}{1 + e^{-\theta^T x}} \; , \quad h_\theta (x) \in (0, 1) $

Predict $y = 1$ when $h_\theta(x) = g(\theta^T x) \geq 0.5$, i.e. when $\theta^T x \geq 0$.

Predict $y = 0$ when $h_\theta(x) = g(\theta^T x) < 0.5$, i.e. when $\theta^T x < 0$.
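
A minimal sketch of this hypothesis and decision rule (NumPy; the function and variable names are illustrative, not from the course):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    """Predict y = 1 exactly when theta^T x >= 0, i.e. when h_theta(x) >= 0.5."""
    return (X @ theta >= 0).astype(int)
```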

We can compress our cost function’s two conditional cases into one case:

$ \mathrm{Cost}(h_\theta(x),y) = - y \cdot \log(h_\theta(x)) - (1 - y) \cdot \log(1 - h_\theta(x))$

We can fully write out our entire cost function as follows:

$
J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))]
$

Adding $L_2$ regularization and writing training as a minimization over $\theta$:

$
\min_\theta \frac{1}{m} \left[ \displaystyle \sum_{i=1}^m y^{(i)} \left(-\log h_\theta (x^{(i)}) \right) + (1 - y^{(i)}) \left( - \log (1 - h_\theta(x^{(i)})) \right) \right] + \frac{\lambda}{2m} \displaystyle \sum_{j=1}^n \theta_j^2
$
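
As a sanity check, this regularized cost can be evaluated directly. A sketch reusing `sigmoid` from above; note that the regularization sum starts at $j = 1$, so the intercept $\theta_0$ is excluded:

```python
def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta).

    X is the (m, n+1) design matrix whose first column is all ones,
    so theta[0] is the intercept and is not regularized.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    data_term = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return data_term + reg_term
```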

For the SVM, we replace the two log terms with cheaper piecewise-linear approximations $\mathrm{cost}_1$ and $\mathrm{cost}_0$:

$\mathrm{cost}_1(\theta^T x^{(i)}) \approx -\log h_\theta (x^{(i)})$
$\mathrm{cost}_0(\theta^T x^{(i)}) \approx - \log (1 - h_\theta(x^{(i)}))$

$
\min_\theta \frac{1}{m} \left[ \displaystyle \sum_{i=1}^m y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \frac{\lambda}{2m} \displaystyle \sum_{j=1}^n \theta_j^2
$
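
The lecture only sketches $\mathrm{cost}_1$ and $\mathrm{cost}_0$ as flat-then-linear curves; the hinge functions below, with kinks at $z = \pm 1$, are one standard concrete choice (an assumption here, not fixed by the course):

```python
def cost1(z):
    """Piecewise-linear surrogate for -log(sigmoid(z)): zero once z >= 1."""
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    """Piecewise-linear surrogate for -log(1 - sigmoid(z)): zero once z <= -1."""
    return np.maximum(0.0, 1.0 + z)
```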

1.1 Optimization Objective

Starting from the regularized objective above,

$
\min_\theta \frac{1}{m} \left[ \displaystyle \sum_{i=1}^m y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \frac{\lambda}{2m} \displaystyle \sum_{j=1}^n \theta_j^2
$

multiplying the whole expression by the positive constant $\frac{m}{\lambda}$ does not change the minimizing $\theta$. Letting $C = \frac{1}{\lambda}$, this gives the SVM optimization objective:

$
\min_\theta C \displaystyle \sum_{i=1}^m \left[ y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \displaystyle \sum_{j=1}^n \theta_j^2
$
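
Putting the pieces together, the SVM objective can be evaluated with the helpers defined earlier (again a sketch; the names are illustrative):

```python
def svm_cost(theta, X, y, C):
    """SVM objective: C * (sum of per-example hinge costs) + (1/2) * ||theta[1:]||^2."""
    z = X @ theta
    data_term = y @ cost1(z) + (1 - y) @ cost0(z)
    reg_term = 0.5 * np.sum(theta[1:] ** 2)
    return C * data_term + reg_term
```

Since $C$ plays the role of $\frac{1}{\lambda}$, a large $C$ behaves like a small $\lambda$ (little regularization, lower bias, higher variance), and a small $C$ behaves like a large $\lambda$.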

1.2 Large Margin Intuition

1.3 Mathematics Behind Large Margin Classification

2. Kernels

2.1 Kernels I

2.2 Kernels II

3. SVMs in Practice

3.1 Using An SVM

Reference


