
[Chinese Subtitles] Spring 2017 CS231n: Stanford Deep Visual Recognition Course

Start date: November 10, 2017
Length: 6 lectures and 3 guest talks in total; the course is complete. Join the group to watch for free: https://ai.yanxishe.com/page/groupDetail/19.

Batch Normalization

input data: N training examples in the current batch, each with dimension D

Compute the empirical mean and variance independently for each dimension.

BN is usually inserted after fully connected or convolutional layers, and before the nonlinearity.
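A minimal numpy sketch of this train-time forward pass; the learnable scale/shift parameters gamma and beta and the constant eps are assumed names, not given in the notes:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """x: (N, D) batch; gamma, beta: (D,) learnable scale and shift."""
    mu = x.mean(axis=0)                       # empirical mean of each dimension
    var = x.var(axis=0)                       # empirical variance of each dimension
    x_hat = (x - mu) / np.sqrt(var + eps)     # normalize each dimension independently
    return gamma * x_hat + beta               # scale and shift

# typical placement: after an FC/conv layer, before the nonlinearity
x = np.random.randn(32, 100)                  # N = 32 examples, each of dimension D = 100
out = batchnorm_forward(x, gamma=np.ones(100), beta=np.zeros(100))
```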


Which W will be best? This is measured by the loss function.

x: the input to the algorithm

y: the label we want the algorithm to predict

L = (1/N) * Σ_i L_i(f(x_i, W), y_i)

different loss functions:

  1. SVM
     multi-class SVM loss (hinge loss)
      S_yi: the score of the true class for a training example (x-axis)
      Loss: (y-axis)
      S_j: for one example, the score of any class that is not the true class
     L_i = Σ_{j≠yi} max(0, S_j - S_yi + 1)
     L = (1/N) * Σ_i L_i, a quantitative measure of how bad the classifier is on this data set (see the sketch after this list)
     #At initialization W is small, so all s ≈ 0 and the scores are nearly the same for all classes. What is the loss?
   The loss is the number of classes minus 1.
This is a useful debugging strategy: when training starts, think about what loss you expect. If the loss you actually see at the first iteration is not roughly C-1, you probably have a bug and should go check your code.
 
     2. Softmax classifier (multinomial logistic regression)
     transforms the scores into probabilities between zero and one
    #At initialization W is small, so all s ≈ 0. What is the loss?
    The loss is log(C).
 
     3. Regularization
     To avoid over-fitting.
     The data loss pushes the model's predictions to match the training data, but fitting the training data too closely causes over-fitting. Regularization encourages the model to stay 'simple' so that it generalizes to test data.
     L(W) = data loss + regularization
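A small numpy sketch of the two data losses above, including the loss-at-initialization sanity checks; the batch size, class count, and all-zero scores are made up for illustration:

```python
import numpy as np

def svm_loss(scores, y):
    """Multi-class SVM (hinge) loss averaged over the batch.
    scores: (N, C) class scores, y: (N,) indices of the correct classes."""
    N = scores.shape[0]
    correct = scores[np.arange(N), y][:, None]        # S_yi for each example
    margins = np.maximum(0, scores - correct + 1.0)   # max(0, S_j - S_yi + 1)
    margins[np.arange(N), y] = 0                      # do not count j == y_i
    return margins.sum() / N

def softmax_loss(scores, y):
    """Softmax (cross-entropy) loss averaged over the batch."""
    N = scores.shape[0]
    shifted = scores - scores.max(axis=1, keepdims=True)   # for numeric stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(N), y].mean()

# sanity check: with a tiny W all scores are ~0
N, C = 4, 10
scores = np.zeros((N, C))
y = np.random.randint(C, size=N)
print(svm_loss(scores, y))       # C - 1 = 9
print(softmax_loss(scores, y))   # log(C) ≈ 2.303
```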

 

 

     
Jerry同 · 2018-10-17 · 3.1 Loss Functions

Gradient: the direction in which the function increases fastest

Gradient descent: use the gradient at each step (moving in the negative gradient direction) to decide the next step
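A tiny sketch of this idea on a made-up quadratic loss; the loss function, starting point, and step size are illustrative only:

```python
import numpy as np

def loss(w):                     # toy loss: L(w) = ||w - 3||^2, minimized at w = 3
    return np.sum((w - 3.0) ** 2)

def grad(w):                     # its gradient (vector of partial derivatives)
    return 2.0 * (w - 3.0)

w = np.zeros(2)                  # starting point
step_size = 0.1                  # how far to move each step
for _ in range(100):
    w -= step_size * grad(w)     # step in the negative gradient direction
print(w, loss(w))                # w ≈ [3, 3], loss ≈ 0
```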

酱酱紫 · 2018-10-14 · 3.2 Optimization
  • s: the class scores predicted by the classifier
  • y_i: the correct class label of the sample
  • S_y_i: the score of the correct class for the i-th training example
  • y (label): what you want the algorithm to predict
  • x: the input image / data
  • delta = 1: an arbitrarily chosen margin parameter
  • Sum the hinge terms over all incorrect classes for each example, average over all examples, and add regularization to the loss: it penalizes model complexity instead of only trying to fit the training data (see the snippet below)
  • L2 regularization: one way of measuring model complexity
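A short illustration of adding the L2 penalty to the data loss; the variable names and the regularization strength reg are assumptions for illustration:

```python
import numpy as np

reg = 1e-4                              # regularization strength (a hyperparameter)
W = np.random.randn(3073, 10) * 0.01    # example weight matrix
data_loss = 0.9                         # e.g. the averaged hinge or softmax loss
l2_penalty = reg * np.sum(W * W)        # L2 measures complexity as the sum of squared weights
total_loss = data_loss + l2_penalty     # L(W) = data loss + regularization
print(total_loss)
```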
酱酱紫 · 2018-10-14 · 3.1 Loss Functions

Activation function:

(different choices for different nonlinearities)

  • sigmoid

it has 3 problems:

#1- saturated neurons "kill" the gradients

When x is very negative or very large positive, the sigmoid function is flat in that region, so the local gradient is nearly zero and it kills off the gradient.

#2- sigmoid outputs are not zero-centered

To understand this problem:

reference:

https://blog.csdn.net/weixin_41417982/article/details/81437088

https://liam0205.me/2018/04/17/zero-centered-active-function/

#3- exp() is a bit computationally expensive

  • tanh - range [-1,1]
  • ReLU

 f(x) = max(0,x)

Leaky ReLU f(x) = max(0.01x,x)

In practice: use ReLU, and try out Leaky ReLU/Maxout/ELU (see the sketch below)
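A minimal numpy sketch of the activations discussed above; the 0.01 slope in Leaky ReLU follows the formula in the notes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # flat (saturated) for very negative/positive x

def tanh(x):
    return np.tanh(x)                  # zero-centered, range [-1, 1], but still saturates

def relu(x):
    return np.maximum(0, x)            # f(x) = max(0, x)

def leaky_relu(x):
    return np.maximum(0.01 * x, x)     # f(x) = max(0.01x, x), small slope for x < 0

x = np.linspace(-5, 5, 11)
print(relu(x))
print(leaky_relu(x))
```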

Data preprocessing:

[mean centering]

In the training phase, compute the mean over the training images only, then apply the exact same mean to the test data.

Option 1: compute the mean image from all the training data and subtract it

(mean image = [32,32,3] array)

Option 2: subtract the per-channel mean

(mean along each channel = 3 numbers)
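A sketch of both mean-subtraction variants, assuming CIFAR-10-shaped arrays; the variable names and random data are placeholders:

```python
import numpy as np

X_train = np.random.rand(50000, 32, 32, 3)   # stand-in for the training images
X_test = np.random.rand(10000, 32, 32, 3)    # stand-in for the test images

# variant 1: subtract the mean image (a [32, 32, 3] array)
mean_image = X_train.mean(axis=0)            # computed on training data only
X_train_centered = X_train - mean_image
X_test_centered = X_test - mean_image        # apply the exact same mean to test data

# variant 2: subtract the per-channel mean (3 numbers)
mean_per_channel = X_train.mean(axis=(0, 1, 2))
X_train_centered = X_train - mean_per_channel
X_test_centered = X_test - mean_per_channel
```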

Weight initialization:

Q: What happens when w=0 init is used?

A: Every neuron computes the same output and receives the same gradient, so they all update identically.

Xavier initialization: scale the random weights by 1/sqrt(fan_in) (see the sketch below)
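A sketch contrasting all-zero initialization with Xavier initialization; the layer sizes fan_in and fan_out are illustrative:

```python
import numpy as np

fan_in, fan_out = 512, 256

# W = 0: every neuron computes the same output and receives the same gradient,
# so all neurons update identically and never become different
W_zero = np.zeros((fan_in, fan_out))

# Xavier initialization: scale random weights by 1/sqrt(fan_in) so the variance
# of the activations stays roughly constant from layer to layer
W_xavier = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)
```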

 

 

Jerry同 · 2018-10-14 · 6.1 Activation Functions

Gradient: the derivative in the multivariate case, i.e. the vector of partial derivatives; the negative gradient points in the direction of steepest descent

Finite differences: slow

The step size is the learning rate, a hyperparameter

Choice of optimizer

SGD (stochastic gradient descent)

http://vision.stanford.edu/teaching/cs231n-demos/linear-classify/
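A sketch of the finite-difference (numerical) gradient and a simple gradient-descent loop on a toy loss; the loss function and step size are placeholders:

```python
import numpy as np

def numerical_gradient(f, w, h=1e-5):
    """Finite differences: two evaluations of f per parameter, so it is slow;
    use it only to check an analytic gradient."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        old = w.flat[i]
        w.flat[i] = old + h
        f_plus = f(w)
        w.flat[i] = old - h
        f_minus = f(w)
        w.flat[i] = old
        grad.flat[i] = (f_plus - f_minus) / (2.0 * h)
    return grad

f = lambda w: np.sum(w ** 2)              # toy loss with analytic gradient 2*w
w = np.random.randn(4)
print(numerical_gradient(f, w))           # should match 2*w
print(2 * w)

learning_rate = 0.1                       # the step size, a hyperparameter
for _ in range(200):
    w -= learning_rate * 2 * w            # step along the negative gradient
print(w)                                  # ≈ 0, the minimum of the toy loss
```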

Bag of Words (BoW)

Kratos_yfc · 2018-10-12 · 3.2 Optimization

Common practice: split the data into training, validation, and test sets; train on the training set, then take the model that performs best on the validation set and evaluate it on the test set.

Cross-validation: used for small datasets, not common in deep learning. Split the data into several folds; hold out the last part as the test set, and let each of the remaining folds take a turn as the validation set while the rest are used for training (see the sketch below).
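A numpy sketch of the k-fold scheme described above; the fold count and array shapes are illustrative:

```python
import numpy as np

X = np.random.rand(1000, 3072)                 # toy features
y = np.random.randint(10, size=1000)           # toy labels

folds_X = np.array_split(X, 6)
folds_y = np.array_split(y, 6)
X_test, y_test = folds_X[-1], folds_y[-1]      # last part held out as the test set

for k in range(5):                             # each remaining fold takes a turn as validation
    X_val, y_val = folds_X[k], folds_y[k]
    X_train = np.concatenate([folds_X[i] for i in range(5) if i != k])
    y_train = np.concatenate([folds_y[i] for i in range(5) if i != k])
    # train on (X_train, y_train), score this hyperparameter setting on (X_val, y_val)
```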


Compute power

Big data

PASCAL, ImageNet

xialotte · 2018-10-10 · 1.3 Course Logistics

t-SNE: a dimensionality-reduction method

PCA (Principal Component Analysis): a dimensionality-reduction method
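A minimal sketch of both methods; the use of scikit-learn and the feature shapes are assumptions, since the notes do not name a library:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

features = np.random.rand(500, 512)       # e.g. CNN features for 500 images

pca_2d = PCA(n_components=2).fit_transform(features)    # linear projection onto top-2 components
tsne_2d = TSNE(n_components=2).fit_transform(features)  # nonlinear embedding for visualization

print(pca_2d.shape, tsne_2d.shape)        # (500, 2) (500, 2)
```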

