Batch Normalization
input data: a batch of N training examples, each of dimension D
compute the empirical mean and variance independently for each dimension.
BN is usually inserted after fully connected or convolutional layers, and before the nonlinearity.
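A minimal numpy sketch of the batch-norm forward pass described above. The function name, the learnable scale/shift `gamma`/`beta`, and the stability constant `eps` are illustrative assumptions, not from the notes:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D) batch; normalize each of the D dimensions independently
    mu = x.mean(axis=0)                    # empirical mean per dimension, shape (D,)
    var = x.var(axis=0)                    # empirical variance per dimension, shape (D,)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance per dimension
    return gamma * x_hat + beta            # learnable scale and shift
```

After this transform each of the D dimensions of the batch has (approximately) zero mean and unit variance, which is the point of inserting BN before the nonlinearity.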
Which W is best? This is measured by the loss function
x: the input to the algorithm
y: the label (the value to predict)
L = (1/N) * sum_i L_i(f(x_i, W), y_i)
different loss functions:
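A numpy sketch of two common choices, for a single example with score vector `scores` and true class index `y`. The function names and the SVM margin `delta=1.0` are illustrative assumptions:

```python
import numpy as np

def svm_loss(scores, y, delta=1.0):
    # multiclass hinge loss: sum over wrong classes of max(0, s_j - s_y + delta)
    margins = np.maximum(0.0, scores - scores[y] + delta)
    margins[y] = 0.0                       # the true class contributes no margin
    return margins.sum()

def softmax_loss(scores, y):
    # cross-entropy: -log of the normalized probability of the true class
    shifted = scores - scores.max()        # shift for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return -np.log(probs[y])
```

Both are then averaged over the N examples in the batch to give the total loss L above.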
Gradient: the direction in which the function increases fastest
Gradient descent: use the gradient at each step to decide the direction of the next step
Activation function:
(different choices of nonlinearity)
Sigmoid has 3 problems:
#1- saturated neurons "kill" the gradients
When x is very negative or very large and positive, the sigmoid is flat in those regions, so the local gradient is nearly zero and it kills off the gradient flowing backward.
#2- sigmoid outputs are not zero-centered
To understand this problem, see:
https://blog.csdn.net/weixin_41417982/article/details/81437088
https://liam0205.me/2018/04/17/zero-centered-active-function/
#3- exp() is a bit computationally expensive
ReLU: f(x) = max(0, x)
Leaky ReLU: f(x) = max(0.01x, x)
Use ReLU and try out Leaky ReLU/Maxout/ELU
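The activations above can be sketched in numpy; `sigmoid_grad` shows why saturated neurons kill the gradient (it is nearly zero for large |x|). Function names are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # nearly 0 when |x| is large: saturation kills the gradient

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small slope alpha for x < 0, so the gradient never dies completely
    return np.maximum(alpha * x, x)
```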
Data preprocessing:
[mean-centered]
in the training phase, compute the mean image over all the training data; then apply the exact same mean to the test data.
subtract the mean image
(mean image = [32,32,3] array)
or subtract the per-channel mean
(mean along each channel = 3 numbers)
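Both mean-centering variants, sketched in numpy on a hypothetical CIFAR-style training set (the array names are illustrative); the same `mean_image` or `per_channel_mean` computed on the training data would be subtracted from the test data:

```python
import numpy as np

X_train = np.random.rand(50, 32, 32, 3)          # hypothetical training images

mean_image = X_train.mean(axis=0)                # [32, 32, 3] array
X_centered = X_train - mean_image                # subtract the mean image

per_channel_mean = X_train.mean(axis=(0, 1, 2))  # 3 numbers, one per channel
X_centered_ch = X_train - per_channel_mean       # subtract per-channel mean
```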
Weight initialization:
Q: What happens when w=0 init is used?
A: every neuron computes the same output and receives the same gradient, so they all learn identical features (no symmetry breaking)
Xavier initialization
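A minimal sketch of Xavier initialization, in the 1/sqrt(fan_in) form: scaling the weights this way keeps the variance of the outputs roughly equal to the variance of the inputs. The function name is illustrative:

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    # scale standard-normal weights by 1/sqrt(fan_in) so that the variance of
    # w.x matches the variance of x, avoiding saturated or vanishing activations
    return np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)
```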
Gradient: the multivariate analogue of the derivative, i.e. the vector of partial derivatives; the negative gradient points in the direction of steepest descent
Numerical gradient (finite differences): slow
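A centered finite-difference sketch that makes the slowness concrete: two evaluations of f per parameter. The function name and step size `h` are illustrative; in practice this is used only as a gradient check against the analytic gradient:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    # centered differences: grad_i = (f(x + h*e_i) - f(x - h*e_i)) / (2h)
    grad = np.zeros_like(x)
    for i in np.ndindex(x.shape):
        old = x[i]
        x[i] = old + h
        fp = f(x)
        x[i] = old - h
        fm = f(x)
        x[i] = old                     # restore the original value
        grad[i] = (fp - fm) / (2.0 * h)
    return grad
```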
The step size is the learning rate, a hyperparameter
Choice of optimizer:
SGD
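A vanilla minibatch SGD sketch: each step samples a batch and moves against its gradient, scaled by the learning rate. The function signature, `grad_fn` callback, and hyperparameter defaults are illustrative assumptions:

```python
import numpy as np

def sgd(w, X, y, grad_fn, lr=0.1, epochs=300, batch_size=32, seed=0):
    # grad_fn(w, X_batch, y_batch) returns the gradient of the loss w.r.t. w
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        w = w - lr * grad_fn(w, X[idx], y[idx])   # step along the negative gradient
    return w
```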
http://vision.stanford.edu/teaching/cs23n-demos/linear-classify/
Bag of Words
Common practice: split the data into training, validation, and test sets; train on the training set, then take the model that performs best on the validation set and evaluate it on the test set.
Cross-validation: for small datasets; not commonly used in deep learning. Split the data into several folds, hold out the last fold as the test set, then rotate each remaining fold in turn as the validation set, training on the rest.
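The fold rotation described above can be sketched as follows, assuming the test fold has already been held out. The helper name `kfold_splits` is illustrative:

```python
import numpy as np

def kfold_splits(n, k, seed=0):
    # yield (train_indices, val_indices) pairs, rotating the validation fold
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val
```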
Computing power
Big data
PASCAL, ImageNet
t-SNE: a dimensionality-reduction method
PCA (Principal Component Analysis): a dimensionality-reduction method
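A minimal PCA sketch via the SVD of the centered data: the top-k right singular vectors are the principal directions, and projecting onto them gives the reduced representation. The function name is illustrative:

```python
import numpy as np

def pca(X, k):
    # center the data, then project onto the top-k principal directions
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T   # (N, k) reduced representation
```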