Batch Normalization
input data: a batch of N training examples, each of dimension D
compute the empirical mean and variance independently for each dimension.
BN is usually inserted after fully connected or convolutional layers, and before the nonlinearity.
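A minimal numpy sketch of the batch-norm forward pass described above. The function name, the learnable scale/shift `gamma`/`beta`, and the stability constant `eps` are illustrative assumptions, not from the notes:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D) batch; normalize each of the D dimensions independently
    mu = x.mean(axis=0)                    # empirical mean per dimension, shape (D,)
    var = x.var(axis=0)                    # empirical variance per dimension, shape (D,)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance per dimension
    return gamma * x_hat + beta            # learnable scale and shift
```

After this transform each of the D dimensions of the batch has (approximately) zero mean and unit variance, which is the point of inserting BN before the nonlinearity.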
Which W is best? This is measured by the loss function
x: the input to the algorithm
y: the label (the value to predict)
L = (1/N) * sum_i L_i(f(x_i, W), y_i)
different loss functions:
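A numpy sketch of two common choices, for a single example with score vector `scores` and true class index `y`. The function names and the SVM margin `delta=1.0` are illustrative assumptions:

```python
import numpy as np

def svm_loss(scores, y, delta=1.0):
    # multiclass hinge loss: sum over wrong classes of max(0, s_j - s_y + delta)
    margins = np.maximum(0.0, scores - scores[y] + delta)
    margins[y] = 0.0                       # the true class contributes no margin
    return margins.sum()

def softmax_loss(scores, y):
    # cross-entropy: -log of the normalized probability of the true class
    shifted = scores - scores.max()        # shift for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return -np.log(probs[y])
```

Both are then averaged over the N examples in the batch to give the total loss L above.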
Gradient: the direction in which the function increases fastest
Gradient descent: use the gradient at each step to decide the direction of the next step
Activation function:
(different choices of nonlinearity)
Sigmoid has 3 problems:
#1- saturated neurons "kill" the gradients
When x is very negative or very large and positive, the sigmoid is flat in those regions, so the local gradient is nearly zero and it kills off the gradient flowing backward.
#2- sigmoid outputs are not zero-centered
To understand this problem, see:
https://blog.csdn.net/weixin_41417982/article/details/81437088
https://liam0205.me/2018/04/17/zero-centered-active-function/
#3- exp() is a bit computationally expensive
ReLU: f(x) = max(0, x)
Leaky ReLU: f(x) = max(0.01x, x)
Use ReLU and try out Leaky ReLU/Maxout/ELU
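The activations above can be sketched in numpy; `sigmoid_grad` shows why saturated neurons kill the gradient (it is nearly zero for large |x|). Function names are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # nearly 0 when |x| is large: saturation kills the gradient

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small slope alpha for x < 0, so the gradient never dies completely
    return np.maximum(alpha * x, x)
```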
Data preprocessing:
[mean-centered]
in the training phase, compute the mean image over all the training data; then apply the exact same mean to the test data.
subtract the mean image
(mean image = [32,32,3] array)
or subtract the per-channel mean
(mean along each channel = 3 numbers)
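Both mean-centering variants, sketched in numpy on a hypothetical CIFAR-style training set (the array names are illustrative); the same `mean_image` or `per_channel_mean` computed on the training data would be subtracted from the test data:

```python
import numpy as np

X_train = np.random.rand(50, 32, 32, 3)          # hypothetical training images

mean_image = X_train.mean(axis=0)                # [32, 32, 3] array
X_centered = X_train - mean_image                # subtract the mean image

per_channel_mean = X_train.mean(axis=(0, 1, 2))  # 3 numbers, one per channel
X_centered_ch = X_train - per_channel_mean       # subtract per-channel mean
```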
Weight initialization:
Q: What happens when w=0 init is used?
A: every neuron computes the same output and receives the same gradient, so they all learn identical features (no symmetry breaking)
Xavier initialization
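A minimal sketch of Xavier initialization, in the 1/sqrt(fan_in) form: scaling the weights this way keeps the variance of the outputs roughly equal to the variance of the inputs. The function name is illustrative:

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    # scale standard-normal weights by 1/sqrt(fan_in) so that the variance of
    # w.x matches the variance of x, avoiding saturated or vanishing activations
    return np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)
```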
Gradient: the multivariate analogue of the derivative, i.e. the vector of partial derivatives; the negative gradient points in the direction of steepest descent
Numerical gradient (finite differences): slow
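A centered finite-difference sketch that makes the slowness concrete: two evaluations of f per parameter. The function name and step size `h` are illustrative; in practice this is used only as a gradient check against the analytic gradient:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    # centered differences: grad_i = (f(x + h*e_i) - f(x - h*e_i)) / (2h)
    grad = np.zeros_like(x)
    for i in np.ndindex(x.shape):
        old = x[i]
        x[i] = old + h
        fp = f(x)
        x[i] = old - h
        fm = f(x)
        x[i] = old                     # restore the original value
        grad[i] = (fp - fm) / (2.0 * h)
    return grad
```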
The step size is the learning rate, a hyperparameter
Choice of optimizer:
SGD
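A vanilla minibatch SGD sketch: each step samples a batch and moves against its gradient, scaled by the learning rate. The function signature, `grad_fn` callback, and hyperparameter defaults are illustrative assumptions:

```python
import numpy as np

def sgd(w, X, y, grad_fn, lr=0.1, epochs=300, batch_size=32, seed=0):
    # grad_fn(w, X_batch, y_batch) returns the gradient of the loss w.r.t. w
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        w = w - lr * grad_fn(w, X[idx], y[idx])   # step along the negative gradient
    return w
```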
http://vision.stanford.edu/teaching/cs23n-demos/linear-classify/
Bag of Words
Common practice: split the data into training, validation, and test sets; train on the training set, then take the model that performs best on the validation set and evaluate it on the test set.
Cross-validation: for small datasets; not commonly used in deep learning. Split the data into several folds, hold out the last fold as the test set, then rotate each remaining fold in turn as the validation set, training on the rest.
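The fold rotation described above can be sketched as follows, assuming the test fold has already been held out. The helper name `kfold_splits` is illustrative:

```python
import numpy as np

def kfold_splits(n, k, seed=0):
    # yield (train_indices, val_indices) pairs, rotating the validation fold
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val
```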
Computing power
Big data
PASCAL, ImageNet
t-SNE: a dimensionality-reduction method
PCA (Principal Component Analysis): a dimensionality-reduction method
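A minimal PCA sketch via the SVD of the centered data: the top-k right singular vectors are the principal directions, and projecting onto them gives the reduced representation. The function name is illustrative:

```python
import numpy as np

def pca(X, k):
    # center the data, then project onto the top-k principal directions
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T   # (N, k) reduced representation
```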