[Chinese Subtitles] CS231n Spring 2017: Stanford Deep Visual Recognition Course

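When using sigmoid, the input x should have zero mean. If x is all positive or all negative, every component of the gradient on W has the same sign (the local gradient with respect to W is x, so the common sign is set by the upstream gradient), and each update can only move all the weights in the same direction.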
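A minimal sketch of the point above, assuming a single linear neuron f = w·x + b. The input values and the upstream gradients (+0.7 and -0.7) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# All-positive inputs, e.g. the outputs of a previous sigmoid layer.
x = rng.uniform(0.1, 1.0, size=5)

def grad_w(upstream, x):
    # Chain rule: dL/dw = dL/df * df/dw = upstream * x
    return upstream * x

g_pos = grad_w(+0.7, x)   # hypothetical positive upstream gradient
g_neg = grad_w(-0.7, x)   # hypothetical negative upstream gradient

print(np.sign(g_pos))     # every entry has the sign of the upstream gradient
print(np.sign(g_neg))
```

Because all components share one sign, a single step can only increase all weights or decrease all of them, which leads to zig-zag updates.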


鲍伯•阿丽丝 · 2019-06-20 · 6.1 Activation Functions

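1. Activation functions

sigmoid, tanh, ReLU, Leaky ReLU, Maxout, ELU

sigmoid: 1/(1+e^(-x)), output range (0,1); saturates, so gradients vanish; not zero-centered

tanh: output range (-1,1); zero-centered; gradients still vanish when it saturates

ReLU: does not saturate for positive inputs, converges fast, more biologically plausible; for negative inputs the gradient is zero

dead ReLU: a unit that never activates and therefore never updates,

caused by poor initialization or a learning rate that is too high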
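A small sketch of a dead ReLU, assuming one unit whose bias has been pushed far negative (the value -10.0 and the random data are hypothetical). Every input lands in the negative region, so both the output and the weight gradient are zero.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))       # hypothetical input batch

w = rng.normal(size=10) * 0.01
b_dead = -10.0                        # assumed: bias knocked far negative by a bad update

z = X @ w + b_dead                    # pre-activation, negative for every example here
a = np.maximum(0.0, z)                # ReLU output
local_grad = (z > 0).astype(float)    # dReLU/dz: 1 where z > 0, else 0

upstream = rng.normal(size=1000)      # made-up gradient arriving from the loss
dw = X.T @ (upstream * local_grad)    # gradient on w

print(a.max(), np.abs(dw).max())      # both are zero, so the unit cannot recover
```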

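Don't use sigmoid

Initialization also matters: the distribution of the activations changes from layer to layer as the signal propagates. With sigmoid the variance of the activations shrinks, and with ReLU it shrinks as well.

There are now some fairly good initialization schemes, such as Xavier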
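A rough sketch of how activation statistics change with depth. It assumes a 10-layer fully connected net with tanh units (not sigmoid), 500 units per layer, and the 1/sqrt(fan_in) form of Xavier scaling. The helper name `layer_stds` is made up for this example.

```python
import numpy as np

def layer_stds(init_scale, n_layers=10, width=500, seed=0):
    """Return the std of the tanh activations at each layer.

    init_scale(fan_in) gives the std of the Gaussian weights.
    """
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(1000, width))   # unit-Gaussian input data
    stds = []
    for _ in range(n_layers):
        W = rng.normal(size=(width, width)) * init_scale(width)
        h = np.tanh(h @ W)
        stds.append(h.std())
    return stds

small = layer_stds(lambda fan_in: 0.01)                     # small random numbers
xavier = layer_stds(lambda fan_in: 1.0 / np.sqrt(fan_in))   # Xavier-style scaling

print("small :", np.round(small, 6))    # std collapses toward 0 with depth
print("xavier:", np.round(xavier, 3))   # std decays much more slowly
```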


巴克•米尔恩 · 2018-12-26 · 6.1 Activation Functions

Activation function:

(different choices for different nonlinearities)

sigmoid

it has 3 problems:

#1- saturated neurons "kill" the gradients

When x is very negative or very positive, the sigmoid function is flat, so its local gradient is close to zero and it kills off the gradient flowing back through the neuron.
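As a worked step (standard calculus, not part of the original note), the flat regions can be read off the derivative of the sigmoid, where L denotes the loss:

```latex
\[
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\frac{d\sigma}{dx} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr) \le \frac{1}{4}
\]
\[
\frac{\partial L}{\partial x}
  = \frac{\partial L}{\partial \sigma}\,\sigma(x)\,\bigl(1 - \sigma(x)\bigr)
  \;\longrightarrow\; 0
  \quad \text{as } x \to \pm\infty
\]
```

The derivative peaks at 1/4 for x = 0 and goes to zero on both sides, so a saturated neuron passes almost no gradient downstream.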

#2- sigmoid outputs are not zero-centered

To understand this problem, see these references:

https://blog.csdn.net/weixin_41417982/article/details/81437088

https://liam0205.me/2018/04/17/zero-centered-active-function/

#3- exp() is somewhat expensive to compute

tanh - range [-1,1]

ReLU: f(x) = max(0,x)

Leaky ReLU: f(x) = max(0.01x,x)
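The two formulas above can be sketched in NumPy as follows. The function names and the `alpha` parameter are choices made for this example; 0.01 is the slope given in the note.

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # f(x) = max(alpha * x, x), valid for 0 < alpha < 1
    return np.maximum(alpha * x, x)

def leaky_relu_grad(x, alpha=0.01):
    # Slope is 1 for x > 0 and alpha otherwise, so it is never exactly zero.
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))             # negative inputs are clamped to 0
print(leaky_relu(x))       # negative inputs are scaled by alpha instead
print(leaky_relu_grad(x))
```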

Use ReLU and try out Leaky ReLU/Maxout/ELU

Data preprocessing:

[mean-centered]

In the training phase, compute the mean image from all the training data, then apply that exact same mean to the test data.

Either subtract the mean image

(mean image = [32,32,3] array)

or subtract the per-channel mean

(mean along each channel = 3 numbers)
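Both options can be sketched in NumPy, assuming images stored as an (N, 32, 32, 3) array. The random arrays below are stand-ins for real training and test sets.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for CIFAR-10-shaped data.
X_train = rng.uniform(0, 255, size=(100, 32, 32, 3))
X_test = rng.uniform(0, 255, size=(20, 32, 32, 3))

# Option 1: subtract the mean image, computed on the training set only.
mean_image = X_train.mean(axis=0)              # shape (32, 32, 3)
X_train_a = X_train - mean_image
X_test_a = X_test - mean_image                 # same training mean reused at test time

# Option 2: subtract the per-channel mean (3 numbers).
channel_mean = X_train.mean(axis=(0, 1, 2))    # shape (3,)
X_train_b = X_train - channel_mean             # broadcasts over the last axis
X_test_b = X_test - channel_mean

print(mean_image.shape, channel_mean.shape)
```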

Weight initialization:

Q: What happens when w=0 init is used?

A: All the neurons will do the same thing: they compute the same output, receive the same gradient, and get the same update.
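A small experiment illustrating the answer, assuming a two-layer net with a sigmoid hidden layer trained on made-up regression data with a mean squared error loss. Starting from all zeros, the three hidden units stay identical after training.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))      # hypothetical inputs
y = rng.normal(size=(64, 1))      # hypothetical targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# All-zero initialization: 4 inputs -> 3 hidden units -> 1 output.
W1, b1 = np.zeros((4, 3)), np.zeros(3)
W2, b2 = np.zeros((3, 1)), np.zeros(1)

lr = 0.1
for _ in range(50):
    h = sigmoid(X @ W1 + b1)              # forward pass
    out = h @ W2 + b2
    dout = 2.0 * (out - y) / len(X)       # gradient of the mean squared error
    dW2 = h.T @ dout                      # backward pass
    db2 = dout.sum(axis=0)
    dz = (dout @ W2.T) * h * (1.0 - h)
    dW1 = X.T @ dz
    db1 = dz.sum(axis=0)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(W1)   # the columns (one per hidden unit) are all equal
```

The weights do move away from zero, but every hidden unit moves in the same way, so the layer behaves like a single neuron.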

Xavier initialization


Jerry同 · 2018-10-14 · 6.1 Activation Functions

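The ReLU activation function is closer to how biological neurons behave.

AlexNet, which won the ImageNet competition in 2012, used ReLU.

ReLU has zero gradient on the negative half-axis; a unit stuck there is called a dead ReLU: it will never activate.

Data preprocessing:

Zero-centering

Normalization

Whitening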
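The three preprocessing steps above can be sketched in NumPy. The data matrix, the mixing matrix `A`, and the small constant 1e-5 added before the square root are assumptions for illustration, and PCA whitening is used as one common form of whitening.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 500 examples with 3 correlated, non-centered features.
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 0.5]])
X = rng.normal(size=(500, 3)) @ A + 5.0

# 1. Zero-centering: subtract the per-feature mean.
Xc = X - X.mean(axis=0)

# 2. Normalization: scale each feature to unit standard deviation.
Xn = Xc / Xc.std(axis=0)

# 3. Whitening (PCA): rotate onto the eigenvectors of the covariance,
#    then divide each direction by the square root of its eigenvalue.
cov = Xc.T @ Xc / Xc.shape[0]
eigvals, eigvecs = np.linalg.eigh(cov)
Xw = (Xc @ eigvecs) / np.sqrt(eigvals + 1e-5)

print(np.round(Xw.T @ Xw / Xw.shape[0], 3))   # close to the identity matrix
```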

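Weight initialization: we usually use a truncated normal distribution

Experience shows that Xavier initialization is also a good choice.

If you initialize all the weights to 0, all neurons produce the same output and receive the same gradient update; in short, they all learn exactly the same thing. But the goal is for the neurons to learn different features, which is what gives a good result.

We can initialize with very small random numbers (weights initialized too small). This works poorly in deep networks: the activations shrink toward 0, the gradients go to 0, and the weights are not updated.

If we initialize with 1 (weights initialized too large), the whole network saturates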
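A quick check of the saturation claim, assuming "initialized to 1" means Gaussian weights with standard deviation 1.0, tanh units, and 500 units per layer. The 0.99 threshold for counting a unit as saturated is an arbitrary choice for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(1000, 500))              # unit-Gaussian input data

saturated = []
for _ in range(5):
    W = rng.normal(size=(500, 500)) * 1.0     # std 1.0: far too large for fan-in 500
    h = np.tanh(h @ W)
    # Fraction of activations pinned near -1 or +1.
    saturated.append(float(np.mean(np.abs(h) > 0.99)))

print(np.round(saturated, 3))                 # most units are saturated in every layer
```

Since the tanh gradient is 1 - tanh(z)^2, units pinned near -1 or +1 pass back almost no gradient.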


ivy1233 · 2018-06-05 · 6.1 Activation Functions


SanfordHsu · 2018-02-24 · 6.1 Activation Functions

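6.1 Activation Functions

Mini-batch SGD (mini-batch stochastic gradient descent)

1. Sample a batch of data, repeating in a loop

2. Forward-propagate it through the network to get the loss

3. Backpropagate through the whole network to compute the gradients

4. Update the network's parameters (weights) using those gradients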
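The four steps can be sketched as a loop. To keep it short, this example swaps the neural network for a linear regression model on made-up data; the learning rate, batch size, and step count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy problem: recover known weights from noisy data.
true_w = np.array([2.0, -3.0, 0.5])
X = rng.normal(size=(1000, 3))
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(3)
lr, batch_size = 0.1, 32

for step in range(500):
    # 1. Sample a batch of data.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    xb, yb = X[idx], y[idx]
    # 2. Forward pass: compute predictions and the loss.
    pred = xb @ w
    loss = np.mean((pred - yb) ** 2)
    # 3. Backward pass: gradient of the loss with respect to w.
    grad = 2.0 * xb.T @ (pred - yb) / batch_size
    # 4. Update the parameters with the gradient.
    w -= lr * grad

print(np.round(w, 2))   # close to true_w
```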

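Part 1

- Activation functions

- Data preprocessing

- Weight initialization

- Batch normalization

- Monitoring the training process

- Hyperparameter optimization

ReLU

Vanishing gradient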


比尤莱•吉布 · 2018-02-05 · 6.1 Activation Functions
