This is my reading note for the MobileNets series.
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Abstract
- introduce two simple global hyperparameters that efficiently trade off between latency and accuracy
- These hyper-parameters allow the model builder to choose the right sized model for their application based on the constraints of the problem.
Conclusion
- proposed a new model architecture called MobileNets based on depthwise separable convolutions.
- investigated some of the important design decisions leading to an efficient model.
- demonstrated how to build smaller and faster MobileNets using the width multiplier and resolution multiplier, trading off a reasonable amount of accuracy to reduce size and latency.
- compared different MobileNets to popular models, demonstrating superior size, speed and accuracy characteristics.
- concluded by demonstrating MobileNet's effectiveness when applied to a wide variety of tasks.
Depthwise Separable Convolution
Computational cost of a standard convolution: D_F × D_F × D_K × D_K × M × N, where D_F is the spatial size of the feature map, D_K is the kernel size, M is the number of input channels, and N is the number of output channels.
Depthwise convolution
- the kernel is split into single-channel filters, one per input channel
- each input channel is convolved with its own filter
Computational cost: D_F × D_F × D_K × D_K × M
Pointwise convolution
- a 1×1 convolution is applied to the depthwise output to combine information across channels
Computational cost: D_F × D_F × M × N
Total: D_F × D_F × D_K × D_K × M + D_F × D_F × M × N
Comparison:
Total parameters
- the depthwise convolution kernel has D_K × D_K × M parameters; adding the pointwise convolution's M × N gives D_K × D_K × M + M × N, versus D_K × D_K × M × N for a standard convolution.
Total computational cost
- (D_F × D_F × D_K × D_K × M + D_F × D_F × M × N) / (D_F × D_F × D_K × D_K × M × N) = 1/N + 1/D_K², i.e. roughly 8–9× fewer multiply-adds for 3×3 kernels.
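To sanity-check that ratio, here is a tiny Python sketch (the layer sizes are hypothetical examples, not from the paper) that plugs numbers into the formulas above:

```python
# Hypothetical example: 14x14 feature map, 3x3 kernel, 512 -> 512 channels.
D_F, D_K, M, N = 14, 3, 512, 512

standard  = D_F * D_F * D_K * D_K * M * N   # standard convolution
depthwise = D_F * D_F * D_K * D_K * M       # depthwise convolution
pointwise = D_F * D_F * M * N               # pointwise (1x1) convolution
separable = depthwise + pointwise

# The ratio matches 1/N + 1/D_K^2 from the comparison above (~8-9x fewer mult-adds).
print(separable / standard, 1 / N + 1 / D_K**2)
```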
Convolution layer
If you look at the source code, you can see that this ReLU layer is actually ReLU6:
```python
# from https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet_v2.py
#   activation_fn: Activation function to use, defaults to tf.nn.relu6 if not
#     specified.
```
- on mobile devices inference often runs in float16; if the ReLU activation range is unbounded, the outputs can span a range too large for float16 to represent precisely, causing a loss of precision.
- clamping the upper bound at 6 keeps good numerical resolution even at low precision.
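For reference, a minimal PyTorch sketch (my own, not the official TensorFlow implementation) of the MobileNetV1-style layer: a depthwise 3×3 convolution and a pointwise 1×1 convolution, each followed by BN and ReLU6:

```python
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    """Depthwise 3x3 conv + pointwise 1x1 conv, each followed by BN and ReLU6."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            # depthwise: one filter per input channel (groups=in_ch)
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU6(inplace=True),
            # pointwise: 1x1 conv combines channels
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU6(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```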
MobileNetV2: Inverted Residuals and Linear Bottlenecks
Abstract
- inverted residual structure
Conclusions and future work
- a very simple network architecture that allowed us to build a family of highly efficient mobile models.
- the proposed convolutional block has a unique property that allows separating the network expressiveness (encoded by expansion layers) from its capacity (encoded by bottleneck inputs). Exploring this is an important direction for future research.
Problems with V1
ReLU6 causes information loss
- mapping a manifold into an n-dimensional space with a random matrix T, applying ReLU, and then mapping back with the inverse of T loses information when n is small; with a high-dimensional embedding the result actually looks fine (a toy sketch of this follows below).
- so the last ReLU is replaced with a linear activation function -> Linear Bottleneck
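A minimal numpy sketch of that experiment (my own toy reproduction, not the paper's code): embed a 2-D spiral into n dimensions with a random matrix T, apply ReLU, and project back with the pseudo-inverse of T; the reconstruction degrades when n is small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D manifold: a spiral of 1000 points.
t = np.linspace(0.1, 4 * np.pi, 1000)
X = np.stack([t * np.cos(t), t * np.sin(t)], axis=1)   # shape (1000, 2)

for n in (2, 5, 30):
    T = rng.normal(size=(2, n))        # random embedding R^2 -> R^n
    Z = np.maximum(X @ T, 0.0)         # ReLU in the n-dimensional space
    X_back = Z @ np.linalg.pinv(T)     # project back, as in the paper's Figure 1
    err = np.linalg.norm(X - X_back) / np.linalg.norm(X)
    # A point is still recoverable in principle iff the coordinates that ReLU
    # kept (the positive ones) span R^2, i.e. ReLU acted invertibly on it.
    recoverable = np.mean([np.linalg.matrix_rank(T[:, z > 0]) == 2 for z in Z])
    print(f"n={n:2d}  pinv reconstruction error: {err:.2f}  "
          f"recoverable points: {recoverable:.0%}")
```

With n = 2 many points are irrecoverably clipped, while with n = 30 essentially all points survive the ReLU, which is the paper's argument for working in an expanded space and keeping the bottleneck linear.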
If the number of input channels is small, the depthwise convolution can only operate in a low-dimensional space and does not work well
- so a pointwise (PW) convolution first raises the dimensionality, and the depthwise convolution then runs in a higher-dimensional space (Expansion Layer).
Other notes
Shortcut structure
Similar to ResNet, the shortcut reuses features:
You can also see that the ReLU6 at the tail of the block has been replaced with a linear activation.
Compared with ResNet, where the bottleneck reduces the dimensionality by 0.25×, MobileNet V2 expands it by 6×, so the dimensionality in the middle of the block is relatively large; pictured, the block looks like a spindle, while the middle of a ResNet block is narrow. With this picture in mind it is easy to see why the name Inverted Residuals was chosen.
The V2 block:
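A minimal PyTorch sketch of the inverted residual block described above (my own paraphrase, not the reference implementation): a 1×1 expansion with ReLU6, a 3×3 depthwise convolution with ReLU6, then a linear 1×1 projection, with a shortcut when input and output shapes match.

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expand (1x1) -> depthwise (3x3) -> linear project (1x1), with optional shortcut."""
    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_shortcut = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            # 1x1 pointwise expansion
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution in the expanded space
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear bottleneck: no activation here
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out
```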
Network structure:
Searching for MobileNetV3
Spotlights
- NAS (hardware-aware network architecture search)
- the NetAdapt algorithm
- Good ideas from V1:
- Depthwise Separable Convolution
- Good ideas from V2:
- resource-efficient block with inverted residuals and linear bottlenecks.
- Squeeze-and-Excite
- h-swish(x) in place of ReLU6
- the hard sigmoid ReLU6(x+3)/6 approximates the sigmoid in the SE module
- change the head of MobileNetV2
Parameters of the Small and Large variants
- SE denotes whether there is a Squeeze-And-Excite in that block.
- NL denotes the type of nonlinearity used.
- HS denotes h-swish and RE denotes ReLU.
- NBN denotes no batch normalization. s denotes stride.
Downsampling is done with strided convolutions instead of pooling.
Efficient Mobile Building Blocks
- the linear bottleneck and inverted residual structure (V2)
- depthwise separable convolutions (V1)
- lightweight attention modules based on squeeze and excitation in the bottleneck structure
- hard sigmoid:
    - Sigmoid: inefficient to compute, and challenging to maintain accuracy in fixed point arithmetic
    - so we change it to hard-sigmoid.
```python
import torch.nn as nn
import torch.nn.functional as F

class hswish(nn.Module):
    def forward(self, x):
        # h-swish(x) = x * ReLU6(x + 3) / 6
        out = x * F.relu6(x + 3, inplace=True) / 6
        return out

class hsigmoid(nn.Module):
    def forward(self, x):
        # h-sigmoid(x) = ReLU6(x + 3) / 6
        out = F.relu6(x + 3, inplace=True) / 6
        return out
```
- Squeeze-and-Excite (a minimal sketch follows this list)
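A minimal PyTorch sketch of the squeeze-and-excite module as used inside the V3 bottleneck, gated with the hard sigmoid defined above (the reduction ratio of 4 follows the paper; the layer names here are my own):

```python
import torch.nn as nn
import torch.nn.functional as F

class SqueezeExcite(nn.Module):
    """Squeeze-and-Excite: global pooling -> two 1x1 convs -> hard-sigmoid channel gate."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        reduced = max(1, channels // reduction)
        self.fc1 = nn.Conv2d(channels, reduced, 1)   # squeeze
        self.fc2 = nn.Conv2d(reduced, channels, 1)   # excite

    def forward(self, x):
        s = F.adaptive_avg_pool2d(x, 1)              # global average pool: (N, C, 1, 1)
        s = F.relu(self.fc1(s))
        s = F.relu6(self.fc2(s) + 3) / 6             # hard-sigmoid gate in [0, 1]
        return x * s                                  # rescale the channels of x
```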
Network Improvements
- Efficient last stage: the final 1×1 expansion layer that produces the classifier features is moved after the global average pooling, so it operates on 1×1 feature maps instead of 7×7, and the previous bottleneck's projection and filtering layers are no longer needed.
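A rough PyTorch sketch of that last stage under my reading of the paper (the channel sizes 160/960/1280 are the MobileNetV3-Large values from its architecture table; this is not the reference code):

```python
import torch.nn as nn

class EfficientLastStage(nn.Module):
    """V3-Large head: 1x1 conv at 7x7 resolution, global pool, then 1x1 convs at 1x1."""
    def __init__(self, in_ch=160, mid_ch=960, feat_ch=1280, num_classes=1000):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.Hardswish(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        # The expensive 1280-d "feature" conv now runs on a 1x1 map instead of 7x7.
        self.head = nn.Sequential(
            nn.Conv2d(mid_ch, feat_ch, 1),   # no BN here (NBN in the paper's table)
            nn.Hardswish(),
            nn.Conv2d(feat_ch, num_classes, 1),
        )

    def forward(self, x):
        x = self.conv(x)
        x = self.pool(x)
        return self.head(x).flatten(1)
```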