In this blog post, we will go through several classic CNN architectures that form the backbone of computer vision.

## Environment

• NVIDIA GeForce GTX 1080Ti 12GiB * 1

## LeNet

LeNet first appeared in *Gradient-Based Learning Applied to Document Recognition* (LeCun et al., 1998).

### Structure

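Channels: the concept of channels comes up constantly when studying deep learning algorithms. In the `conv2d` operators of common deep learning frameworks, such as TensorFlow and MXNet, the channel count is a required argument.

So how should channels be understood?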
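
One way to make the idea concrete is to look at what the channel arguments do to a layer's weights. The sketch below assumes the PyTorch weight layout `(out_channels, in_channels, kH, kW)` and one bias per output channel. The helper names `conv2d_weight_shape` and `conv2d_param_count` are made up for this illustration and are not part of any framework.

```python
def conv2d_weight_shape(in_channels, out_channels, kernel_size):
    """Weight shape of a 2D convolution in the (out, in, kH, kW) layout."""
    return (out_channels, in_channels, kernel_size, kernel_size)


def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters: one k x k kernel per (input, output) channel pair,
    plus one bias per output channel."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# Second LeNet convolution used in this post: nn.Conv2d(6, 16, kernel_size=5)
shape = conv2d_weight_shape(6, 16, 5)
params = conv2d_param_count(6, 16, 5)
print(shape, params)
```

In other words, `in_channels` is the depth of the feature map the layer reads, and `out_channels` is the number of filters it learns, each of which spans every input channel.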

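    from torch import nn

    # LeNet for 1-channel 28*28 inputs
    net = nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),  # 1 -> 6 channels, 28*28 -> 28*28
        nn.AvgPool2d(kernel_size=2, stride=2),  # 28*28 -> 14*14
        nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),  # 14*14 -> 10*10
        nn.AvgPool2d(kernel_size=2, stride=2),  # 10*10 -> 5*5
        nn.Flatten(),
        nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
        nn.Linear(120, 84), nn.Sigmoid(),
        nn.Linear(84, 10),
    )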
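
The size comments in the listing can be checked with the usual output-size formula for a convolution or pooling layer. The symbols n (input size), k (kernel size), p (padding) and s (stride) are notation introduced here for illustration, not taken from the post.

```latex
\[
n_{\text{out}} = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1
\]
\begin{align*}
\text{AvgPool}(k=2,\ s=2):&\quad \lfloor (28 - 2)/2 \rfloor + 1 = 14 \\
\text{Conv}(k=5,\ p=0,\ s=1):&\quad \lfloor (14 - 5)/1 \rfloor + 1 = 10 \\
\text{AvgPool}(k=2,\ s=2):&\quad \lfloor (10 - 2)/2 \rfloor + 1 = 5
\end{align*}
```

The flattened feature map therefore has 16 × 5 × 5 = 400 values, which matches `nn.Linear(16 * 5 * 5, 120)`.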


    (colanora_conda_env) [colanora@colanora learning]$ python -u "/home/colanora/learning/lenet.py"
    training on cuda:0
    /home/colanora/learning/lenet.py:28: DeprecationWarning: set_matplotlib_formats is deprecated since IPython 7.23, directly use matplotlib_inline.backend_inline.set_matplotlib_formats()
    display.set_matplotlib_formats('svg')
    <Figure size 350x250 with 1 Axes>
    <Figure size 350x250 with 1 Axes>
    <Figure size 350x250 with 1 Axes>
    ...
    loss 0.482, train acc 0.817, test acc 0.791
    48381.2 examples/sec on cuda:0

## AlexNet

### Structure

Left: LeNet, Right: AlexNet

    alexnet = nn.Sequential(
        nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(96, 256, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(256, 384, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(384, 384, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(384, 256, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Flatten(),
        nn.Linear(6400, 4096),
        nn.ReLU(),
        nn.Dropout(),
        nn.Linear(4096, 4096),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(4096, 10),
    )

Training results on GPU:

    (colanora_conda_env) [colanora@colanora learning]$ python -u "/home/colanora/learning/lenet.py"
    training on cuda:0
    /home/colanora/learning/lenet.py:28: DeprecationWarning: set_matplotlib_formats is deprecated since IPython 7.23, directly use matplotlib_inline.backend_inline.set_matplotlib_formats()
    display.set_matplotlib_formats("svg")
    <Figure size 350x250 with 1 Axes>
    <Figure size 350x250 with 1 Axes>
    <Figure size 350x250 with 1 Axes>
    ...
    loss 0.323, train acc 0.881, test acc 0.884
    1503.4 examples/sec on cuda:0
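
The `6400` in `nn.Linear(6400, 4096)` is easier to trust once the spatial sizes are traced through the AlexNet listing. The sketch below does that in plain Python. It assumes a 224x224 input, which the post does not state, and the helper `out_size` is a name chosen for this example.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling layer (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for each conv/pool layer in the AlexNet listing
layers = [
    ("conv1", 11, 4, 1),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]

size = 224  # assumed input resolution, not given in the post
sizes = []
for name, k, s, p in layers:
    size = out_size(size, k, s, p)
    sizes.append(size)
    print(f"{name}: {size}x{size}")

# The last convolution outputs 256 channels, so Flatten yields 256 * H * W features
flat_features = 256 * size * size
print(flat_features)
```

Under that assumption the final feature map is 5x5, and 256 × 5 × 5 gives the 6400 inputs of the first fully connected layer. A different input resolution would require a different value there.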


## NIN

### Code

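    def nin_block(in_channels, out_channels, kernel_size, strides, padding):
        # One regular convolution followed by two 1x1 convolutions
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, strides, padding),
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=1),
            nn.ReLU(),
        )

    # The nin_block arguments below are an assumption: they follow the standard
    # NiN configuration for 1-channel 224x224 inputs and 10 classes.
    nin_net = nn.Sequential(
        nin_block(1, 96, kernel_size=11, strides=4, padding=0),
        nn.MaxPool2d(3, stride=2),
        nin_block(96, 256, kernel_size=5, strides=1, padding=2),
        nn.MaxPool2d(3, stride=2),
        nin_block(256, 384, kernel_size=3, strides=1, padding=1),
        nn.MaxPool2d(3, stride=2),
        nn.Dropout(0.5),
        nin_block(384, 10, kernel_size=3, strides=1, padding=1),
        nn.AdaptiveAvgPool2d((1, 1)),  # global average pooling: one value per class
        nn.Flatten(),
    )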
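
The two `kernel_size=1` convolutions in `nin_block` act like a small fully connected layer applied independently at every pixel, mixing information across channels only. The sketch below checks this on a toy input using nested Python lists instead of tensors. The helpers `conv1x1` and `linear` and all the numbers are invented for the illustration.

```python
def conv1x1(x, weight, bias):
    """1x1 convolution on x of shape [C_in][H][W].

    weight has shape [C_out][C_in]; bias has shape [C_out].
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [
            [
                sum(weight[o][c] * x[c][i][j] for c in range(c_in)) + bias[o]
                for j in range(w)
            ]
            for i in range(h)
        ]
        for o in range(len(weight))
    ]


def linear(vec, weight, bias):
    """Fully connected layer applied to one channel vector."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


# Toy input: 2 channels, 2x2 pixels (arbitrary values)
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[5.0, 6.0], [7.0, 8.0]]]
weight = [[1.0, -1.0], [0.5, 0.5], [2.0, 0.0]]  # 3 output channels
bias = [0.0, 1.0, -1.0]

y = conv1x1(x, weight, bias)

# At every pixel, the output channels equal the linear layer
# applied to that pixel's input channels.
for i in range(2):
    for j in range(2):
        pixel_in = [x[c][i][j] for c in range(2)]
        pixel_out = [y[o][i][j] for o in range(3)]
        assert pixel_out == linear(pixel_in, weight, bias)
```

This is why NiN can replace the large fully connected layers at the end of AlexNet with per-pixel channel mixing followed by global average pooling.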


### Train on GPU

    (colanora_conda_env) [colanora@colanora learning]$ python -u "/home/colanora/learning/lenet.py"
    training on cuda:0
    /home/colanora/learning/lenet.py:28: DeprecationWarning: set_matplotlib_formats is deprecated since IPython 7.23, directly use matplotlib_inline.backend_inline.set_matplotlib_formats()
    display.set_matplotlib_formats("svg")
    <Figure size 350x250 with 1 Axes>
    <Figure size 350x250 with 1 Axes>
    <Figure size 350x250 with 1 Axes>
    ...
    loss 0.491, train acc 0.819, test acc 0.804
    1374.1 examples/sec on cuda:0

## Inception-Net

### Structure

#### inception block

#### network structure

### Code

    # inception-net
    import torch
    from torch import nn


    class inception_block(nn.Module):
        def __init__(self, in_channels, c1, c2, c3, c4, **kwargs):
            super(inception_block, self).__init__(**kwargs)
            # Path 1: a single 1x1 convolution
            self.p1_1 = nn.Conv2d(in_channels, c1, kernel_size=1)
            # Path 2: 1x1 convolution followed by a 3x3 convolution
            self.p2_1 = nn.Conv2d(in_channels, c2[0], kernel_size=1)
            self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
            # Path 3: 1x1 convolution followed by a 5x5 convolution
            self.p3_1 = nn.Conv2d(in_channels, c3[0], kernel_size=1)
            self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
            # Path 4: 3x3 max pooling followed by a 1x1 convolution
            self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
            self.p4_2 = nn.Conv2d(in_channels, c4, kernel_size=1)

        def forward(self, x):
            p1 = torch.nn.functional.relu(self.p1_1(x))
            p2 = torch.nn.functional.relu(self.p2_2(torch.nn.functional.relu(self.p2_1(x))))
            p3 = torch.nn.functional.relu(self.p3_2(torch.nn.functional.relu(self.p3_1(x))))
            p4 = torch.nn.functional.relu(self.p4_2(self.p4_1(x)))
            # Concatenate the outputs along the channel dimension
            return torch.cat((p1, p2, p3, p4), dim=1)


    b1 = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    )

    b2 = nn.Sequential(
        nn.Conv2d(64, 64, kernel_size=1),
        nn.ReLU(),
        nn.Conv2d(64, 192, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    )

    b3 = nn.Sequential(
        inception_block(192, 64, (96, 128), (16, 32), 32),
        inception_block(256, 128, (128, 192), (32, 96), 64),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    )

    b4 = nn.Sequential(
        inception_block(480, 192, (96, 208), (16, 48), 64),
        inception_block(512, 160, (112, 224), (24, 64), 64),
        inception_block(512, 128, (128, 256), (24, 64), 64),
        inception_block(512, 112, (144, 288), (32, 64), 64),
        inception_block(528, 256, (160, 320), (32, 128), 128),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    )

    b5 = nn.Sequential(
        inception_block(832, 256, (160, 320), (32, 128), 128),
        inception_block(832, 384, (192, 384), (48, 128), 128),
        nn.AdaptiveAvgPool2d((1, 1)),
        nn.Flatten(),
    )

    inception_net = nn.Sequential(b1, b2, b3, b4, b5, nn.Linear(1024, 10))

### Train on GPU

    (colanora_conda_env) [colanora@colanora learning]$ python -u "/home/colanora/learning/lenet.py"
    training on cuda:0
    /home/colanora/learning/lenet.py:28: DeprecationWarning: set_matplotlib_formats is deprecated since IPython 7.23, directly use matplotlib_inline.backend_inline.set_matplotlib_formats()
    display.set_matplotlib_formats("svg")
    loss 0.240, train acc 0.908, test acc 0.896
    1669.3 examples/sec on cuda:0
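
Because the four paths of an inception block are concatenated along the channel dimension, the block's output width is the sum of the last convolution of each path. The sketch below uses the block configurations from the listing to check that every block's `in_channels` matches the previous block's output, ending at the 1024 features expected by `nn.Linear(1024, 10)`. The helper name `inception_out_channels` is chosen for this example only.

```python
def inception_out_channels(c1, c2, c3, c4):
    """Channels after concatenating the four paths.

    Only the final convolution of each path contributes: c1, c2[1], c3[1], c4.
    """
    return c1 + c2[1] + c3[1] + c4


# (in_channels, c1, c2, c3, c4) for the nine blocks in b3, b4 and b5
blocks = [
    (192, 64, (96, 128), (16, 32), 32),
    (256, 128, (128, 192), (32, 96), 64),
    (480, 192, (96, 208), (16, 48), 64),
    (512, 160, (112, 224), (24, 64), 64),
    (512, 128, (128, 256), (24, 64), 64),
    (512, 112, (144, 288), (32, 64), 64),
    (528, 256, (160, 320), (32, 128), 128),
    (832, 256, (160, 320), (32, 128), 128),
    (832, 384, (192, 384), (48, 128), 128),
]

outs = [inception_out_channels(*cfg[1:]) for cfg in blocks]

# Max pooling between stages keeps the channel count, so each block's
# input width must equal the previous block's output width.
for prev_out, cfg in zip(outs, blocks[1:]):
    assert prev_out == cfg[0]

print(outs)
```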


## ResNet

### Train on GPU

    batch_size = 256
    resize = 96
    lr, num_epochs = 0.1, 10

    (colanora_conda_env) [colanora@colanora learning]$ python -u "/home/colanora/learning/lenet.py"
    training on cuda:0
    /home/colanora/learning/lenet.py:28: DeprecationWarning: set_matplotlib_formats is deprecated since IPython 7.23, directly use matplotlib_inline.backend_inline.set_matplotlib_formats()
    display.set_matplotlib_formats("svg")
    loss 0.012, train acc 0.997, test acc 0.906
    2215.3 examples/sec on cuda:0

I seem to have forgotten to write down the parameters and environment for some of the runs above. I will fill them in when I have time. Such is the humble life of a student at the start of term.

## DenseNet

• ResNet decomposes (or expands) the function being fitted into two parts: a simple linear term and a more complex nonlinear term.

$$f(\mathbf{x}) = \mathbf{x} + g(\mathbf{x}).$$

• DenseNet goes one step further and uses concatenation to decompose the function into an expansion:

$$\mathbf{x} \to \left[ \mathbf{x}, f_1(\mathbf{x}), f_2([\mathbf{x}, f_1(\mathbf{x})]), f_3([\mathbf{x}, f_1(\mathbf{x}), f_2([\mathbf{x}, f_1(\mathbf{x})])]), \ldots \right].$$

These expanded terms are finally combined in a multilayer perceptron. In practice the implementation is simple: each layer is connected to all preceding ones by concatenating their outputs.

A DenseNet consists mainly of two components: dense blocks and transition layers. The former define how inputs and outputs are concatenated, while the latter control the number of channels so that the model does not become too complex.

### Code

    # DenseNet
    import torch
    from torch import nn


    def conv_block(input_channels, num_channels):
        return nn.Sequential(
            nn.BatchNorm2d(input_channels),
            nn.ReLU(),
            nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1),
        )


    class DenseBlock(nn.Module):
        def __init__(self, num_convs, input_channels, num_channels):
            super(DenseBlock, self).__init__()
            layer = []
            for i in range(num_convs):
                layer.append(conv_block(num_channels * i + input_channels, num_channels))
            self.net = nn.Sequential(*layer)

        def forward(self, X):
            for blk in self.net:
                Y = blk(X)
                # Concatenate each block's input and output along the channel dimension
                X = torch.cat((X, Y), dim=1)
            return X


    blk = DenseBlock(2, 3, 10)
    X = torch.randn(4, 3, 8, 8)
    Y = blk(X)
    print(Y.shape)  # 3 + 2 * 10 = 23 channels


    # Every dense block increases the number of channels, and using too many
    # of them makes the model overly complex. A transition layer controls the
    # model complexity: it reduces the number of channels with a 1x1
    # convolution and halves the height and width with a stride-2 average
    # pooling layer.
    def transition_block(input_channels, num_channels):
        return nn.Sequential(
            nn.BatchNorm2d(input_channels),
            nn.ReLU(),
            nn.Conv2d(input_channels, num_channels, kernel_size=1),
            nn.AvgPool2d(kernel_size=2, stride=2),
        )


    blk = transition_block(23, 10)
    print(blk(Y).shape)  # 10 channels, height and width halved to 4x4

    # The stem is the same as in ResNet
    b1 = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    )

    num_channels, growth_rate = 64, 32
    num_convs_in_dense_blocks = [4, 4, 4, 4]
    blks = []
    for i, num_convs in enumerate(num_convs_in_dense_blocks):
        blks.append(DenseBlock(num_convs, num_channels, growth_rate))
        num_channels += num_convs * growth_rate
        # Insert a transition layer between dense blocks to halve the channels
        if i != len(num_convs_in_dense_blocks) - 1:
            blks.append(transition_block(num_channels, num_channels // 2))
            num_channels = num_channels // 2

    densenet = nn.Sequential(
        b1,
        *blks,
        nn.BatchNorm2d(num_channels),
        nn.ReLU(),
        nn.AdaptiveMaxPool2d((1, 1)),
        nn.Flatten(),
        nn.Linear(num_channels, 10),
    )

### Train on GPU

Parameters:

    batch_size = 256
    resize = 96
    lr, num_epochs = 0.1, 10

    (colanora_conda_env) [colanora@colanora learning]$ python -u "/home/colanora/learning/lenet.py"
    training on cuda:0
    /home/colanora/learning/lenet.py:28: DeprecationWarning: set_matplotlib_formats is deprecated since IPython 7.23, directly use matplotlib_inline.backend_inline.set_matplotlib_formats()
    display.set_matplotlib_formats("svg")
    loss 0.147, train acc 0.947, test acc 0.910
    2561.6 examples/sec on cuda:0
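
The channel bookkeeping in the DenseNet construction loop can be followed without building the network. The sketch below repeats the same arithmetic with `num_channels = 64`, `growth_rate = 32` and four dense blocks of four convolutions each, as in the listing. The function `densenet_channel_trace` and the stage labels are names invented for this illustration.

```python
def densenet_channel_trace(num_channels, growth_rate, num_convs_in_dense_blocks):
    """Return (stage, channels) pairs mirroring the DenseNet construction loop."""
    trace = [("stem", num_channels)]
    last = len(num_convs_in_dense_blocks) - 1
    for i, num_convs in enumerate(num_convs_in_dense_blocks):
        # Each conv in a dense block appends growth_rate channels to its input
        num_channels += num_convs * growth_rate
        trace.append((f"dense{i + 1}", num_channels))
        # A transition layer halves the channels between dense blocks
        if i != last:
            num_channels //= 2
            trace.append((f"transition{i + 1}", num_channels))
    return trace


trace = densenet_channel_trace(64, 32, [4, 4, 4, 4])
for stage, channels in trace:
    print(f"{stage}: {channels}")
```

With these settings the last dense block ends at 248 channels, which is the input width of the final `nn.Linear(num_channels, 10)`.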


## APPENDIX

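1. Backbone: the term refers to the backbone (trunk) network. As the name suggests, it is one part of a larger network, but which part? In most cases the backbone is the feature extractor: its job is to extract information from the image for the later parts of the network to use. Commonly used backbones are established networks such as ResNet and VGG rather than self-designed ones, because these networks have already proven to be strong feature extractors on classification and similar tasks. When one of them is used as a backbone, the officially released pretrained weights are loaded directly and our own network is attached after it. Both parts are then trained together: since the loaded backbone can already extract features, training only fine-tunes it so that it fits our own task better.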