Depthwise Separable Convolution

Introducing depthwise separable convolutions greatly reduces the number of parameters while causing only a small drop in accuracy, which makes them useful for shrinking a model.

Step 1, depthwise convolution: an ordinary convolution sums over the (e.g. three) input channels after convolving, but the depthwise step does not sum them; each of the channels continues to be convolved separately with its own filter.

Step 2, pointwise convolution: a 1x1 convolution kernel is used to sum (with learned weights) across the multiple channels.
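A minimal sketch of the two steps written directly with nn.Conv2d (the input size and channel counts below are made up purely for illustration):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # dummy 3-channel input

# Step 1, depthwise: groups=3 convolves each channel with its own 3x3 filter,
# with no summation across channels, so the output still has 3 channels
depthwise = nn.Conv2d(3, 3, kernel_size=3, padding=1, groups=3, bias=False)
d = depthwise(x)               # shape: (1, 3, 32, 32)

# Step 2, pointwise: a 1x1 convolution takes a weighted sum over the channels
# at every spatial position, producing the desired number of output channels
pointwise = nn.Conv2d(3, 16, kernel_size=1)
p = pointwise(d)               # shape: (1, 16, 32, 32)

print(d.shape, p.shape)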

Comparing the computational cost

Model structure

◆ Cost of an ordinary convolution
D_K × D_K × M × N × D_F × D_F (all factors multiplied together)
Addition operations contribute little to the complexity and are ignored.
Here D_K × D_K is the number of multiplications between the input patch and one kernel channel, and D_F × D_F is the number of sliding positions. Why multiply by M? M is the number of channels and N is the number of kernels (i.e. M is the input channel count, N is the output channel count).

◆ Cost of a depthwise separable convolution
◆ depthwise part
D_K × D_K × M × D_F × D_F
◆ pointwise (1×1 convolution) part
M × N × D_F × D_F

Cost ratio of a depthwise separable convolution to an ordinary convolution (see also https://www.cnblogs.com/hellcat/p/9726528.html):

(D_K × D_K × M × D_F × D_F + M × N × D_F × D_F) / (D_K × D_K × M × N × D_F × D_F) = 1/N + 1/(D_K × D_K)

i.e. 1 over the number of kernels plus 1 over the square of the kernel size. The parameter count shrinks by the same ratio: (D_K × D_K × M + M × N) / (D_K × D_K × M × N).
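As a quick numeric check with made-up values (D_K = 3, M = 64, N = 128, D_F = 28; these numbers are only for illustration, not from any specific model):

D_K, M, N, D_F = 3, 64, 128, 28

standard_cost  = D_K * D_K * M * N * D_F * D_F                    # ordinary convolution
separable_cost = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F    # depthwise + pointwise

print(separable_cost / standard_cost)   # ~0.1189
print(1 / N + 1 / (D_K * D_K))          # 1/N + 1/D_K^2 gives the same value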

Code:

import torch.nn as nn

class DepthWiseConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, bias=True):
        super(DepthWiseConv2d, self).__init__()  # equivalent to super().__init__()
        # groups=in_channels: every input channel is convolved with its own filter (depthwise step)
        self.depthwise_conv = nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding,
                                        groups=in_channels, bias=False)
        # 1x1 convolution that mixes the channels (pointwise step)
        self.pointwise_conv = nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=bias)

    def forward(self, x):
        x = self.depthwise_conv(x)
        x = self.pointwise_conv(x)
        return x

The groups parameter is required so that each channel is convolved with its own filter separately; without it, nn.Conv2d would convolve and sum across all input channels like a standard convolution.

pointwise_conv uses a 1x1 convolution kernel.
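To see the saving concretely, one can compare the parameter count of the DepthWiseConv2d module above with an ordinary nn.Conv2d of the same shape (the channel counts and the count_params helper below are just illustrative assumptions):

import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

separable = DepthWiseConv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)
standard = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)

print(count_params(separable))  # 3*3*64 + 64*128 + 128 (bias) = 8896
print(count_params(standard))   # 3*3*64*128 + 128 (bias)      = 73856

The ratio 8896 / 73856 ≈ 0.12 is close to the 1/N + 1/D_K² estimate derived above (the small difference comes from the bias terms).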


import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self, activation="relu"):
        super(CNN, self).__init__()
        self.activation = F.relu if activation == "relu" else F.selu
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding="same")
        self.conv2 = DepthWiseConv2d(in_channels=32, out_channels=32, kernel_size=3, padding="same")
        self.pool = nn.MaxPool2d(2, 2)
        self.conv3 = DepthWiseConv2d(in_channels=32, out_channels=64, kernel_size=3, padding="same")
        self.conv4 = DepthWiseConv2d(in_channels=64, out_channels=64, kernel_size=3, padding="same")
        self.conv5 = DepthWiseConv2d(in_channels=64, out_channels=128, kernel_size=3, padding="same")
        self.conv6 = DepthWiseConv2d(in_channels=128, out_channels=128, kernel_size=3, padding="same")
        self.flatten = nn.Flatten()
        # input shape is (1, 28, 28), so after three 2x2 poolings the fc1 in_features is 128 * 3 * 3
        self.fc1 = nn.Linear(128 * 3 * 3, 128)
        self.fc2 = nn.Linear(128, 10)

        self.init_weights()

    def init_weights(self):
        """Initialize the weights W of the linear and convolutional layers with Xavier uniform."""
        for m in self.modules():
            if isinstance(m, (nn.Linear, nn.Conv2d)):
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.zeros_(m.bias)

    def forward(self, x):
        act = self.activation
        x = self.pool(act(self.conv2(act(self.conv1(x)))))  # (batch_size, 32, 14, 14)
        x = self.pool(act(self.conv4(act(self.conv3(x)))))  # (batch_size, 64, 7, 7)
        x = self.pool(act(self.conv6(act(self.conv5(x)))))  # (batch_size, 128, 3, 3)
        x = self.flatten(x)                                  # (batch_size, 128 * 3 * 3)
        x = act(self.fc1(x))                                 # (batch_size, 128)
        x = self.fc2(x)                                      # (batch_size, 10)
        return x


import numpy as np

for idx, (key, value) in enumerate(CNN().named_parameters()):
    print(f"{key}\tparameters num: {np.prod(value.shape)}")
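As a quick sanity check of the overall model (assuming 28x28 single-channel inputs as in the comment in __init__; the batch size of 8 is arbitrary):

import torch

model = CNN()
dummy = torch.randn(8, 1, 28, 28)  # a fake batch of eight 28x28 grayscale images
out = model(dummy)
print(out.shape)                   # torch.Size([8, 10])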

Convergence during training is slower than for an ordinary convolutional network.