Depthwise Separable Convolution

Introducing depthwise separable convolutions greatly reduces the number of parameters while causing only a small drop in accuracy, which makes them useful for shrinking a model.

Step 1, depthwise convolution: an ordinary convolution sums over the (e.g. three) input channels after convolving, but the depthwise step does not sum them; each of the channels continues to be convolved separately with its own filter.

Step 2, pointwise convolution: a 1x1 convolution kernel is used to sum (with learned weights) across the multiple channels.
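A minimal sketch of the two steps written directly with nn.Conv2d (the input size and channel counts below are made up purely for illustration):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # dummy 3-channel input

# Step 1, depthwise: groups=3 convolves each channel with its own 3x3 filter,
# with no summation across channels, so the output still has 3 channels
depthwise = nn.Conv2d(3, 3, kernel_size=3, padding=1, groups=3, bias=False)
d = depthwise(x)               # shape: (1, 3, 32, 32)

# Step 2, pointwise: a 1x1 convolution takes a weighted sum over the channels
# at every spatial position, producing the desired number of output channels
pointwise = nn.Conv2d(3, 16, kernel_size=1)
p = pointwise(d)               # shape: (1, 16, 32, 32)

print(d.shape, p.shape)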

Comparing the computational cost

Model structure

◆ Cost of an ordinary convolution
D_K × D_K × M × N × D_F × D_F (all factors multiplied together)
Addition operations contribute little to the complexity and are ignored.
Here D_K × D_K is the number of multiplications between the input patch and one kernel channel, and D_F × D_F is the number of sliding positions. Why multiply by M? M is the number of channels and N is the number of kernels (i.e. M is the input channel count, N is the output channel count).

◆ Cost of a depthwise separable convolution
◆ depthwise part
D_K × D_K × M × D_F × D_F
◆ pointwise (1×1 convolution) part
M × N × D_F × D_F

Cost ratio of a depthwise separable convolution to an ordinary convolution (see also https://www.cnblogs.com/hellcat/p/9726528.html):

(D_K × D_K × M × D_F × D_F + M × N × D_F × D_F) / (D_K × D_K × M × N × D_F × D_F) = 1/N + 1/(D_K × D_K)

i.e. 1 over the number of kernels plus 1 over the square of the kernel size. The parameter count shrinks by the same ratio: (D_K × D_K × M + M × N) / (D_K × D_K × M × N).
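As a quick numeric check with made-up values (D_K = 3, M = 64, N = 128, D_F = 28; these numbers are only for illustration, not from any specific model):

D_K, M, N, D_F = 3, 64, 128, 28

standard_cost  = D_K * D_K * M * N * D_F * D_F                    # ordinary convolution
separable_cost = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F    # depthwise + pointwise

print(separable_cost / standard_cost)   # ~0.1189
print(1 / N + 1 / (D_K * D_K))          # 1/N + 1/D_K^2 gives the same value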

Code:

import torch.nn as nn

class DepthWiseConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, bias=True):
        super(DepthWiseConv2d, self).__init__()  # equivalent to super().__init__()
        # groups=in_channels: every input channel is convolved with its own filter (depthwise step)
        self.depthwise_conv = nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding,
                                        groups=in_channels, bias=False)
        # 1x1 convolution that mixes the channels (pointwise step)
        self.pointwise_conv = nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=bias)

    def forward(self, x):
        x = self.depthwise_conv(x)
        x = self.pointwise_conv(x)
        return x

The groups parameter is required so that each channel is convolved with its own filter separately; without it, nn.Conv2d would convolve and sum across all input channels like a standard convolution.

pointwise_conv uses a 1x1 convolution kernel.
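To see the saving concretely, one can compare the parameter count of the DepthWiseConv2d module above with an ordinary nn.Conv2d of the same shape (the channel counts and the count_params helper below are just illustrative assumptions):

import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

separable = DepthWiseConv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)
standard = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)

print(count_params(separable))  # 3*3*64 + 64*128 + 128 (bias) = 8896
print(count_params(standard))   # 3*3*64*128 + 128 (bias)      = 73856

The ratio 8896 / 73856 ≈ 0.12 is close to the 1/N + 1/D_K² estimate derived above (the small difference comes from the bias terms).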


import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self, activation="relu"):
        super(CNN, self).__init__()
        self.activation = F.relu if activation == "relu" else F.selu
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding="same")
        self.conv2 = DepthWiseConv2d(in_channels=32, out_channels=32, kernel_size=3, padding="same")
        self.pool = nn.MaxPool2d(2, 2)
        self.conv3 = DepthWiseConv2d(in_channels=32, out_channels=64, kernel_size=3, padding="same")
        self.conv4 = DepthWiseConv2d(in_channels=64, out_channels=64, kernel_size=3, padding="same")
        self.conv5 = DepthWiseConv2d(in_channels=64, out_channels=128, kernel_size=3, padding="same")
        self.conv6 = DepthWiseConv2d(in_channels=128, out_channels=128, kernel_size=3, padding="same")
        self.flatten = nn.Flatten()
        # input shape is (1, 28, 28), so after three 2x2 poolings the fc1 in_features is 128 * 3 * 3
        self.fc1 = nn.Linear(128 * 3 * 3, 128)
        self.fc2 = nn.Linear(128, 10)

        self.init_weights()

    def init_weights(self):
        """Initialize the weights W of the linear and convolutional layers with Xavier uniform."""
        for m in self.modules():
            if isinstance(m, (nn.Linear, nn.Conv2d)):
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.zeros_(m.bias)

    def forward(self, x):
        act = self.activation
        x = self.pool(act(self.conv2(act(self.conv1(x)))))  # (batch_size, 32, 14, 14)
        x = self.pool(act(self.conv4(act(self.conv3(x)))))  # (batch_size, 64, 7, 7)
        x = self.pool(act(self.conv6(act(self.conv5(x)))))  # (batch_size, 128, 3, 3)
        x = self.flatten(x)                                  # (batch_size, 128 * 3 * 3)
        x = act(self.fc1(x))                                 # (batch_size, 128)
        x = self.fc2(x)                                      # (batch_size, 10)
        return x


import numpy as np

for idx, (key, value) in enumerate(CNN().named_parameters()):
    print(f"{key}\tparameters num: {np.prod(value.shape)}")
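As a quick sanity check of the overall model (assuming 28x28 single-channel inputs as in the comment in __init__; the batch size of 8 is arbitrary):

import torch

model = CNN()
dummy = torch.randn(8, 1, 28, 28)  # a fake batch of eight 28x28 grayscale images
out = model(dummy)
print(out.shape)                   # torch.Size([8, 10])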

Convergence during training is slower than for an ordinary convolutional network.