tensorflow-model-summary


CNN+LSTM

This code implements a neural network composed of four convolutional stages, two LSTM layers, and a ResNet module. Four convolutional stages first extract image features; the output of the last residual block is then processed by an LSTM layer, that LSTM output is passed through a ResNet module, and finally a second LSTM layer produces the result.

four layers of Conv2d + one layer of LSTM

The conv2d_layer function implements a convolutional layer (the TF1-style import is included here once for all the snippets that follow):

import tensorflow.compat.v1 as tf  # the code uses the TF1 API (tf.layers, tf.placeholder)
tf.disable_v2_behavior()

def conv2d_layer(inputs, filters, kernel_size, strides, padding):
    # 2D convolution followed by a ReLU activation
    conv = tf.layers.conv2d(inputs=inputs, filters=filters, kernel_size=kernel_size,
                            strides=strides, padding=padding, activation=tf.nn.relu)
    return conv

The residual_block function implements a residual block:

def residual_block(inputs, filters, kernel_size, strides, padding):
    # Two stacked convolutions plus an identity shortcut; `filters` must
    # equal the channel count of `inputs`, otherwise tf.add fails
    conv1 = conv2d_layer(inputs, filters, kernel_size, strides, padding)
    conv2 = conv2d_layer(conv1, filters, kernel_size, strides, padding)
    residual_connection = tf.add(conv2, inputs)
    return residual_connection

The lstm_layer function implements the LSTM layer:

def lstm_layer(inputs, hidden_size, num_layers, scope=None):
    # Stack `num_layers` LSTM cells and unroll them over the time axis.
    # A distinct `scope` is needed when this is called more than once,
    # otherwise TF1 raises a variable-name collision.
    cells = [tf.nn.rnn_cell.LSTMCell(hidden_size) for _ in range(num_layers)]
    stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(cells)
    outputs, state = tf.nn.dynamic_rnn(stacked_lstm, inputs, dtype=tf.float32, scope=scope)
    return outputs

Full implementation

# Input data: a batch of 28x28 single-channel images
inputs = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])
# First convolutional stage
conv1 = conv2d_layer(inputs, 32, 3, 1, 'same')
conv2 = conv2d_layer(conv1, 32, 3, 1, 'same')
residual1 = residual_block(conv2, 32, 3, 1, 'same')
pool1 = tf.layers.max_pooling2d(inputs=residual1, pool_size=2, strides=2, padding='same')  # 28 -> 14
# Second convolutional stage
conv3 = conv2d_layer(pool1, 64, 3, 1, 'same')
conv4 = conv2d_layer(conv3, 64, 3, 1, 'same')
residual2 = residual_block(conv4, 64, 3, 1, 'same')
pool2 = tf.layers.max_pooling2d(inputs=residual2, pool_size=2, strides=2, padding='same')  # 14 -> 7
# Third convolutional stage
conv5 = conv2d_layer(pool2, 128, 3, 1, 'same')
conv6 = conv2d_layer(conv5, 128, 3, 1, 'same')
residual3 = residual_block(conv6, 128, 3, 1, 'same')
pool3 = tf.layers.max_pooling2d(inputs=residual3, pool_size=2, strides=2, padding='same')  # 7 -> 4 ('same' keeps the reshape below valid)
# Fourth convolutional stage
conv7 = conv2d_layer(pool3, 256, 3, 1, 'same')
conv8 = conv2d_layer(conv7, 256, 3, 1, 'same')
residual4 = residual_block(conv8, 256, 3, 1, 'same')
# Turn the residual output [batch, 4, 4, 256] into an LSTM input:
# each of the 4 rows becomes one time step with 4*256 features
lstm_input = tf.reshape(residual4, [-1, 4, 4*256])
lstm_output = lstm_layer(lstm_input, 512, 1, scope='lstm1')  # [batch, 4, 512]
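
A quick way to sanity-check the reshape above is to print the static shapes of the intermediate tensors (a minimal sketch; it only inspects the graph, no session is required):

print(residual4.shape)    # expected: (?, 4, 4, 256)
print(lstm_input.shape)   # expected: (?, 4, 1024)
print(lstm_output.shape)  # expected: (?, 4, 512)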

ResNet + one layer of LSTM

Full implementation

# Convert the LSTM output [batch, 4, 512] into a ResNet input.
# The 4*512 values per sample fold into a 2x2x512 feature map, which
# keeps the batch dimension intact (reshaping to [-1, 4, 4, 512] would
# silently shrink the batch by a factor of 4).
resnet_input = tf.reshape(lstm_output, [-1, 2, 2, 512])
residual5 = residual_block(resnet_input, 512, 3, 1, 'same')
# Process the ResNet output with a second LSTM layer
lstm_input2 = tf.reshape(residual5, [-1, 2, 2*512])
lstm_output2 = lstm_layer(lstm_input2, 512, 1, scope='lstm2')  # [batch, 2, 512]

3D CNN

This code implements a neural network composed of three convolutional layers, a 3D ResNet module, and a ResNet (2+1)D module, corresponding to the three implementation approaches of Isolated Sign Language Recognition found on GitHub.

three layers of Conv3d

The conv3d_layer function implements a 3D convolutional layer:

def conv3d_layer(inputs, filters, kernel_size, strides, padding):
    # 3D convolution over (time, height, width), followed by ReLU
    conv = tf.layers.conv3d(inputs=inputs, filters=filters, kernel_size=kernel_size,
                            strides=strides, padding=padding, activation=tf.nn.relu)
    return conv

The residual_3d_block function implements a 3D residual block:

def residual_3d_block(inputs, filters, kernel_size, strides, padding):
    # Two stacked 3D convolutions plus an identity shortcut;
    # `filters` must match the channel count of `inputs`
    conv1 = conv3d_layer(inputs, filters, kernel_size, strides, padding)
    conv2 = conv3d_layer(conv1, filters, kernel_size, strides, padding)
    residual_connection = tf.add(conv2, inputs)
    return residual_connection

The residual_2plus1d_block function implements a (2+1)D residual block:

def residual_2plus1d_block(inputs, filters, kernel_size, strides, padding):
    # (2+1)D factorization: a spatial (1, k, k) convolution followed by a
    # temporal (k, 1, 1) convolution. These operate on 5D video tensors,
    # so tf.layers.conv3d is required (conv2d would reject the shapes).
    conv1 = tf.layers.conv3d(inputs=inputs, filters=filters,
                             kernel_size=(1, kernel_size, kernel_size),
                             strides=(1, strides, strides),
                             padding='same', activation=tf.nn.relu)
    conv2 = tf.layers.conv3d(inputs=conv1, filters=filters,
                             kernel_size=(kernel_size, 1, 1),
                             strides=(strides, 1, 1),
                             padding='same', activation=tf.nn.relu)
    conv3 = tf.layers.conv3d(inputs=conv2, filters=filters,
                             kernel_size=(1, 1, 1), strides=(1, 1, 1),
                             padding='same', activation=None)
    residual_connection = tf.add(conv3, inputs)
    return residual_connection

3D ResNet

The three_layers_3d_resnet function implements the 3D ResNet module:

def three_layers_3d_resnet(inputs):
    # Stem convolution, then three stages that double the channel count
    # and halve the spatio-temporal resolution (stride 2)
    conv1 = conv3d_layer(inputs, 64, 3, 1, 'same')
    conv2 = conv3d_layer(conv1, 128, 3, 2, 'same')
    residual1 = residual_3d_block(conv2, 128, 3, 1, 'same')
    residual2 = residual_3d_block(residual1, 128, 3, 1, 'same')
    conv3 = conv3d_layer(residual2, 256, 3, 2, 'same')
    residual3 = residual_3d_block(conv3, 256, 3, 1, 'same')
    residual4 = residual_3d_block(residual3, 256, 3, 1, 'same')
    conv4 = conv3d_layer(residual4, 512, 3, 2, 'same')
    residual5 = residual_3d_block(conv4, 512, 3, 1, 'same')
    residual6 = residual_3d_block(residual5, 512, 3, 1, 'same')
    return residual6

ResNet (2+1)D

The three_layers_2plus1d_resnet function implements the ResNet (2+1)D module:

def three_layers_2plus1d_resnet(inputs):
    # Stem: a spatial 7x7 convolution and pooling over 5D video tensors,
    # so the 3D variants of conv/pool are required here as well
    conv1 = tf.layers.conv3d(inputs=inputs, filters=64,
                             kernel_size=(1, 7, 7), strides=(1, 2, 2),
                             padding='same', activation=tf.nn.relu)
    pool1 = tf.layers.max_pooling3d(inputs=conv1, pool_size=(1, 3, 3),
                                    strides=(1, 2, 2), padding='same')
    residual1 = residual_2plus1d_block(pool1, 64, 3, 1, 'same')
    residual2 = residual_2plus1d_block(residual1, 64, 3, 1, 'same')
    conv2 = tf.layers.conv3d(inputs=residual2, filters=128,
                             kernel_size=(1, 3, 3), strides=(1, 2, 2),
                             padding='same', activation=tf.nn.relu)
    residual3 = residual_2plus1d_block(conv2, 128, 3, 1, 'same')
    residual4 = residual_2plus1d_block(residual3, 128, 3, 1, 'same')
    conv3 = tf.layers.conv3d(inputs=residual4, filters=256,
                             kernel_size=(1, 3, 3), strides=(1, 2, 2),
                             padding='same', activation=tf.nn.relu)
    residual5 = residual_2plus1d_block(conv3, 256, 3, 1, 'same')
    residual6 = residual_2plus1d_block(residual5, 256, 3, 1, 'same')
    return residual6

The three_layers_3d_resnet_2plus1d function combines the two modules:

def three_layers_3d_resnet_2plus1d(inputs):
    # Run the 3D ResNet first, then feed its output into the (2+1)D ResNet
    three_layers_3d_resnet_output = three_layers_3d_resnet(inputs)
    three_layers_2plus1d_resnet_output = three_layers_2plus1d_resnet(
        three_layers_3d_resnet_output)
    return three_layers_2plus1d_resnet_output
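
A minimal usage sketch of the combined network (the 16-frame, 112x112 clip size is an assumption, not fixed by the code above):

# Shape convention: [batch, frames, height, width, channels]
video = tf.placeholder(tf.float32, shape=[None, 16, 112, 112, 3])
features = three_layers_3d_resnet_2plus1d(video)
print(features.shape)  # time/space shrink with each stride-2 stage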

Define the output layer, the loss function, and the optimizer

# Output layer: take the last LSTM time step as the feature vector
flatten = tf.reshape(lstm_output2[:, -1, :], [-1, 512])
logits = tf.layers.dense(flatten, 10)
# Define the loss function and the optimizer
labels = tf.placeholder(tf.int32, shape=[None])
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
optimizer = tf.train.AdamOptimizer(1e-4).minimize(loss)
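
With the graph defined, training reduces to a standard TF1 session loop. A minimal sketch, with random dummy data standing in for a real dataset:

import numpy as np

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        # Dummy batch; replace with real images and labels
        batch_x = np.random.rand(32, 28, 28, 1).astype(np.float32)
        batch_y = np.random.randint(0, 10, size=32)
        _, loss_val = sess.run([optimizer, loss],
                               feed_dict={inputs: batch_x, labels: batch_y})
        if step % 10 == 0:
            print('step %d, loss %.4f' % (step, loss_val))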

Specific work

Searching this project on GitHub shows that the ResNet (2+1)D network achieves the highest recognition rate and accuracy, so I investigated this network type further. Its concrete implementation is the three_layers_2plus1d_resnet function written above. The details are as follows:

Main content

The ResNet (2+1)D network is a convolutional neural network for video classification; it extends ResNet along the temporal dimension.

The basic idea of ResNet (2+1)D is to alternate between 2D and 1D convolutions so that spatial and temporal features are handled separately. In this structure, each convolutional layer consists of a 2D convolution that processes features along the spatial dimensions and a 1D convolution that processes features along the temporal dimension. This lets the network account for spatial and temporal features at the same time, and therefore handle video data better.
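
As a concrete illustration (a back-of-the-envelope sketch, not taken from the project; params_3d and params_2plus1d are hypothetical helpers), factoring a full 3x3x3 convolution into a (1, 3, 3) spatial plus a (3, 1, 1) temporal convolution changes the parameter count as follows:

# Parameter count of a full 3D conv vs. its (2+1)D factorization,
# ignoring biases, for c input and c output channels with kernel size k
def params_3d(c, k=3):
    return k * k * k * c * c       # one k x k x k kernel per channel pair

def params_2plus1d(c, k=3, m=None):
    m = m if m is not None else c  # m = intermediate channel count
    spatial = 1 * k * k * c * m    # (1, k, k) convolution
    temporal = k * 1 * 1 * m * c   # (k, 1, 1) convolution
    return spatial + temporal

print(params_3d(64))       # 110592
print(params_2plus1d(64))  # 49152: fewer parameters, same receptive field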

In addition, ResNet (2+1)D adopts ResNet's residual-connection idea, adding shortcut connections around the 2D and 1D convolutional layers; this lets the network go deeper while avoiding the vanishing-gradient problem. ResNet (2+1)D achieves strong results on video classification tasks, performing especially well on datasets such as UCF101 and HMDB51. Training and using the network is also straightforward: sample frames from the video, apply data augmentation, and feed the clips into the network.
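
For the frame-sampling step, one simple approach is uniform sampling (a minimal sketch; sample_frames is a hypothetical helper and the 16-frame clip length is an assumption):

import numpy as np

def sample_frames(video, num_frames=16):
    # Uniformly pick `num_frames` frames from a video of shape [T, H, W, C]
    total = video.shape[0]
    indices = np.linspace(0, total - 1, num_frames).astype(int)
    return video[indices]

clip = sample_frames(np.zeros((120, 112, 112, 3)))  # -> (16, 112, 112, 3)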

Correspondence in the project

In the project, the ResNet (2+1)D network is implemented by the CSL_Isolated_Conv3D program; the class is defined as the r2plus1d_18 function (the call is at line 519 of Conv3D.py, and the implementation is at line 480 of Conv3D.py).

In the project this part directly calls torchvision.models.video.r2plus1d_18, so the underlying implementation is not visible. (It should therefore be possible to substitute the TensorFlow implementation written above.)
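
For reference, calling the torchvision model looks roughly like this (a sketch; the clip size and the 500-class count are assumptions, not taken from the project):

import torch
from torchvision.models.video import r2plus1d_18

model = r2plus1d_18(num_classes=500)    # e.g. 500 isolated sign classes
clip = torch.randn(1, 3, 16, 112, 112)  # [batch, channels, frames, H, W]
logits = model(clip)                    # -> shape [1, 500]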

  • Title: tensorflow-model-summary
  • Author: Charles
  • Created at : 2023-03-21 13:36:47
  • Updated at : 2023-07-27 16:45:54
  • Link: https://charles2530.github.io/2023/03/21/tensorflow-model-summary/
  • License: This work is licensed under CC BY-NC-SA 4.0.