This article presents a new method for video representation, called the trajectory-based 3D convolutional descriptor (TCD), which combines the advantages of deep-learned and hand-crafted features. We use deep architectures to learn discriminative convolutional feature maps and apply trajectory-constrained pooling to aggregate these features into effective descriptors. First, valid trajectories are generated by tracking interest points within co-motion superpixels. Second, we use a 3D ConvNet (C3D) to capture both motion and appearance information in the form of convolutional feature maps. Finally, the feature maps are transformed by two normalization methods, channel normalization and spatiotemporal normalization, and trajectory-constrained sampling and pooling aggregate the normalized features into descriptors. The proposed TCD is more discriminative than hand-crafted features and boosts recognition performance. Experimental results on benchmark datasets demonstrate that our pipeline outperforms conventional algorithms in both efficiency and accuracy.
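To make the normalization and pooling steps concrete, the following NumPy sketch illustrates one plausible reading of the pipeline. It is an illustration under stated assumptions, not the authors' implementation: the feature-map tensor is assumed to have shape (C, T, H, W), both normalizations are assumed to divide by a maximum (per channel over the clip, or per position across channels), and the 1/8 scale mapping video coordinates to feature-map coordinates is an assumed network stride; all function names are hypothetical.

```python
import numpy as np

def spatiotemporal_normalize(fmap, eps=1e-8):
    """Normalize each channel by its max over the clip's (T, H, W) extent.

    fmap: convolutional feature maps of shape (C, T, H, W).
    """
    max_per_channel = fmap.max(axis=(1, 2, 3), keepdims=True)
    return fmap / (max_per_channel + eps)

def channel_normalize(fmap, eps=1e-8):
    """Normalize each spatiotemporal position by its max across channels."""
    max_per_position = fmap.max(axis=0, keepdims=True)
    return fmap / (max_per_position + eps)

def trajectory_pool(fmap, trajectory, scale=1.0 / 8.0):
    """Sum-pool feature values sampled along one trajectory.

    trajectory: list of (t, x, y) points in original video coordinates.
    scale: assumed ratio mapping video coordinates to feature-map
           coordinates (depends on the network's spatial stride).
    Returns a C-dimensional descriptor for this trajectory.
    """
    C, T, H, W = fmap.shape
    desc = np.zeros(C, dtype=fmap.dtype)
    for t, x, y in trajectory:
        # map video coordinates onto the feature map and clamp to bounds
        ti = min(max(t, 0), T - 1)
        yi = min(max(int(round(y * scale)), 0), H - 1)
        xi = min(max(int(round(x * scale)), 0), W - 1)
        desc += fmap[:, ti, yi, xi]
    return desc

# usage: pool a toy trajectory over randomly generated feature maps
fmap = np.random.rand(64, 16, 14, 14)            # (C, T, H, W)
traj = [(t, 40 + 2 * t, 60) for t in range(16)]  # (t, x, y) points
tcd = trajectory_pool(channel_normalize(fmap), traj)
print(tcd.shape)                                 # (64,)
```

In this sketch the descriptor dimensionality equals the number of channels C, so applying both normalizations and concatenating the resulting pooled vectors would double the descriptor length; whether the paper concatenates or keeps the variants separate is not specified in this abstract.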