JISE




Journal of Information Science and Engineering, Vol. 39 No. 6, pp. 1403-1420


Multi-Person Pose Estimation using an Ordinal Depth-Guided Convolutional Neural Network


YI-YUAN CHEN1, KUOCHEN WANG1,2,+, HAO-WEI CHUNG3,4, CHIEN-CHIH CHEN5,
BOHAU HUANG1 AND I-WEI LU2
1Department of Computer Science
3Department of Biological Science and Technology
National Yang Ming Chiao Tung University
Hsinchu, 300 Taiwan

2Center for Fundamental Science
Kaohsiung Medical University
Kaohsiung, 807 Taiwan

4Department of Pediatrics
Kaohsiung Medical University Hospital
Kaohsiung, 807 Taiwan

5Department of Industrial and Information Management
National Cheng Kung University
Tainan, 701 Taiwan


Monocular 2D multi-person pose estimation in videos is essential for applications such as surveillance, action recognition, kinematics analysis, and medical diagnosis. Existing state-of-the-art offset-based methods extract temporal features from the offsets between consecutive predicted rough skeletons in order to fine-tune the skeletons more precisely. However, the precision of existing single-image models for rough skeleton prediction, such as HRNet, drops when target persons shift within propagated bounding boxes, which results in inconsistent estimated poses across consecutive frames. To overcome this problem, we propose an Ordinal Depth-Guided Convolutional Neural Network (ODG-CNN). The proposed ordinal depth guides the Ordinal Depth-Guided Block (ODGB) in the ODG-CNN to reweight features for target persons in bounding boxes. Experimental results on the PoseTrack 2018 dataset indicate that the proposed ODG-CNN achieves the highest performance in terms of mean Average Precision (mAP). The proposed ODG-CNN is suited to applications that need highly accurate video-based pose estimation, such as the use of telehealth for early detection of and intervention in developmental delays in children.


Keywords: convolutional neural network (CNN), human pose estimation (HPE), multi-person pose estimation, ordinal depth, video-based HPE
