Hierarchical long short-term memory for action recognition based on 3D skeleton joints from Kinect sensor
Abstract
Action recognition has been used in a wide range of applications such as human-computer interaction, intelligent video surveillance systems, video summarization, and robotics. Recognizing actions is important for intelligent agents to understand, learn from, and interact with their environment. Recent technology that allows the acquisition of RGB+D and 3D skeleton data, together with advances in deep learning models, has significantly increased the performance of action recognition models. In this research, a hierarchical Long Short-Term Memory (LSTM) network is proposed to recognize actions based on 3D skeleton joints from a Kinect sensor. The model processes each of the three coordinate axes of the skeleton joints separately and groups the joints of each axis into parts, namely the spine, left and right arms, left and right hands, and left and right legs. To fit the hierarchically structured LSTM layers, the part representations are concatenated into spine, arms, hands, and legs, and these are then concatenated into a body representation. The model merges the per-axis body representations into a single final body representation, which is fed to the final layer to classify the action. Performance is measured using cross-view and cross-subject evaluation on 10 action classes of the NTU RGB+D dataset, achieving accuracies of 0.854 and 0.837, respectively.
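The part-to-body hierarchy described above can be summarized in code. The following is a minimal PyTorch sketch, assuming a 25-joint Kinect v2 skeleton; the joint-index groups in `PARTS`, the hidden size, and the fusion details are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of the hierarchical LSTM described in the abstract.
# Joint groupings and layer sizes are illustrative assumptions only.
import torch
import torch.nn as nn

# Hypothetical joint-index groups for a 25-joint Kinect v2 skeleton.
PARTS = {
    "spine":      [0, 1, 2, 3, 20],
    "left_arm":   [4, 5, 6],
    "right_arm":  [8, 9, 10],
    "left_hand":  [7, 21, 22],
    "right_hand": [11, 23, 24],
    "left_leg":   [12, 13, 14, 15],
    "right_leg":  [16, 17, 18, 19],
}

class AxisHierarchy(nn.Module):
    """Part -> limb -> body LSTM hierarchy for a single coordinate axis."""
    def __init__(self, hidden=32):
        super().__init__()
        # One LSTM per body part; input size = number of joints in the part.
        self.part_lstms = nn.ModuleDict({
            name: nn.LSTM(len(idx), hidden, batch_first=True)
            for name, idx in PARTS.items()
        })
        # Second level: arms, hands, and legs each fuse a left/right pair.
        self.limb_lstms = nn.ModuleDict({
            name: nn.LSTM(2 * hidden, hidden, batch_first=True)
            for name in ("arms", "hands", "legs")
        })
        # Third level: spine + arms + hands + legs -> body representation.
        self.body_lstm = nn.LSTM(4 * hidden, hidden, batch_first=True)

    def forward(self, x):  # x: (batch, time, 25) coordinates along one axis
        part = {name: self.part_lstms[name](x[..., idx])[0]
                for name, idx in PARTS.items()}
        limbs = {
            "arms": self.limb_lstms["arms"](
                torch.cat([part["left_arm"], part["right_arm"]], -1))[0],
            "hands": self.limb_lstms["hands"](
                torch.cat([part["left_hand"], part["right_hand"]], -1))[0],
            "legs": self.limb_lstms["legs"](
                torch.cat([part["left_leg"], part["right_leg"]], -1))[0],
        }
        body_in = torch.cat([part["spine"], limbs["arms"],
                             limbs["hands"], limbs["legs"]], -1)
        return self.body_lstm(body_in)[0]  # (batch, time, hidden)

class HierarchicalLSTM(nn.Module):
    """Fuses the three per-axis body streams and classifies the action."""
    def __init__(self, num_classes=10, hidden=32):
        super().__init__()
        self.axes = nn.ModuleList([AxisHierarchy(hidden) for _ in range(3)])
        self.final_lstm = nn.LSTM(3 * hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x):  # x: (batch, time, 25 joints, 3 axes)
        bodies = [axis(x[..., i]) for i, axis in enumerate(self.axes)]
        out, _ = self.final_lstm(torch.cat(bodies, -1))
        return self.classifier(out[:, -1])  # logits from last time step
```

Given a batch of joint-coordinate sequences, e.g. `torch.randn(8, 60, 25, 3)` (batch, frames, joints, axes), the model returns `(8, 10)` class logits.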