
New Hybrid Deep Learning Method to Recognize Human Action from Video

Md Shofiqul Islam, Sunjida Sultana, Md Jabbarul Islam

Abstract


Internet use and available bandwidth have grown tremendously in recent years. Because Internet connectivity is now inexpensive, sharing information (text, audio, and video) has become faster and more popular. Video content must be analyzed so that it can be classified for different user purposes, and several machine learning approaches to video classification have been developed to save users time and effort. Recognizing human behavior with deep neural networks has become a popular research topic in recent years. Although video recognition has progressed significantly, numerous challenges remain. Convolutional neural networks (CNNs) are known to require a fixed-size image input, which constrains the network topology and reduces recognition accuracy. This problem has been solved for still images but not yet for video. To handle the fixed-size input problem for video frames in video recognition, we present a ten-layer stacked three-dimensional (3D) convolutional network based on spatial pyramid pooling. The network structure consists of three parts: a ten-layer stacked 3DCNN, DenseNet, and SPPNet. We evaluated our method on the KTH dataset. The experimental results show that our model outperforms existing models for video-based behavior recognition by a 2% margin in accuracy.
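The fixed-size input problem described above can be illustrated with a minimal sketch of spatial pyramid pooling extended to three dimensions. This is not the authors' implementation: the pyramid levels (1, 2, 4), the choice of max pooling, the floor/ceil bin rule, and the helper names `bin_edges`, `spp3d`, and `make_clip` are all assumptions made for illustration. The point is only that feature volumes of different sizes are pooled into a vector of the same length, so the layers that follow can have a fixed input size.

```python
def bin_edges(size, bins):
    """Split range(size) into `bins` windows that together cover it.

    Uses integer floor for the start and integer ceiling for the end, so
    every window is non-empty and neighbouring windows may overlap.
    """
    return [(i * size // bins, -(-(i + 1) * size // bins)) for i in range(bins)]


def spp3d(volume, levels=(1, 2, 4)):
    """Max-pool one feature channel volume[t][h][w] to a fixed-length list.

    For each pyramid level n (an assumed setting), the volume is divided
    into n x n x n bins and the maximum of each bin is kept. The output
    has sum(n**3 for n in levels) entries whatever the input size.
    """
    t_size, h_size, w_size = len(volume), len(volume[0]), len(volume[0][0])
    features = []
    for n in levels:
        for t0, t1 in bin_edges(t_size, n):
            for h0, h1 in bin_edges(h_size, n):
                for w0, w1 in bin_edges(w_size, n):
                    features.append(max(
                        volume[t][h][w]
                        for t in range(t0, t1)
                        for h in range(h0, h1)
                        for w in range(w0, w1)))
    return features


def make_clip(frames, height, width):
    """Hypothetical stand-in for a feature volume produced by a 3D CNN."""
    return [[[(7 * t + 3 * h + w) % 11 for w in range(width)]
             for h in range(height)]
            for t in range(frames)]


# Two volumes of different sizes yield vectors of equal length (1 + 8 + 64).
print(len(spp3d(make_clip(5, 7, 9))), len(spp3d(make_clip(8, 12, 16))))
```

A real network would apply this pooling per channel on tensors in a deep learning framework; the nested-list version here only keeps the sketch free of dependencies.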

Keywords


Video Classification, 3D, Deep learning, Video, Video action, Convolution.






DOI: http://dx.doi.org/10.26555/jiteki.v7i2.21499



Copyright (c) 2021 Md Shofiqul Islam, Sunjida Sultana, Md Jabbarul Islam

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


 
 


Jurnal Ilmiah Teknik Elektro Komputer dan Informatika
ISSN 2338-3070 (print) | 2338-3062 (online)
Organized by Electrical Engineering Department - Universitas Ahmad Dahlan
Published by Universitas Ahmad Dahlan
Website: http://journal.uad.ac.id/index.php/jiteki