New Hybrid Deep Learning Method to Recognize Human Action from Video

Authors

  • Md Shofiqul Islam, Faculty of Computing, FSKKP, UMP, Gambang, Kuantan, Pahang, Malaysia.
  • Sunjida Sultana, Faculty of CSE at Islamic University, Kushtia
  • Md Jabbarul Islam, Faculty of Mathematics at National University, Gazipur, Bangladesh.

DOI:

https://doi.org/10.26555/jiteki.v7i2.21499

Keywords:

Video Classification, 3D, Deep learning, Video, Video action, Convolution.

Abstract

Internet use and available bandwidth have grown tremendously in recent years. Because Internet connectivity is now inexpensive, sharing information (text, audio, and video) has become faster and more popular. Video content must be analyzed so that it can be classified for different user purposes. Several machine learning approaches to video classification have been developed to save users time and effort. Recognizing human behavior with deep neural networks has become a popular research topic in recent years. Although significant progress has been made in video recognition, numerous challenges remain. Convolutional neural networks (CNNs) typically require a fixed-size input, which constrains the network topology and reduces recognition accuracy. This problem has been addressed for still images, but not yet for video. We present a ten-layer stacked three-dimensional (3D) convolutional network with spatial pyramid pooling to handle the fixed-size input constraint on video frames in video recognition. The network structure is made up of three parts: a ten-layer stacked 3D CNN, DenseNet, and SPPNet. We evaluated our method on the KTH dataset. The experimental results show that our model outperforms existing models for video-based behavior recognition by a margin of 2% in accuracy.
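To illustrate how pyramid pooling can remove the fixed-size input constraint described above, the following is a minimal, dependency-free sketch of max pooling over a 3D pyramid. It is not the authors' implementation: the function names (`bin_edges`, `spp3d`, `make_clip`), the pyramid levels (1, 2, 4), the single-channel nested-list input, and the choice to pool along the temporal axis as well as the two spatial axes are all assumptions made for illustration. In a real network this step would sit after the convolutional layers and operate on multi-channel feature maps.

```python
from itertools import product


def bin_edges(size, n_bins):
    """Split range(size) into n_bins windows that are never empty.

    Start uses floor and end uses ceil (integer arithmetic), so windows
    may overlap when size is not divisible by n_bins.
    """
    return [
        ((i * size) // n_bins, ((i + 1) * size + n_bins - 1) // n_bins)
        for i in range(n_bins)
    ]


def spp3d(volume, levels=(1, 2, 4)):
    """Max-pool a [T][H][W] nested list into a fixed-length vector.

    Each level n divides every axis into n windows, giving n**3 cells;
    the output length is sum(n**3 for n in levels) for any input size.
    """
    t, h, w = len(volume), len(volume[0]), len(volume[0][0])
    pooled = []
    for n in levels:
        for (t0, t1), (h0, h1), (w0, w1) in product(
            bin_edges(t, n), bin_edges(h, n), bin_edges(w, n)
        ):
            pooled.append(
                max(
                    volume[a][b][c]
                    for a in range(t0, t1)
                    for b in range(h0, h1)
                    for c in range(w0, w1)
                )
            )
    return pooled


def make_clip(t, h, w):
    """Hypothetical toy feature map whose values encode their position."""
    return [
        [[a * h * w + b * w + c for c in range(w)] for b in range(h)]
        for a in range(t)
    ]


# Two clips with different frame counts and frame sizes.
short_clip = spp3d(make_clip(4, 6, 8))
long_clip = spp3d(make_clip(9, 12, 10))
print(len(short_clip), len(long_clip))  # prints "73 73"
```

Both clips yield a vector of length 1 + 8 + 64 = 73, which is why a fully connected classifier placed after such a layer can accept videos of varying resolution and length.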

Author Biographies

Md Shofiqul Islam, Faculty of Computing, FSKKP, UMP, Gambang, Kuantan, Pahang, Malaysia.

I am Md Shofiqul Islam. I completed my B.Sc. at Islamic University, Kushtia, Bangladesh. I am now a research assistant at University Malaysia Pahang (UMP) and a teacher at ADUST university, Dhaka. I have been in the teaching profession since 2015. My research fields are deep learning, machine learning, natural language processing, and image processing. I have published many papers in my field.

Sunjida Sultana, Faculty of CSE at Islamic University, Kushtia

Sunjida Sultana completed her bachelor’s degree and is completing her master’s degree in the Department of Computer Science and Engineering, Islamic University, Kushtia-7600, Bangladesh. She works in the fields of image processing, video processing, and text processing. Her email is sunjidasultana51984@gmail.com.

Md Jabbarul Islam, Faculty of Mathematics at National University, Gazipur, Bangladesh.

Md Jabbarul Islam has completed his bachelor’s degree in the Department of Mathematics, National University, Gazipur-1704, Dhaka, Bangladesh. He conducts research in graph theory, statistics, machine learning, image processing, video processing, and text processing. His email is abduljabbar11061997@gmail.com.

Published

2021-09-01

How to Cite

[1]
M. S. Islam, S. Sultana, and M. J. Islam, “New Hybrid Deep Learning Method to Recognize Human Action from Video”, J. Ilm. Tek. Elektro Komput. Dan Inform, vol. 7, no. 2, pp. 306–313, Sep. 2021.

Section

Articles