New Hybrid Deep Learning Method to Recognize Human Action from Video
DOI:
https://doi.org/10.26555/jiteki.v7i2.21499
Keywords:
Video Classification, 3D, Deep learning, Video, Video action, Convolution.
Abstract
The number of internet users and the available bandwidth have both grown rapidly in recent years. Because internet connectivity is now inexpensive, sharing information (text, audio, and video) has become faster and more popular. Video content must be analyzed so that it can be classified for different user purposes, and several machine learning approaches to video classification have been developed to save users time and effort. Recognizing human behavior with deep neural networks has become an active research topic in recent years. Although video recognition has progressed significantly, many challenges remain. Convolutional neural networks (CNNs) conventionally require a fixed-size input, which constrains the network topology and reduces recognition accuracy. This problem has been solved for still images but not yet for video. To handle the fixed-size constraint on video frames in video recognition, we present a ten-layer stacked three-dimensional (3D) convolutional network based on spatial pyramid pooling. The network structure consists of three parts: a ten-layer stacked 3D CNN, DenseNet, and SPPNet. We evaluated the model on the KTH dataset. The experimental results show that our model outperforms existing models for video-based behavior recognition by a margin of 2% in accuracy.
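The role of spatial pyramid pooling in lifting the fixed-size input constraint can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the pyramid levels `(1, 2, 4)`, the choice to pool over the temporal axis as well as the two spatial axes, and the helper names `bin_edges`, `spp3d`, and `random_volume` are all assumptions made for illustration. The sketch max-pools one feature channel of a clip into a vector whose length depends only on the pyramid levels, not on the clip's frame count or resolution.

```python
import random


def bin_edges(size, bins):
    """Split range(size) into `bins` windows (possibly overlapping) that
    together cover every index, using integer floor/ceil boundaries."""
    return [((i * size) // bins, ((i + 1) * size + bins - 1) // bins)
            for i in range(bins)]


def spp3d(volume, levels=(1, 2, 4)):
    """Max-pool one feature channel, given as nested lists [T][H][W],
    into a vector with sum(n**3 for n in levels) entries.

    `levels` is an assumed pyramid configuration, not taken from the paper.
    """
    t, h, w = len(volume), len(volume[0]), len(volume[0][0])
    features = []
    for n in levels:
        # Each level divides the volume into an n x n x n grid of bins
        # and keeps the maximum activation inside each bin.
        for t0, t1 in bin_edges(t, n):
            for h0, h1 in bin_edges(h, n):
                for w0, w1 in bin_edges(w, n):
                    features.append(max(
                        volume[a][b][c]
                        for a in range(t0, t1)
                        for b in range(h0, h1)
                        for c in range(w0, w1)))
    return features


def random_volume(t, h, w, seed=0):
    """Hypothetical stand-in for a feature map produced by the 3D CNN."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(w)] for _ in range(h)]
            for _ in range(t)]


# Two clips with different lengths and resolutions yield equally long vectors.
short_clip = spp3d(random_volume(8, 30, 40))
long_clip = spp3d(random_volume(16, 60, 80))
print(len(short_clip), len(long_clip))  # prints "73 73" (1 + 8 + 64 bins)
```

With C feature channels the pooled vector would have 73 × C entries under these assumed levels, so a fully connected classifier placed after it can have a fixed input width regardless of the video's size. In a deep learning framework the same per-level pooling is usually available as a built-in adaptive pooling layer, for example `torch.nn.AdaptiveMaxPool3d` in PyTorch.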
License
Authors who publish with JITEKI agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
This work is licensed under a Creative Commons Attribution 4.0 International License