A review on Video Classification with Methods, Findings, Performance, Challenges, Limitations and Future Work


  • Md Shofiqul Islam, Faculty of Soft computing, FSKKP, UMP, Gambang, Kuantan, Pahang, Malaysia.
  • Sunjida Sultana, Faculty of Computer Science and Engineering, Islamic University, Kushtia-7600, Bangladesh.
  • Uttam Kumar Roy, Assistant Programmer at Bangladesh Bank, Bangladesh.
  • Jubayer Al Mahmud, Senior Software Engineer at Charja Solutions Limited, Dhaka, Bangladesh.




Video classification, Machine learning, Deep learning, Video


In recent years, the number of web users and the available bandwidth have grown rapidly. Low-cost Internet connectivity has made sharing information (text, audio, and video) faster and more common. This video content needs to be analyzed so that its class can be predicted for the different purposes users have. Many machine learning approaches have been developed for video classification to save people time and energy. Many review papers on video classification already exist, but they have limitations: limited analysis, poor structure, no mention of research gaps or findings, and no clear description of advantages, disadvantages, and future work. Our review addresses most of these limitations. This study reviews existing video classification procedures, examines the existing methods comparatively and critically, and recommends the most effective and productive process. First, our analysis examines video classification with taxonomical details, the latest applications, processes, and dataset information. Second, it covers the overall inconveniences, difficulties, shortcomings, and potential future work, along with data and performance measurements, in relation to recent work on deep learning and machine learning models. A key task of this review is also to study video classification systems in terms of their tools, benefits, drawbacks, and other features in order to compare the techniques they use. Lastly, we present a quick summary table based on selected features. In terms of precision and independent feature extraction, the RNN (Recurrent Neural Network), CNN (Convolutional Neural Network), and combination approaches perform better than the CNN-dependent method.

Author Biographies

Md Shofiqul Islam, Faculty of Soft computing, FSKKP, UMP, Gambang, Kuantan, Pahang, Malaysia.

Md Shofiqul Islam is currently a research-based master's student at University Malaysia Pahang (UMP), Pahang, Malaysia. He completed his B.Sc. in CSE at Islamic University, Kushtia, Bangladesh, in 2014. He is now a research assistant at UMP and also teaches CSE under the faculty of FST at ADUST university, Dhaka. He has been in the teaching profession since 2015. His research fields are deep learning, machine learning, natural language processing, and image processing. He has published many papers in his field.

Sunjida Sultana, Faculty of Computer Science and Engineering, Islamic University, Kushtia-7600, Bangladesh.

Sunjida Sultana is completing a master’s degree and holds a bachelor’s degree from the Department of Computer Science and Engineering, Islamic University, Kushtia-7600, Bangladesh. She works in the fields of image processing, video processing, and text processing.

Uttam Kumar Roy, Assistant Programmer at Bangladesh Bank, Bangladesh.

Uttam Kumar Roy completed his bachelor's and master’s degrees at the Department of Computer Science and Engineering, Islamic University, Kushtia-7600, Bangladesh. He now works as an Assistant Programmer at Bangladesh Bank, the central bank of Bangladesh (Head Office, Motijheel Commercial Area, PO Box 325, Dhaka 1000). He also conducts research in machine learning, image processing, video processing, and text processing.

Jubayer Al Mahmud, Senior Software Engineer at Charja Solutions Limited, Dhaka, Bangladesh.

Jubayer Al Mahmud completed his master's and bachelor’s degrees at the Department of Computer Science and Engineering, Islamic University, Kushtia-7600, Bangladesh. He now works as a Senior Software Engineer at Charja Solutions Limited, 129-Kha/1, Elephant Road, New Market, Dhaka-1205. He also conducts research in machine learning, IoT, image processing, video processing, and text processing.






How to Cite

M. S. Islam, S. Sultana, U. K. Roy, and J. A. Mahmud, “A review on Video Classification with Methods, Findings, Performance, Challenges, Limitations and Future Work”, J. Ilm. Tek. Elektro Komput. Dan Inform, vol. 6, no. 2, pp. 47–57, Jan. 2021.



