Online video-based abnormal detection using highly motion techniques and statistical measures

Abnormal detection approaches lie at the core of video surveillance and have proven substantially effective in detecting abnormal incidents without prior knowledge of those incidents. State-of-the-art research shows a trade-off between frame processing time and detection accuracy in these approaches. The primary challenge is therefore to balance this trade-off by using few but highly descriptive features, so that online performance is achieved while a high accuracy rate is maintained. In this study, we propose a new framework that balances detection accuracy and video processing time by employing two efficient motion techniques, namely foreground and optical flow energy. Moreover, we apply different statistical measures to the motion features to obtain a robust inference method that distinguishes abnormal behavior incidents from normal ones. The performance of this framework has been extensively evaluated in terms of detection accuracy, area under the curve (AUC), and frame processing time. Simulation results and comparisons with ten relevant online and non-online frameworks demonstrate that our framework achieves superior performance, presenting high accuracy while simultaneously attaining low processing time.


Introduction
Owing to its widespread applications in today's world, video-based surveillance is considered one of the most active research topics in the computer vision field. Detecting abnormal events in surveillance videos is an important task because of the increasing need for public security [1][2][3]. However, the complexity of the scenarios and the unpredictability of the abnormalities make it a difficult and challenging task. A widespread solution is to first learn normal events from training data, and then identify abnormalities by measuring their resemblance or reconstruction errors in light of the learned normal models [4]. Those normal models may depend on feature distribution [5], trajectory [6], graph model [7], or sparse representation [8]. Many methods have been proposed to detect abnormal incidents with very high accuracy, but only very few of them seek to detect abnormalities as soon as they occur [9]. To accomplish online performance, the current frame must be processed before the next frame arrives, which depends on the sequence's frame rate [10]. Unfortunately, acquiring high processing speed always comes at the expense of detection accuracy, or vice versa. Therefore, the big challenge in the abnormal detection field is to balance these two performance metrics appropriately. Recently, a great deal of research has been proposed to detect abnormalities [11][12][13].
The authors in [14] propose an approach to detect video-based anomalies using an entropy measure, which is calculated by statistically processing the spatiotemporal information of a group of interest points inside an area of interest, measuring and analyzing the randomness of both their displacements and directions. Building a model based on raw video sequences is hard since the model complexity would be considerably high. One solution extensively used to address this matter is to partition the video into small spatial-temporal patches [10][11][12][14][15][16]; after that, subsequent operations, as well as the abnormal detection algorithm, are applied to these patches. However, this approach experiences several problems. One of these problems is that the patches may consist only of information about the background, which does not assist in individual behavior modeling. Another problem, as mentioned in [15], is that all abnormal detection methods that employ this approach partition the video only into uniform patches, without taking into account the moving objects as a whole.
TELKOMNIKA, ISSN: 1693-6930, Vol. 17, No. 4, August 2019: 2039-2047
The authors in [17] proposed a method based on densely constructed spatiotemporal video volumes that are organized into large contextual graphs. For the dominant behaviors, a hierarchical codebook model is constructed. This method is able to model low-level and high-level spatial behaviors at the same time, as well as the temporal and spatiotemporal pixel-level changes. Subsequently, in [18], these spatiotemporal compositions are enhanced in terms of frame processing time by taking into account the temporal variations of frame volumes as a descriptor. The authors in [19] proposed a fast sparse method that detects anomalies using learned sparse combinations to speed up the coding phase. Although this approach improves the detection speed up to 140-150 frames per second on average, the accuracy is significantly influenced by the threshold, which frequently differs between scenes. Cong et al. [20] proposed a method that uses a sparse coding model and multi-scale histograms of optical flow, with the reconstruction error as a metric for anomaly detection. The authors in [21] proposed a method to detect abnormalities by employing cues from the motion vectors in H.264/AVC compressed videos. The method suggests hierarchical processing in which the detection begins at the coarsest level and proceeds up to the final one. To distinguish abnormal behavior from normal behavior, the Gaussian Mixture Model (GMM) is utilized. This method is further improved in [22], whose authors add orientation information for the motion vectors. In addition, they use non-parametric modeling instead of the previous parametric one, which helps to enhance the detection accuracy.
The main differences between our framework and the other existing frameworks are as follows:
a) To draw out highly descriptive features, our framework utilizes two efficient motion algorithms while maintaining online performance. In particular, it utilizes optical flow and background subtraction features, both of which have been utilized effectively and independently in the past [10,23].
b) The proposed framework differs from [24] in three aspects. First, the proposed framework extracts foreground features in addition to those from optical flow. This is very useful in cases where the features extracted from optical flow energy are not sufficiently descriptive; the foreground features then help the proposed framework to detect abnormal incidents properly. Second, our framework uses an adaptive threshold based on the training dataset. Third, we propose to use the standard deviation measure to quantify the deviation of the data from the normal rate.
The remainder of this paper is organized as follows. The proposed framework is described in detail in Section 2. Section 3 sheds light on the public datasets, experimental setup, and performance metrics used to evaluate the performance of the proposed framework. Section 4 presents the results and comparison experiments that demonstrate the advantages of the proposed framework. Finally, Section 5 concludes this study and suggests future work.

Research Method
This study proposes a new framework for online detection of abnormal behavior that relies on foreground and optical flow energy features. The block diagram of the proposed framework is shown in Figure 1.

Pre-Processing
For simplicity, we resize all the frames to 240×320. After that, we convert all the frames to grayscale images. Changes in lighting conditions have an important effect on the performance of abnormal detection algorithms [1,25]. One way to handle varying illumination is to apply illumination normalization as a pre-processing phase. In the proposed framework, we apply illumination normalization using the histogram equalization technique to control lighting conditions. After that, we apply Gaussian filtering to remove unwanted small objects.
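As an illustration of the illumination normalization step, the following is a minimal sketch of histogram equalization on an 8-bit grayscale frame. It is not the implementation used in this study: the function name `equalize_histogram` and the list-of-rows image representation are assumptions made for the example, and resizing, grayscale conversion, and Gaussian filtering are omitted.

```python
def equalize_histogram(gray):
    """Histogram-equalize an 8-bit grayscale image given as a list of rows.

    Hypothetical helper for illustration; intensities must lie in 0..255.
    """
    pixels = [p for row in gray for p in row]
    total = len(pixels)

    # Histogram of the 256 possible intensity levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # Smallest non-zero CDF value (the darkest intensity that occurs).
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in gray]

    # Lookup table that spreads the occupied levels over 0..255.
    lut = [max(0, round((c - cdf_min) * 255 / (total - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in gray]


frame = [[50, 50], [100, 150]]
equalized = equalize_histogram(frame)
```

In this sketch the darkest occurring intensity is mapped to 0 and the brightest to 255, so a low-contrast frame is stretched over the full range before the motion features are extracted.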

Feature Extraction
There are various algorithms for motion detection in live-stream videos [1,10]. Almost all of them depend on comparing the current video frame with the previous one. In the proposed framework, we use two efficient motion detection algorithms to detect and extract motion features, namely background subtraction and optical flow. Further details on these algorithms are given in the following paragraphs.
Foreground features: Foreground features are very beneficial for determining long-term incidents [10]. They refer to the information that represents the size of objects and their corresponding time in a certain scene [26]. They are obtained by applying a background subtraction approach to each video frame [27]. Background subtraction produces one binary mask for each video frame, in which the logical true values constitute the foreground features.
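The idea of obtaining a binary foreground mask can be sketched as follows. This is only an illustrative simplification: the source does not state which background subtraction algorithm is used beyond citing [27], so the static background model, the function names, and the `threshold` value of 25 are all assumptions.

```python
def foreground_mask(frame, background, threshold=25):
    """Return a binary mask that is True where the frame differs from the background.

    Assumption: a fixed background image and a fixed intensity threshold.
    Frames are grayscale images given as lists of rows.
    """
    return [
        [abs(f - b) > threshold for f, b in zip(frame_row, background_row)]
        for frame_row, background_row in zip(frame, background)
    ]


def foreground_pixel_count(mask):
    """Number of foreground (True) pixels in a binary mask."""
    return sum(1 for row in mask for value in row if value)


background = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
frame = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
mask = foreground_mask(frame, background)
```

The true values of `mask` play the role of the foreground features described above; a practical system would normally use an adaptive background model rather than a single reference image.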
Optical flow energy features: Optical flow is employed to represent the global visible motion of the objects between consecutive frames [28]. There are three popular approaches for estimating optical flow: Lucas-Kanade [29], Pyramid Lucas-Kanade [30], and Horn-Schunck [31]. We use the last one in our framework because, in practice, it gave us the best results for efficiently computing dense flow fields. The optical flow energy features are computed from the horizontal and vertical components of the optical flow at each spatial location of each frame difference, where N indicates the number of pixels in each video frame.
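A minimal sketch of an energy computation of this kind is given below. The exact formula is not recoverable from the text, so the form used here, the sum of squared horizontal and vertical flow components averaged over the N pixels, is an assumption, as are the names `optical_flow_energy`, `u`, and `v`. The flow fields themselves are taken as given; Horn-Schunck estimation is not implemented.

```python
def optical_flow_energy(u, v):
    """Mean squared flow magnitude over all pixels.

    Assumed form: (1/N) * sum(u_i**2 + v_i**2), where u and v are the
    horizontal and vertical flow components given as lists of rows.
    """
    n = sum(len(row) for row in u)  # N: number of pixels in the frame
    total = sum(
        ux * ux + vy * vy
        for u_row, v_row in zip(u, v)
        for ux, vy in zip(u_row, v_row)
    )
    return total / n


# A single moving pixel with flow (3, 4) in a 2x2 frame.
u = [[3.0, 0.0], [0.0, 0.0]]
v = [[4.0, 0.0], [0.0, 0.0]]
energy = optical_flow_energy(u, v)
```

Under this assumed definition, a frame with no motion has zero energy, and the value grows with both the number of moving pixels and the speed of their motion.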

Frame Difference Map (FDM)
The Frame Difference Map (FDM) is created based on the differences in the generative masks of consecutive frames. First, we obtain the generative masks for the two consecutive frames: the previous generative mask (PGM) and the current generative mask (CGM). Then, we compute the difference between them. We also apply a morphological filter to remove small or noisy objects from the FDM. Figure 2 describes how to compute the FDM. As shown in Figure 1, the computed FDM is then used to calculate the EOS statistical measures, as described in the next subsection.
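The difference between the two generative masks can be sketched as follows. The difference equation is not recoverable from the text, so this example assumes that the masks are binary and that the FDM marks the pixels whose mask value changed between PGM and CGM (an element-wise absolute difference). The function name is hypothetical, and the morphological filtering step is omitted.

```python
def frame_difference_map(pgm, cgm):
    """Element-wise difference of two binary generative masks.

    Assumption: FDM(x, y) = |CGM(x, y) - PGM(x, y)|, which for binary
    masks is 1 exactly where the two masks disagree.
    """
    return [
        [int(bool(p) != bool(c)) for p, c in zip(p_row, c_row)]
        for p_row, c_row in zip(pgm, cgm)
    ]


pgm = [[1, 0], [0, 1]]  # previous generative mask
cgm = [[1, 1], [0, 0]]  # current generative mask
fdm = frame_difference_map(pgm, cgm)
```

Pixels that stay foreground or stay background contribute nothing to the FDM, so only changes in the scene between the two frames are retained.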
a) Entropy Measure (EM): In the entropy computation, k refers to the number of independent symbols and f_z is the frequency of the z-th pixel in the image [24,32].
b) Occupancy Measure (OM): This measure refers to the area occupied by the detected objects over time. The higher the OM value, the greater the change in the scene. We presume that a significant difference in the OM value indicates that a suspicious incident is occurring. The occupancy measure is computed relative to the size of the frame, 240 × 320.
c) Standard Deviation Measure (SM): For a matrix FDM made up of N observations, the standard deviation is computed from the deviations of the observations from the mean of the FDM.
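The three EOS measures can be sketched for a single FDM as follows. The equations are not recoverable from the text, so the forms used here are assumptions: Shannon entropy (base 2) over the relative frequencies of the pixel values, occupancy as the fraction of non-zero pixels in the frame, and the sample standard deviation with N − 1 normalization. All function names are hypothetical.

```python
import math


def _flatten(fdm):
    """Flatten an image given as a list of rows into one list of pixels."""
    return [p for row in fdm for p in row]


def entropy_measure(fdm):
    """Assumed form: -sum(f_z * log2(f_z)) over the k distinct pixel values."""
    pixels = _flatten(fdm)
    total = len(pixels)
    counts = {}
    for p in pixels:
        counts[p] = counts.get(p, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def occupancy_measure(fdm):
    """Assumed form: number of non-zero pixels divided by the frame size."""
    pixels = _flatten(fdm)
    return sum(1 for p in pixels if p) / len(pixels)


def std_measure(fdm):
    """Sample standard deviation of the FDM values (N - 1 normalization assumed)."""
    pixels = _flatten(fdm)
    n = len(pixels)
    if n < 2:
        return 0.0
    mean = sum(pixels) / n
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / (n - 1))


fdm = [[1, 0], [0, 1]]
em, om, sm = entropy_measure(fdm), occupancy_measure(fdm), std_measure(fdm)
```

For a binary FDM all three values are zero when nothing changes between frames, and all three increase as more of the frame changes, which is the behavior that the thresholding stage relies on.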

Adaptive Thresholding and Inference Phase
To obtain an adaptive threshold for our proposed framework, we apply the quantiles algorithm (QA) [33] to the training datasets, using all the measures explained in the previous section for each frame difference map (FDM). QA returns quantiles of the values in a data vector for a cumulative probability p in the range [0, 1]. The following steps describe how to find the thresholds using quantiles for the three measures:
a. First, we compute the Entropy, Occupancy, and Standard deviation measures for each FDM in the training datasets. This provides three vectors: Entropy (EV), Occupancy (OV), and Standard deviation (SV).
b. Next, we apply the quantile function to each vector. In our framework, we choose p = 0.80 ∈ [0.5/n, (n − 0.5)/n]. We denote the values returned by the quantile function as QE, QO, and QS for EV, OV, and SV, respectively. These are the three adaptive thresholds for the three measures.
c. To infer abnormal behavior in the testing datasets, we first calculate the EV, OV, and SV measures for each testing FDM. The abnormal incidents are then found by evaluating these measures against QE, QO, and QS for each frame, where n is the number of frames.
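The thresholding and inference steps can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in this study: the quantile convention (the i-th sorted sample is treated as the (i − 0.5)/n quantile, with linear interpolation in between) is inferred from the probability range quoted above, and the decision rule (a frame is flagged when any of its three measures exceeds its threshold) is an assumption, since the inference equation is not recoverable from the text. The function names are hypothetical.

```python
def quantile(values, p):
    """Quantile of `values` at cumulative probability p in [0, 1].

    Assumed convention: sorted sample i (1-based) is the (i - 0.5)/n
    quantile; probabilities outside that range map to the min or max.
    """
    data = sorted(values)
    n = len(data)
    pos = p * n - 0.5  # fractional 0-based index into the sorted data
    if pos <= 0:
        return data[0]
    if pos >= n - 1:
        return data[-1]
    lower = int(pos)
    fraction = pos - lower
    return data[lower] + fraction * (data[lower + 1] - data[lower])


def fit_thresholds(ev, ov, sv, p=0.80):
    """Return the adaptive thresholds (QE, QO, QS) from the training vectors."""
    return quantile(ev, p), quantile(ov, p), quantile(sv, p)


def is_abnormal(entropy, occupancy, std, thresholds):
    """Assumed rule: abnormal if any measure exceeds its threshold."""
    qe, qo, qs = thresholds
    return entropy > qe or occupancy > qo or std > qs


# Training vectors EV, OV, SV (toy values), one entry per training FDM.
thresholds = fit_thresholds(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [10.0, 20.0, 30.0, 40.0, 50.0],
)
alarm = is_abnormal(5.0, 0.1, 10.0, thresholds)
```

Because the thresholds are derived from the training data rather than fixed by hand, they adapt to each scene; replacing the `or` with `and` in `is_abnormal` would give a stricter rule with fewer alarms.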

Experimental Setup and Performance Metrics
The experiments in this study were implemented and simulated using MATLAB R2017b (9.3.0.713579) x64 and OpenCV libraries on a Linux platform with an Intel Core i7-4600U CPU running at 2.10 GHz with a 4 MB cache and 8 GB RAM. The implementation code does not use GPU arrays to accelerate the computations, and it is not parallelized. When the proposed framework encounters an abnormal incident like the one shown in Figure 4 (a), it automatically displays an alert with "ALARM". The extracted foreground features are illustrated in Figure 4 (b), and the FDM between two consecutive generative masks is shown in Figure 4 (c).
The performance of the proposed framework was evaluated in terms of accuracy, receiver operating characteristic (ROC), and area under the curve (AUC). To measure the accuracy, we use the confusion matrix [18]. ROC curves and AUC are widely used in machine learning, computer vision, and many other scientific fields. The ROC curve is the most common way to demonstrate the performance of a binary classifier, while the AUC summarizes the performance of the classifier in a single value. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) over several threshold values. The TPR is also known in machine learning as the probability of detection, recall, or sensitivity. Similarly, the FPR is also known as the probability of false alarm or fall-out. The area under the curve (AUC) is computed using trapz(FPR, TPR) in MATLAB. The AUC value ranges from 0 to 1, and it describes how well the algorithm can correctly classify the behavior [36,37].
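These metrics can be sketched as follows. The functions are illustrative stand-ins, not the MATLAB code used in this study: `trapezoid_auc` applies the same trapezoidal rule as trapz(FPR, TPR), except that it first sorts the ROC points by FPR, which is an added convenience. The confusion-matrix counts in the example are made-up toy values.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of frames classified correctly, from the confusion matrix."""
    return (tp + tn) / (tp + fp + tn + fn)


def roc_point(tp, fp, tn, fn):
    """Return (FPR, TPR) for one threshold setting."""
    tpr = tp / (tp + fn)  # probability of detection (recall, sensitivity)
    fpr = fp / (fp + tn)  # probability of false alarm (fall-out)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    points = sorted(zip(fpr, tpr))  # order the points by increasing FPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy ROC curve through (0, 0), (0.5, 1), and (1, 1).
auc = trapezoid_auc([0.0, 0.5, 1.0], [0.0, 1.0, 1.0])
acc = accuracy(tp=40, fp=5, tn=50, fn=5)
```

A classifier that guesses at random traces the diagonal of the ROC plot and obtains an AUC of 0.5, so values closer to 1 indicate better separation of abnormal from normal frames.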

Results And Analysis
To evaluate and compare the proposed framework with other frameworks, we used the codes available in [38,39] for Lu et al. [19] and Biswas and Babu [21], respectively. The results of the other methods are taken from the state-of-the-art papers [10,19]. The results of the proposed framework on the UMN and UCSD Ped1 datasets are shown in Table 1. Because the confusion matrix has two classes, there are four possible outcomes: True Negative (TN), False Negative (FN), False Positive (FP), and True Positive (TP), as shown in Table 1.
Table 2 reports the average AUC of our proposed framework alongside different anomaly detection approaches on the UMN dataset. Since non-online frameworks mostly seek to attain detection accuracy at the expense of processing time, we also compare our framework with non-online frameworks to show that it offers very competitive results in terms of both criteria. As noted in Table 2, our framework presents the best AUC values among all online methods. Moreover, compared with non-online methods, the proposed framework achieves results very competitive with those of Li et al. [16] and Zhu et al. [40], while its frame processing time is much lower. In addition, we record the frame-level ROC curve on this dataset. The outcomes are illustrated in Figure 5 (a). It can be noticed that the proposed framework is superior to the online approaches proposed by Pennisi et al. [24], Leyva et al. [10], and Biswas and Babu [21]. Our framework also achieves highly competitive results in comparison with non-online approaches that have been designed specifically for high accuracy rather than low processing time, such as the one proposed by Li et al. [16]. With our framework, the overall average accuracy for the UMN dataset reaches 97.393%. Similarly, in Figure 5 (b), which compares our framework with Biswas and Babu [21], Mahadevan et al. [35], and Adam et al. [41] on UCSD-PED1, our framework achieves the best performance.

Conclusion
The main goal of intelligent video-based surveillance systems is to efficiently distinguish any suspicious incident in a large number of videos in order to prevent risky situations. To achieve this, two significant tasks are usually required. The first is feature extraction, which aims to detect and extract an area of interest in a scene; primitives based on these visual features are then created to describe that area. The second is to apply an inference method based on the provided semantic information about the human motion and determine whether the behavior is normal or abnormal.
In spite of the outstanding development in the area of anomalous behavior detection, some obstacles make it complicated and challenging. For instance, choosing the features that describe the moving object is a hard task, as it drastically affects the characterization and analysis of the behavior. Furthermore, there are only a few online frameworks for video abnormal detection. To overcome these limitations, in this study we proposed an efficient online framework for abnormal behavior detection in surveillance videos. We tested our framework on the common datasets UMN and UCSD. While conducting the simulation experiments, we noted a trade-off between accuracy and frame processing time. Therefore, we designed the proposed framework to achieve high detection accuracy while attaining online performance by employing highly descriptive features, specifically foreground and optical flow energy features, and by utilizing different statistical measures to analyze them efficiently. The proposed framework attains performance comparable to both online and non-online state-of-the-art abnormal detection approaches. Our future work could extend the proposed framework to other video applications.

Figure 2. Frame difference map (FDM)

Figure 3 presents an instance wherein the EOS measure increases when an anomalous incident occurs.

a) Entropy Measure (EM): It is a statistical measure of randomness that may be utilized to obtain the suspicion values of an image. It measures the quantity of information needed, on average, to encode the values of an image. The scene behavior entropy for an image is calculated from the frequencies of its pixel values.

Figure 3. EOS variation in the indoor scene from the UMN dataset

Figure 4. An example of an abnormal incident from the UMN dataset detected by the proposed framework: (a) abnormal event, (b) foreground features, (c) frame difference map
TELKOMNIKA ISSN: 1693-6930. Online video-based abnormal detection using highly motion techniques... (Ahlam Al-Dhamari)

Table 1. The Proposed Framework Results on UMN and UCSD-Ped1 Datasets (TN: Normal Patterns that are Correctly Detected, FN: Normal Patterns that are Wrongly Detected, FP: Abnormal Patterns that are Wrongly Detected, TP: Abnormal Patterns that are Correctly Detected)

Table 2. AUC Values Applying Different Anomaly Detection Approaches on the UMN Dataset