Demosaicing of Color Images by Accurate Estimation of Luminance

Digital cameras acquire color images using a single sensor covered with a color filter array (CFA). A single color component is measured at each pixel, and the remaining two components are obtained using demosaicing techniques. Conventional demosaicing techniques induce artifacts in the resultant images, affecting reconstruction quality. To overcome this drawback, a frequency-based demosaicing technique is proposed. The luminance and chrominance components extracted from the frequency domain of the image are interpolated to produce an intermediate demosaiced image. A novel Neural Network Based Image Reconstruction Algorithm is applied to the intermediate demosaiced image to obtain the resultant demosaiced image. The results presented in the paper show that the proposed demosaicing technique exhibits the best performance among the techniques compared and is applicable to a wide variety of images.


Introduction
A color image is usually composed of three color planes and, accordingly, three separate sensors are required for a camera to measure an image. To reduce cost, many cameras use a single sensor covered with a color filter array (CFA). The most common CFA is the Bayer CFA [1,2], shown in Figure 1. In a CFA-based sensor configuration, only one color is measured at each pixel and the two missing color values are estimated by interpolation. This estimation process is known as color demosaicing. The demosaicing problem is a special case of image reconstruction. The most straightforward approach to demosaicing a color image is to apply one of the standard reconstruction methods for gray-scale images to each channel separately. Many methods have been proposed for single-channel reconstruction, such as interpolation, regularization and inverse filtering. Reconstructing each channel separately, however, produces artifacts. Better reconstruction can be obtained by taking cross-channel correlation into consideration, and several methods have been suggested to exploit it. Existing approaches based on inter-channel correlation improve performance over independent channel reconstruction, but there is still room for improvement. Template matching, filtering and luminance-based approaches form the basis of various techniques, but they are limited in their capability to reconstruct regions with different spatial and chromatic characteristics and result in artifacts in the reconstructed images.
Demosaicing techniques can be broadly classified as frequency domain based [4][5][6][7] and spatial domain based [8][9][10]. A detailed survey of the varied demosaicing techniques is presented in [3] and [11]. Based on the literature reviewed, it is evident that the existing state-of-the-art demosaicing algorithms induce artifacts in the reconstructed image [3,9], [11][12]. A detailed report on the types of artifacts observed in demosaiced images is presented in [13]. To minimize the occurrence of artifacts in demosaiced images, the use of knowledge derived from local regions or local patches [14][15][16][17] to interpolate/estimate the missing color components is proposed. An iterative demosaicing technique is essential to derive local knowledge and achieve accurate estimation [18,19]. The recent adoption of neural networks in image processing [20,21] to derive such knowledge is a motivating factor for the authors of this paper. The neural network based demosaicing technique proposed in [22] bears the closest similarity to the work proposed here; in [22], rotational invariance is used to train the neural networks and estimate the missing color pixels.
In this paper a frequency domain based demosaicing technique is proposed. Based on its frequency components, the image is represented by luminance and chrominance information, and the input image's luminance and chrominance components are segregated. Using bilinear interpolation, the luminance and three chrominance channels are combined to produce an intermediate image that exhibits errors or artifacts. To eliminate the errors in the intermediate image, a neural network based reconstruction, referred to as the Neural Network Based Image Reconstruction Algorithm (NNIRA), is proposed. The neural networks derive local knowledge from the constructed image patches; this knowledge assists in estimating the missing components.
The remainder of the paper is organized as follows. The proposed system is presented in section two. The experimental study and performance comparison of our proposal with other state-of-the-art demosaicing algorithms is presented in section three. The conclusions and future work are discussed in the last section of the paper.

Frequency Domain NNIRA Approach (Proposed System)
NNIRA based techniques have been employed to reconstruct images, but to the best of our knowledge no author has attempted to use the luminance component (extracted in the frequency domain) to reconstruct the original image using neural networks.
The steps of the proposed image demosaicing technique considering luminance information in the frequency domain are as follows: 1) Image Alignment: align the set of images pairwise using the low frequency (luminance) information of the Fourier transforms of the CFA images. 2) Luminance/Chrominance Separation: extract the luminance and chrominance information from each of the input images; bilinear interpolation is adopted to combine the luminance and the three chrominance channels, resulting in an intermediate image that is not artifact free. 3) Image Reconstruction: the Neural Network Based Image Reconstruction Algorithm (NNIRA) is used for image reconstruction and removal of the artifacts observed in the intermediate image.
During the training phase, the model is furnished with both the intermediate image patches P_i, for i = 1, 2, 3, …, N, and the original patches Q_i. After training, NNIRA is able to reconstruct the corresponding demosaiced image for any given error observation.

Research Method
Based on the idea presented by Alleysson et al. [23], luminance and chrominance information are encoded separately in the Fourier spectrum of a Bayer CFA image. They showed that a Bayer CFA image I_CFA(x, y) can be written as a sum of modulated red, green and blue color channels:

I_CFA(x, y) = Σ_{i ∈ {R,G,B}} C_i(x, y) m_i(x, y)   (1)

where C_i(x, y) is the i-th color channel image and m_i(x, y) is a modulation function, which is 1 at the positions where the i-th channel is measured and 0 elsewhere. As these modulation functions are combinations of cosines, their Fourier transforms are combinations of Diracs. Using the fact that a product in the spatial domain corresponds to a convolution in the frequency domain, the spectrum of the CFA image becomes:

F_CFA(u, v) = F_α(u, v) + F_c1(u − ½, v − ½) + F_c2(u − ½, v) + F_c2(u, v − ½)   (2)

where spatial frequencies are normalized to the sampling frequency. Any color image can be represented as the sum of a scalar α(u, v) representing its luminance and a length-three vector c(u, v), called chrominance, which represents opponent colors:

C_i(u, v) = α(u, v) + c_i(u, v),  i ∈ {R, G, B}   (3)

With luminance defined as α = (R + 2G + B)/4, the first term in Equation (2) corresponds to the luminance signal α(u, v) and the remaining terms represent the chrominance c(u, v). Because of the modulation functions, the luminance part appears in the low frequency region of the spectrum, and the chrominance part appears in the high frequency regions.
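The modulation model can be illustrated with a short NumPy sketch (a hedged example, not the authors' code; the RGGB layout and the function names are our assumptions):

```python
import numpy as np

def bayer_masks(h, w):
    """Modulation functions m_i: 1 where channel i is measured, 0 elsewhere.
    An RGGB layout is assumed for illustration."""
    r = np.zeros((h, w)); g = np.zeros((h, w)); b = np.zeros((h, w))
    r[0::2, 0::2] = 1   # red on even rows, even columns
    g[0::2, 1::2] = 1   # green on even rows, odd columns
    g[1::2, 0::2] = 1   # green on odd rows, even columns
    b[1::2, 1::2] = 1   # blue on odd rows, odd columns
    return r, g, b

def bayer_mosaic(rgb):
    """CFA image as the sum of channels times modulation masks (Equation (1))."""
    h, w, _ = rgb.shape
    return sum(rgb[..., i] * m for i, m in enumerate(bayer_masks(h, w)))
```

Note that for a constant-color image the 2×2 block average of the mosaic equals (R + 2G + B)/4, which is exactly the luminance definition above.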
Using a low pass filter, the luminance information is extracted from the images. In this paper a frequency domain approach is used, which uses only the low frequencies, which are less prone to aliasing, for image alignment. As this is also the part of the CFA Fourier transform that contains the luminance information, we apply our algorithm directly on the raw CFA images. Next, we separate the images into luminance and chrominance using a low pass filter, and interpolate the two separately.
The frequency domain approach presented by Vandewalle et al. [24] is used to align the Bayer CFA images. This algorithm selects only the low frequency information, since this part of the spectrum is less corrupted by aliasing and, as shown in Equation (2), encodes the luminance; we can therefore apply the alignment algorithm directly to the raw CFA images. First we perform planar rotation estimation, followed by planar shift estimation. The rotation angle is estimated by computing the frequency content of each input image as a function of the angle, δ(θ) = ∫ |F(ρ, θ)| dρ, where F(ρ, θ) is the Fourier transform of the CFA image expressed in polar coordinates; the rotation angle between two images can then be found at the maximum of the correlation between two such functions. Next, the rotation is canceled, and the shifts are estimated by computing the least squares fit of a plane through the (linear) phase difference between the images. As we only use the low frequency information of the images, we do not need to separate luminance and chrominance for this step. Using the raw sensor data for image alignment allows higher precision, as no additional filtering or interpolation errors are introduced.
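The shift-estimation principle can be sketched with integer-pixel phase correlation (a simplified stand-in for the subpixel plane fit of [24]; the function name and the circular-shift model are our assumptions):

```python
import numpy as np

def estimate_shift(ref, shifted):
    """Estimate the integer translation (dy, dx) such that
    shifted ~= np.roll(ref, (dy, dx), axis=(0, 1)), via phase correlation."""
    cross = np.fft.fft2(shifted) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + 1e-12        # keep only the phase difference
    corr = np.fft.ifft2(cross).real      # a peak appears at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    if dy > h // 2:                       # wrap to the signed range
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

The plane fit of [24] generalizes this idea: the linear phase of `cross` is fit by least squares over the low frequencies only, which yields subpixel shifts and is robust to aliasing in the high frequencies.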

Luminance / Chrominance Separation
Here we separate the luminance and chrominance information in each of the images in order to interpolate them separately. As indicated in Equation (2), we extract the luminance signal α from the CFA images using the low-pass filter specified by Alleysson et al. [23]. The three chrominance parts (for red, green and blue) are then obtained by subtracting this luminance information from the red, green and blue channels of the CFA image and demodulating the result. This results in a luminance image α and three chrominance images c_R, c_G and c_B:

c_i = M_i ʘ (I_CFA − α),  i ∈ {R, G, B}

where the matrices M_i are demodulation (or interpolation) filters and the symbol ʘ denotes point-wise multiplication of two matrices. From the separated and demodulated luminance and chrominance signals we compute their high resolution versions using the Normalized Convolution (NC) approach proposed in [27] on each of the four channels separately. A Gaussian weighting function (applicability function) with variance σ² is used so that samples close to the considered pixel contribute most. A pixel of the high resolution image is computed from the pixels in a neighbourhood around it as:

p′ = (Bᵀ W B)⁻¹ Bᵀ W f

where f is an n × 1 vector containing the neighbourhood pixels, B is an n × m matrix of basis functions sampled at the local coordinates of the pixels, and W is an n × n weighting matrix containing the Gaussian weights sampled at the pixel coordinates. The first element of the m × 1 vector p′ gives the interpolated pixel value. For neighbourhood selection, a circular region with radius four times the pixel distance of the high resolution image is used. Due to the nonuniform grid, the number of pixels in this region may vary depending on the position.
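A minimal sketch of the idea, assuming zeroth-order normalized convolution (a constant basis B, for which p′ reduces to a Gaussian-weighted average of the known samples); the paper's version fits a polynomial basis on a nonuniform grid:

```python
import numpy as np

def nc_interpolate(samples, mask, sigma=1.0, radius=4):
    """Zeroth-order normalized convolution on a regular grid: each pixel is
    the Gaussian-weighted average of the known samples (mask == 1) inside a
    circular neighbourhood of the given radius."""
    h, w = samples.shape
    out = np.zeros((h, w))
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    g = np.exp(-(ys**2 + xs**2) / (2.0 * sigma**2))
    g[ys**2 + xs**2 > radius**2] = 0.0       # circular applicability window
    for i in range(h):
        for j in range(w):
            y0, y1 = max(0, i - radius), min(h, i + radius + 1)
            x0, x1 = max(0, j - radius), min(w, j + radius + 1)
            gy0, gx0 = y0 - (i - radius), x0 - (j - radius)
            win_g = g[gy0:gy0 + (y1 - y0), gx0:gx0 + (x1 - x0)]
            num = np.sum(win_g * mask[y0:y1, x0:x1] * samples[y0:y1, x0:x1])
            den = np.sum(win_g * mask[y0:y1, x0:x1])
            out[i, j] = num / den if den > 0 else 0.0
    return out
```

With a polynomial basis the same weighted least squares machinery additionally recovers local gradients, which is what allows NC to interpolate smooth ramps without bias.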
We perform bilinear interpolation for the luminance channel α, as well as for each of the chrominance channels c_i, and then add luminance and chrominance together, which results in an intermediate high resolution colour image P. If we consider the pixels of the high resolution image to be computed, then P can be given as:

P_i(x, y) = α(x, y) + c_i(x, y),  i ∈ {R, G, B}   (8)

Here P contains errors/artifacts, because the final high resolution image is computed by fitting a polynomial surface, and the scale of the applicability function plays a decisive role in the quality of the interpolation; for example, low-order NC with a large applicability window cannot reconstruct small details in the image.

Image Reconstruction
Assuming P is the intermediate image obtained from Equation (8) and Q is the perfect demosaiced image, we formulate the image corruption process as:

P = Ω(Q)   (9)

where Ω : R^n → R^n is an arbitrary stochastic error process induced by the polynomial surface fitting. The demosaicing learning objective then becomes:

g* = argmin_g ‖g(P) − Q‖²   (10)

From Equation (10) we see that the task is to find a function g that best approximates Ω⁻¹. We can now treat the image demosaicing problem in a unified framework by choosing an appropriate Ω in different situations.
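As a toy instance of this objective, assume (purely for illustration) that Ω is an affine distortion plus small noise, and restrict g to scalar affine maps fit by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.random(500)                                   # "clean" pixel values
P = 0.8 * Q + 0.1 + 0.01 * rng.standard_normal(500)   # P = Omega(Q), assumed corruption

# g(p) = a*p + c chosen to minimize ||g(P) - Q||^2
A = np.column_stack([P, np.ones_like(P)])
(a, c), *_ = np.linalg.lstsq(A, Q, rcond=None)
recon = a * P + c
```

The fitted (a, c) land close to (1/0.8, −0.1/0.8), i.e. the learned g approximates Ω⁻¹; NNIRA plays the same role with a far richer function class.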

Image Reconstruction Demosaic Auto-encoder [IRDA]
Let Q_i be the original data, for i = 1, 2, 3, …, N, and P_i be the corrupted version of the corresponding Q_i. The hidden layer activation and the reconstruction are defined as:

h(P_i) = φ(W P_i + b)   (11)

Q̂(P_i) = φ(W′ h(P_i) + b′)   (12)

where φ(x) = 1/(1 + exp(−x)) is the sigmoid activation function, applied component-wise to vectors, h is the hidden layer activation, Q̂(P_i) is an approximation of Q_i, and ω = (W, b, W′, b′) represents the weights and biases. IRDA can be trained with various optimization methods to minimize the reconstruction loss.
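Equations (11) and (12) can be sketched as a tiny NumPy auto-encoder trained to minimize the reconstruction loss (a hedged illustration with toy sizes, not the authors' implementation; plain gradient descent stands in for the unspecified optimizer):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d, k, n = 16, 8, 200                     # patch dim, hidden units, #patches (toy)
Q = rng.random((d, n))                   # clean patches, one per column
P = np.clip(Q + 0.05 * rng.standard_normal((d, n)), 0, 1)   # corrupted patches

W  = 0.1 * rng.standard_normal((k, d)); b  = np.zeros((k, 1))
W2 = 0.1 * rng.standard_normal((d, k)); b2 = np.zeros((d, 1))

losses, lr = [], 0.5
for _ in range(200):
    H    = sigmoid(W @ P + b)            # hidden activation h(P), Eq. (11)
    Qhat = sigmoid(W2 @ H + b2)          # reconstruction Q^(P), Eq. (12)
    err  = Qhat - Q
    losses.append(float(np.mean(err ** 2)))
    d2 = err * Qhat * (1 - Qhat)         # backprop through output sigmoid
    d1 = (W2.T @ d2) * H * (1 - H)       # backprop through hidden sigmoid
    W2 -= lr * d2 @ H.T / n; b2 -= lr * d2.mean(axis=1, keepdims=True)
    W  -= lr * d1 @ P.T / n; b  -= lr * d1.mean(axis=1, keepdims=True)
```

The loss curve decreases as ω = (W, b, W2, b2) adapts; any other optimizer minimizing the same reconstruction loss fits the formulation equally well.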
After training IRDA, we move on to training the next layer by using the hidden layer activation of the first layer as the input of the next layer.

Neural Network Based Image Reconstruction Algorithm
In this section, we will describe the structure and optimization objective of the proposed model Neural Network Based Image Reconstruction Algorithm (NNIRA). Due to the fact that directly processing the entire image is intractable, we instead draw overlapping patches from the image as our data objects.
In the training phase, the model is supplied with both the corrupted error image patches P_i, for i = 1, 2, 3, …, N, and the original patches Q_i. After training, IRDA will be able to reconstruct the corresponding clean image given any error observation.
To combine sparse coding and neural networks and avoid over-fitting, we train IRDA to minimize the reconstruction loss regularized by a sparsity-inducing term.
min_ω Σ_{i=1}^{N} ‖Q̂(P_i) − Q_i‖² + β Σ_j KL(Γ ‖ ρ̂_j)

where h(.) and Q̂(.) are defined in Equation (11) and Equation (12) respectively, and ρ̂_j is the average activation of the j-th hidden unit over the training patches. Here Γ is the target average activation of the hidden layer. We regularize the hidden layer representation to be sparse by choosing a small Γ, so that the KL divergence term encourages the mean activation of the hidden units to be small. Hence the hidden units will be zero most of the time, achieving sparsity.
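The KL-divergence sparsity term can be computed as follows (a hedged sketch; Γ is the target activation, ρ̂_j the observed mean activation of hidden unit j, treated as Bernoulli parameters):

```python
import numpy as np

def kl_sparsity(gamma, rho_hat):
    """Sum over hidden units of KL(gamma || rho_hat_j) between Bernoulli
    distributions; small gamma pushes mean activations toward zero."""
    rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)   # avoid log(0)
    return float(np.sum(gamma * np.log(gamma / rho_hat)
                        + (1 - gamma) * np.log((1 - gamma) / (1 - rho_hat))))
```

The term is zero only when every unit's mean activation equals Γ, and grows steeply as activations drift toward 1, which is what drives most hidden units to stay near zero.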
After training the first IRDA, we use h(Q_i) and h(P_i) as the clean and error input, respectively, for the second IRDA. Since h(Q_i) lies in a different space from Q_i, the meaning of applying Ω(.) to h(Q_i) is not clear; we therefore discard P_i and use Ω(h(Q_i)) as the error input. We then initialize a deep network with the weights obtained from the K layered IRDAs. The network has one input layer, one output layer and 2K − 1 hidden layers. The entire network is then trained using the standard back-propagation algorithm to minimize the reconstruction objective Σ_{i=1}^{N} ‖Q̂(P_i) − Q_i‖². Here we remove the sparsity regularization because the pre-trained weights serve as regularization for the network [25]. Following [26], the loss functions in the pre-training and fine-tuning stages are optimized with the L-BFGS algorithm, which achieves the fastest convergence in our setting.

Hidden Layer Feature Analysis
During the training of image reconstruction demosaicing auto-encoders (IRDA), the error training data is typically generated from an indiscriminately chosen simple error distribution, regardless of the characteristics of the particular training data. In real world problems, however, this step deserves considerable attention, since the clean training data is in fact usually subject to error. Hence, if we estimate the distribution of the error and use it to generate the error training data, the resulting IRDA will learn to be more robust to errors in the input data and produce better features. Training IRDAs with different color intensity images suited to specific situations may also improve performance.
The resulting image obtained post neural network reconstruction is considered as the demosaiced image. In the next section the experimental study to evaluate the performance of the proposed demosaicing technique is presented.

Experimental Study and Performance Comparisons
To validate the proposed system, the widely used Kodak dataset [28] is considered in the experimental study and performance comparisons. The Kodak dataset consists of 24 images, each with a resolution of 768×512 and 24 bit color depth; the dataset is shown in Figure 2. The proposed demosaicing technique is developed using MATLAB. To evaluate the quality of the demosaiced images obtained by the proposed system, the color peak signal-to-noise ratio (CPSNR) defined in [23] is used. The CPSNR is computed as:

CPSNR = 10 log₁₀(255² / CMSE)   (17)

where CMSE represents the color mean square error, defined as:

CMSE = (1 / (3 H W)) Σ_{k=1}^{3} Σ_{x=1}^{H} Σ_{y=1}^{W} [I1_k(x, y) − I2_k(x, y)]²

where I1 and I2 are the intensity levels of the original and demosaiced images of height H and width W, and k varies from 1 to 3 over the three color planes. The demosaicing performance results based on CPSNR are tabulated in Table 1. In addition to the CPSNR, the PSNR results for the red, green and blue channels are also presented. Based on the results it is observed that the proposed demosaicing technique is robust and works effectively on the wide variety of images present in the Kodak dataset. To compare the performance of the proposed demosaicing technique with other state-of-the-art demosaicing techniques, the average PSNR of the red, green and blue channels and the CPSNR results obtained are considered. The performance of the proposed demosaicing technique is compared with learned simultaneous sparse coding (LSSC) [29], iterative residual interpolation (IRI) [19], multi-directional weighted interpolation and guided filter (MDWI-GF) [30], MLRI [31], similarity and color difference (SACD) based demosaicing [7], adaptive residual interpolation (ARI) [32] and the multilayer neural network (NN) based demosaicing technique [22]. The NN technique proposed in [22] bears the closest similarity to the work proposed here. The results obtained are shown in Table 2 of the paper. Compared with the multilayer neural network (NN) based demosaicing technique [22], the proposed demosaicing technique exhibits an improvement of 8.82 dB on average. Improvements of 4.1 dB and 4.23 dB are reported for our proposed technique against LSSC [29] and ARI [32], respectively. Based on the results it is evident that our proposed demosaicing technique outperforms the existing state-of-the-art demosaicing techniques.
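The CPSNR of Equation (17) can be computed as follows (a straightforward sketch assuming 8-bit images, so the peak value is 255):

```python
import numpy as np

def cpsnr(img1, img2):
    """Color PSNR: the mean squared error is taken jointly over all three
    color planes (CMSE) and referenced to the 8-bit peak value 255."""
    cmse = np.mean((img1.astype(float) - img2.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / cmse)
```

Unlike the per-channel PSNR, a single poorly reconstructed plane cannot be hidden here, since the squared errors of all three planes share one average.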

Conclusion
In this paper a frequency domain based demosaicing technique is proposed. The luminance and chrominance information is extracted in the frequency domain. Bilinear interpolation applied to the luminance and chrominance information partially derives the missing components and produces an intermediate image, which is then split into patches. The novelty of the proposed demosaicing technique is the adoption of neural networks that learn from the local patches and estimate the missing components; a novel Neural Network Based Image Reconstruction Algorithm is presented for this purpose. The experimental results and performance comparisons presented demonstrate the robustness and demosaicing efficiency of the proposed technique over existing state-of-the-art demosaicing techniques. Evaluation of the proposed demosaicing technique on additional image datasets is considered as future work.