Low-Cost Based Eye Tracking and Eye Gaze Estimation

The cost of current gaze tracking systems remains too high for general public use. The main reason is the cost of parts, especially high-quality cameras and lenses, and the cost of development. This research builds a low-cost gaze tracking system. The device is built using a modified web camera operating in the infrared spectrum. A new technique is also proposed to detect the center pupil coordinate based on connected component labeling. The pupil coordinate detection method is combined with third order polynomial regression in the calibration process to determine the gaze point. The experimental results show that our system has an acceptable accuracy rate, with an error of 0.39° in visual degree.


Introduction
Gaze tracking has been used for many decades as a tool to study human cognitive processes, including reading, driving, watching commercials, and other activities [1]. With the advent of personal computers, the potential integration of such systems was first considered only in 1991 [2]. Successful attempts to employ gaze tracking as a user interface have allowed users with movement disabilities to type by looking at a virtual keyboard [3] or to control the mouse pointer [4]. Systems of this kind have also been used by individuals without any disabilities to enhance performance [5], [6]. In recent years, gaze tracking has been widely used in the areas of intelligent control [7], virtual reality, video games, robotics, human-computer interaction, eye disease diagnosis, human behavior studies, etc. [8]. Lee et al used a gaze tracking system for controlling IPTV [9].
A gaze tracking system detects the eye location in an image and estimates the gaze path by measuring the point of gaze (POG). The eye position is commonly measured using the pupil or iris center. The detected eyes in the images are used to estimate and track where a person is looking in 3D, or alternatively to determine the 3D line of sight [10]. Gaze tracking systems are divided into head-mounted (wearable-camera-based, or intrusive) systems, which require direct contact with the eye, as in [11], [12], and head-free (remote, or nonintrusive) systems, which avoid any physical contact with the user, as in [13], [14], [15].
Consumer-grade eye-tracking human-computer interface technology remains unavailable to the general public because of the high price of eye tracking technology and the intrusiveness of such systems, even though the enabling technologies have existed for many years [16]. The main reasons are the cost of parts, especially high-quality cameras and lenses, the cost of development, and the relatively small market [10]. Only recently have relatively low-cost systems been investigated, as in [17], [18]. The design work described herein shows the accuracy that can be obtained using modern, low-cost systems. The three most significant contributions of this research are the design of the hardware, in this case a webcam [19] modified to work in the infrared spectrum; the use of a blob/connected component labeling algorithm to detect the pupil in the infrared spectrum instead of iris detection in the visible spectrum; and the use of higher-order polynomials to calibrate and map the detected gaze position to the position on the computer screen.
In a gaze tracking system, images of the eye are taken by a camera and sent to an image processing system. This image data is a picture of the user's eye from a specific vantage point. The vantage point can be close to the user's eye, as in a head-mounted device, or further away, as in a device on a table near the user. For this project, the design team has used a head-mounted camera, as will be discussed later. The image is processed to determine the location of the user's pupil within the image. The coordinates of the detected center of the user's pupil are passed through a specialized set of algorithms to accurately position a cursor on a screen displayed in front of the user. The cursor position on the screen represents where the user is looking on the display. This creates an open loop tracking system where a cursor follows where the user is looking on the screen.

Research Method

General System Overview
In general, the system built in this research consists of two main modules, as illustrated in Figure 1. The first module is the hardware design, which is the modification of the webcam to work in the infrared spectrum so that the pupil is relatively easier to detect and track.

Figure 1. General overview of the system

The second module is the software design, whose main purposes are to detect and track the pupil coordinate using thresholding and a connected component labeling algorithm, to transform that coordinate to a monitor coordinate through a calibration and point transformation process using polynomial regression, and finally to run a testing process that measures the accuracy in terms of the error in pixels between the gaze coordinate calculated by the system and the targeted monitor coordinate.

TELKOMNIKA   ISSN: 1693-6930   August 2011: 377-386

Hardware Design
The hardware design of this system consists of 4 main processes:

1. Infrared filter removal
Every webcam is in general equipped with an infrared filter that blocks infrared light and passes visible light. Because this research uses an infrared spectrum approach, the webcam must be modified to remove the infrared filter, as shown in Figure 2. Infrared light is useful for eye trackers, mainly because it is not only invisible to the user but can also be used for controlling light conditions, obtaining higher contrast images, and stabilizing gaze estimation [8].

2. Attachment of visible light filter
The next process is to attach a filter that blocks visible light and lets only infrared light into the webcam, as shown in Figure 3. The filter is a piece of processed film negative, taken from the black part of the film, which is usually found at the beginning and the end of a roll of film. An original filter for blocking visible light is very expensive, so film negative is used instead, because the main purpose of this research is to build an eye tracking and eye gaze tracking system based on low-cost components.

3. Making of infrared light source
After the webcam has been modified, the next step is to supply the infrared light source for the webcam. In this research, the infrared light source is built using 1 infrared LED connected to 2 batteries of 1.5 volts each, a 33 ohm resistor, and 1 switch to turn the infrared LED on or off. This research uses only 1 LED because the webcam is positioned very close to the eye, so 1 LED is enough as the infrared source.

4. Finalizing
The last step of the hardware design is assembling all the components on a helmet, as shown in Figure 5. This helmet is later worn by the users of the system.

Low-Cost Based Eye Tracking and Eye Gaze Estimation (I Ketut Gede Darma Putra)
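As a rough check of the LED circuit described in the hardware design, assume that the two 1.5 V batteries are wired in series (supply $V_s = 3.0$ V) and that the infrared LED has a forward voltage of about $V_F \approx 1.3$ V. Both are assumptions for illustration: the wiring is not stated explicitly, and $V_F$ is a typical value rather than a measured one. Under these assumptions, the 33 ohm resistor limits the LED current to roughly:

```latex
I = \frac{V_s - V_F}{R} = \frac{3.0\,\mathrm{V} - 1.3\,\mathrm{V}}{33\,\Omega} \approx 0.052\,\mathrm{A} = 52\,\mathrm{mA}
```

The actual current depends on the specific LED and on the state of the batteries.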

Software Design
There are two main subsystems in the software. The first is the image processing application, which processes the image and locates the center of the pupil. The second is the calibration and point transformation subsystem, which maps the center coordinate of the eye pupil to the screen coordinate. The software uses C++ and the OpenCV library. The intricacies of these subsystems will be described in more detail in the following sections.

Detection of the center coordinate of the pupil
The pupil is typically darker than its surroundings, so thresholding can be applied if the contrast is sufficiently large. Yang et al and Stiefelhagen et al introduced an iterative threshold algorithm that locates the pupils by looking for two dark regions that satisfy certain anthropometric constraints using a skin-color model. Their method is limited by the results of the skin-color model and fails in the presence of other dark regions such as eyebrows and shadows [10]. Yang et al applied an ellipse fitting algorithm to fit a standard ellipse or circle based on the coordinates of pupil edge pixels. The center of the ellipse or circle is the center of the pupil [7]. This research uses a simple technique to detect the pupil center using connected component labeling.
The steps used here to detect the center of the pupil are shown in Figure 6. The process begins by capturing an eye image frame from the camera via OpenCV's camera subsystem. The grayscaling process converts the RGB color to gray color space. After that, the grayscale image is smoothed using a Gaussian filter to remove any noise in the image. The smoothing also helps to reduce sharp edges, aiding the pupil detection system. The next step is thresholding the smoothed image to obtain the binary image. The pupil will be clearly visible as a black component in this image. However, there may be other black components as well. These represent areas of the image of almost exactly the same shade and color as the pupil. The connected component labeling step exists here to locate all black components in the image and determine which is representative of the pupil. A connected component is defined as a group of pixels with values within a certain range. The connected component labeling algorithm used is an open source add-on for OpenCV.
Once all connected components have been located, the system calculates several parameters of each blob, such as area, aspect ratio, roundness, and more. These parameters are compared to experimentally determined values for a pupil, and connected components are discarded based on them. Due to the nature of the human eye and surrounding features, there will never be more than one connected component that fits all parameters for a human eye. Thus, the system selects the correct connected component. The algorithm then calculates the center coordinate of this connected component.
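The thresholding, labeling, and blob filtering steps can be sketched as follows. This is a minimal illustration in pure Python rather than the C++/OpenCV implementation described here, and it assumes the input is already a grayscale, smoothed image given as a list of rows. The function names, the threshold level, the 4-connectivity, and the filter limits (`min_area`, `max_aspect_dev`, `min_fill`) are hypothetical values chosen for the example, not the experimentally determined parameters of the system. The bounding-box fill ratio is used as a simple stand-in for roundness.

```python
from collections import deque

def threshold(gray, level):
    """Binary mask: True where a pixel is darker than `level` (assumed value)."""
    return [[px < level for px in row] for row in gray]

def label_components(mask):
    """4-connected component labeling by breadth-first flood fill.
    Returns a list of blobs; each blob is a list of (x, y) pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, blob = deque([(x, y)]), []
            while queue:
                cx, cy = queue.popleft()
                blob.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            blobs.append(blob)
    return blobs

def blob_stats(blob):
    """Area, bounding-box aspect ratio, fill ratio, and centroid of one blob."""
    xs = [x for x, _ in blob]
    ys = [y for _, y in blob]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    return {
        "area": len(blob),
        "aspect": width / height,
        "fill": len(blob) / (width * height),
        "center": (sum(xs) / len(xs), sum(ys) / len(ys)),
    }

def find_pupil_center(gray, level=50, min_area=4, max_aspect_dev=0.5, min_fill=0.6):
    """Return the centroid of the blob that passes all filters, or None.
    All default limits are illustrative assumptions."""
    candidates = []
    for blob in label_components(threshold(gray, level)):
        s = blob_stats(blob)
        if (s["area"] >= min_area
                and abs(s["aspect"] - 1.0) <= max_aspect_dev
                and s["fill"] >= min_fill):
            candidates.append(s)
    if not candidates:
        return None
    # If several blobs pass (not expected per the text), keep the largest.
    return max(candidates, key=lambda s: s["area"])["center"]
```

In this sketch, a long dark streak such as an eyebrow is rejected by the aspect ratio filter, while a compact dark blob is accepted and its centroid is returned as the pupil center.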

Calibration and Point Transformation
Calibration is necessary because a screen monitor is a flat n by m pixel rectangle while the human eye is not. Mapping is required to transform the coordinates of the center of the pupil to the coordinates on the display. Calibration must be done every time the system is restarted due to variations in use. The eye will not be in the same location relative to the screen every time the same user wears the device, and different users with different eye and face shapes will require a new calibration. In this research, the calibration process is done using 9, 16, or 25 pixel locations, by asking the user to 'look at the dot' on the monitor. The pixel locations are shown in Figure 7. By using the center location of the eye when looking at those known pixel locations, the coefficients of the calibration equation are determined using polynomial regression, similar to [8], [20], [21]. This paper uses first (equations (1) and (2)), second (equations (3) and (4)), or third (equations (5) and (6)) order polynomial regression. Polynomial regression is a statistical technique used to approximate the correlation of variables. In the polynomial regression formulas, (Px, Py) and (Sx, Sy) represent the center pupil coordinate and the target (screen monitor) coordinate, respectively. With 9, 16, or 25 sample/training points on the monitor, 9, 16, or 25 equations are produced for each of equations (1) to (6). The regression coefficients are obtained using the least squares method, which converts the equations into the matrix forms (7) and (8) (given only for the first order polynomial of equations (1) and (2)).
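As a sketch of the form such models take, assume the usual full bivariate polynomial in the pupil coordinates. The ordering of the terms and the coefficient names $a_i$, $b_i$ are assumptions for illustration and are not the paper's own numbering. The first order model then has 3 coefficients per screen axis, the second order model 6, and the third order model 10:

```latex
\begin{aligned}
\text{first order:}\quad  & S_x = a_1 + a_2 P_x + a_3 P_y, \qquad S_y = b_1 + b_2 P_x + b_3 P_y \\
\text{second order:}\quad & S_x = a_1 + a_2 P_x + a_3 P_y + a_4 P_x^2 + a_5 P_x P_y + a_6 P_y^2 \\
\text{third order:}\quad  & S_x = a_1 + \dots + a_6 P_y^2 + a_7 P_x^3 + a_8 P_x^2 P_y + a_9 P_x P_y^2 + a_{10} P_y^3
\end{aligned}
```

with $S_y$ defined analogously using $b_i$. Stacking one row of polynomial terms per calibration point into a matrix $A$ and the corresponding screen coordinates into a vector $\mathbf{s}$, the standard least squares solution is

```latex
A^{\mathsf{T}} A \, \mathbf{a} = A^{\mathsf{T}} \mathbf{s}
\quad\Longrightarrow\quad
\mathbf{a} = \left(A^{\mathsf{T}} A\right)^{-1} A^{\mathsf{T}} \mathbf{s}
```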
KoefY1, KoefY2, KoefY3 are the regression coefficients that will be computed. The regression coefficients are calculated using the Gauss elimination method. Once the coefficients are obtained, the next step is to transform, or map, the pupil coordinate to the screen coordinate to get the gaze point (Gx, Gy). This point is obtained by multiplying the terms of the pupil coordinate output by the pupil center detection with the regression coefficients output by the calibration process. The gaze point is computed with equations (9) and (10) for the first order, equations (11) and (12) for the second order, and equations (13) and (14) for the third order polynomial. This gaze point can be used to control the movement of the mouse cursor with the eye.
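The calibration and mapping steps can be sketched as follows: build the polynomial terms for each calibration point, form the least squares normal equations, solve them by Gauss elimination, and evaluate the fitted polynomial to obtain the gaze point. This is a minimal pure Python illustration, not the C++ implementation; the function names, the ordering of the polynomial terms, and the use of partial pivoting are assumptions made for the example.

```python
def poly_terms(px, py, order):
    """Monomials px**i * py**j with i + j <= order
    (3, 6, or 10 terms for order 1, 2, or 3; term order is an assumption)."""
    return [px ** (t - j) * py ** j for t in range(order + 1) for j in range(t + 1)]

def gauss_solve(a, b):
    """Solve a * x = b by Gauss elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few or degenerate calibration points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(pupil_pts, screen_vals, order):
    """Least squares coefficients for one screen axis via the normal equations."""
    rows = [poly_terms(px, py, order) for px, py in pupil_pts]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * s for r, s in zip(rows, screen_vals)) for i in range(n)]
    return gauss_solve(ata, atb)

def calibrate(pupil_pts, screen_pts, order):
    """Return (kx, ky): coefficient lists for the x and y screen axes."""
    kx = fit(pupil_pts, [s[0] for s in screen_pts], order)
    ky = fit(pupil_pts, [s[1] for s in screen_pts], order)
    return kx, ky

def gaze_point(px, py, kx, ky, order):
    """Map a pupil center (px, py) to the gaze point (Gx, Gy)."""
    terms = poly_terms(px, py, order)
    gx = sum(k * t for k, t in zip(kx, terms))
    gy = sum(k * t for k, t in zip(ky, terms))
    return gx, gy
```

Normal equations can become poorly conditioned when third order terms are built from raw pixel values, so scaling the pupil coordinates to a small range before fitting is a common precaution.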

Results and Analysis
To evaluate the performance of this system, the testing used data from 10 users. Each user performed 9 test models (the combinations of 9, 16, or 25 sample points with first, second, or third order polynomials). For each test model of each user, the gaze points are calculated, and the distance (error in pixels) between each gaze point and the corresponding one of the 36 testing points on the monitor screen is computed using the Euclidean distance [22] (see Figure 8). The average (avgError) and maximal (maxError) error distances over the 36 distances are computed for each test model of each user. To determine which model has the best performance, the four indicators below are computed over the 10 users in offline mode:
- average distance/error pixel (finalAvgError)
- average maximal distance/error pixel (finalMaxError)
- standard deviation between avgError and finalAvgError (finalStdDeviasiAvgError)
- standard deviation between maxError and finalMaxError (finalStdDeviasiMaxError)

... and 3, model 3, which uses the third order, gives the worst accuracy. This is because model 3 uses 9 sample points, which is fewer than the number of regression coefficients of the third order polynomial (10 coefficients). The main prerequisite of the polynomial regression method is that the number of sample points has to be greater than the number of regression coefficients.
5. The choice of polynomial order has a direct effect on the number of sample points in the calibration process. The higher the order used, the greater the number of sample points required.
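The error indicators defined in this section can be sketched as follows. The indicator names are those used in the text, while the function names are hypothetical. The use of the population standard deviation is an assumption, since the text does not say whether the population or the sample form was used.

```python
import math
from statistics import mean, pstdev

def model_errors(gaze_pts, target_pts):
    """avgError and maxError (in pixels) for one user and one test model,
    using the Euclidean distance between each gaze point and its target."""
    dists = [math.dist(g, t) for g, t in zip(gaze_pts, target_pts)]
    return mean(dists), max(dists)

def summarize(per_user):
    """Combine (avgError, maxError) pairs, one per user, into the four indicators.
    pstdev (population standard deviation) is an assumed choice."""
    avgs = [a for a, _ in per_user]
    maxs = [m for _, m in per_user]
    return {
        "finalAvgError": mean(avgs),
        "finalMaxError": mean(maxs),
        "finalStdDeviasiAvgError": pstdev(avgs),
        "finalStdDeviasiMaxError": pstdev(maxs),
    }
```

In use, `model_errors` would be called once per user and test model with the 36 gaze points and 36 testing points, and `summarize` once per test model with the 10 resulting pairs.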

Conclusion
We have developed a low-cost device for a gaze tracking system. The device is built using a modified webcam operating in the infrared spectrum. A new technique is proposed in this paper to detect the center pupil coordinate based on connected component labeling. First, second, and third order polynomial regression were also tried in the experiments to determine the point of gaze. By combining the center pupil coordinate detection method with third order polynomial regression to determine the gaze point, the experimental results show that our system has an acceptable accuracy rate, with an error ranging from 26.2 to 14.2 pixels, or 0.70° to 0.39° in visual degree. Applying this system to specific applications for the general public is an interesting area for further research.

Figure 2. Infrared filter removal, (a) Webcam Microsoft Lifecam VX-1000, (b) Webcam after the screw is closed, (c) Infrared filter is removed

Figure 3. Attachment of visible light filter, (a) film negative, (b) Attachment of film negative to the webcam

Figure 4. Infrared LED, resistor, switch and batteries

Figure 5.

Figure 6. Detection of the center of pupil, (a) capture frame from camera, (b) grayscale image, (c) Gaussian blur image, (d) binary image, (e) component labelling image, (f) output image

In Figure 8, ... represents the target points, while the black cross sign represents the gaze points obtained from testing. The figure is obtained from model 9, which combines 25 sample points with third order polynomial regression.

Figure 8. (a) 36 point testing grid, (b) testing result from combination of 25 sample points and third order polynomial regression