Do my eyes deceive me?

CSCI 1430: Introduction to Computer Vision



We will try to improve WebGazer, a Web-based eye tracker. You are free to improve it in any way, so long as the work is still computer vision, e.g., the face detector, the geometric face tracker, the blink detection, the eye modeling, the labeling of data from interaction logs, the ridge regression: any part, or any potentially-useful new part. We will also give you a dataset upon which to train a regressor, and you are free to use any regression method. Further, you are free to write your improvements in JavaScript and attempt to maintain real-time performance, or to use a different language and investigate an offline approach ("...towards real-time eye tracking").
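As one concrete starting point: WebGazer's final stage maps eye-image features to screen coordinates with ridge regression, and an easy offline experiment is to retrain or replace that regressor on the Frames Dataset. The sketch below uses synthetic stand-in data (the feature dimensionality and arrays are placeholders, not the real dataset format) with scikit-learn's `Ridge`:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Placeholder data: rows are frames, columns are eye-patch features.
# In practice, X would come from the Frames Dataset and y would be the
# Tobii gaze estimates (screen-pixel coordinates) used as labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 120))                        # e.g., 120-D features per frame
true_W = rng.normal(size=(120, 2))
y = X @ true_W + rng.normal(scale=0.1, size=(500, 2))  # (x, y) screen coordinates

model = Ridge(alpha=1.0)        # L2 penalty; tune alpha on a validation split
model.fit(X[:400], y[:400])     # train on the first 400 frames
pred = model.predict(X[400:])   # predict on the held-out frames

# Mean Euclidean error in (synthetic) screen-space units
err = np.linalg.norm(pred - y[400:], axis=1).mean()
print(f"mean error: {err:.3f}")
```

Any other regressor with the same fit/predict interface (random forests, small neural networks, etc.) can be swapped in for `Ridge` here.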

Slide deck from class: PPTX | PDF


The Frames Dataset

A set of 98,182 pre-extracted video frames, along with more accurate Tobii eye tracker gaze estimates (i.e., labels), WebGazer estimates (to beat), and clmtracker points (facial landmarks), for 33 participants. These data are split per participant, and then per task, with metadata stored in gazeEstimates.csv. The data format is explained implicitly through the frame-extraction code described below.

The course staff have split this into training and testing. If you'd like a validation set, please split training again. Some of the test set is easy, e.g., the same participant at a slightly different time, and some of the test set is hard, e.g., an unseen participant. Be careful to load the correct files—training on test will give great results ;)
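If you do carve a validation set out of training, holding out whole participants mimics the hard "unseen participant" portion of the test set. A minimal sketch with pandas, using an inline stand-in for gazeEstimates.csv (the column names here are illustrative, so check the real header before adapting it):

```python
import io
import pandas as pd

# Illustrative stand-in for gazeEstimates.csv; the real columns may differ.
csv_text = """participant,frame,tobii_x,tobii_y,webgazer_x,webgazer_y
p01,0,100,200,110,190
p01,1,105,210,120,180
p02,0,300,400,280,420
p03,0,500,100,520,90
"""
df = pd.read_csv(io.StringIO(csv_text))

# Hold out whole participants so validation resembles the
# unseen-participant part of the test set.
val_participants = {"p03"}
train = df[~df["participant"].isin(val_participants)]
val = df[df["participant"].isin(val_participants)]
print(len(train), len(val))  # 3 1
```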

The Full Dataset

The original captured videos and interaction logs. This is everything, with many potential data sources, e.g., mouse motion and clicks, typing cues, and screen recordings. NOTE: We've already extracted the most useful information into the Frames Dataset, so use this only if you want to delve deeper or explore some other angle.


To recreate the Frames Dataset, you can launch the local web server and then visit http://localhost:8000/webgazer_generateFramesDataset.html in a browser. Looking at this code should help you extract any other data sources or features.

[Staff record keeping] This is the 'Curated Dataset', with original WebM videos from the 'Main Dataset'.


The frames dataset: Parse gazeEstimates.csv, then compute a metric between the Tobii and WebGazer results (e.g., Euclidean screen-space difference), then compute the same metric for your approach and compare. In many projects that use this dataset, you won't need to re-evaluate performance within your improved live WebGazer.js; however...
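Concretely, the per-frame Euclidean metric might look like the sketch below. The column names are assumptions, so adapt them to the actual gazeEstimates.csv header; the same function evaluates both the WebGazer baseline and your own predictions:

```python
import numpy as np
import pandas as pd

def mean_euclidean_error(df, pred_cols, truth_cols=("tobii_x", "tobii_y")):
    """Mean screen-space Euclidean distance between predictions and Tobii labels."""
    diff = df[list(pred_cols)].to_numpy() - df[list(truth_cols)].to_numpy()
    return float(np.linalg.norm(diff, axis=1).mean())

# Toy frame: Tobii at (100, 200), WebGazer at (103, 204) -> error 5.0 (3-4-5 triangle)
df = pd.DataFrame({"tobii_x": [100], "tobii_y": [200],
                   "webgazer_x": [103], "webgazer_y": [204]})
baseline = mean_euclidean_error(df, ("webgazer_x", "webgazer_y"))
print(baseline)  # 5.0

# Your approach: add e.g. "my_x"/"my_y" columns computed on the same frames,
# then call mean_euclidean_error(df, ("my_x", "my_y")) and compare.
```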

The full dataset: If you've made changes to WebGazer.js and want to test the performance difference against the original WebGazer, then you must rerun it on the full dataset. We've written a helper application for you to do this. It needs write access, so it must be run locally. Steps:

  1. Execute the helper application, and visit http://localhost:8000/webgazer_testAccuracy.html in a browser. This launches a webserver which runs through the full dataset; at each instance, it waits for the results of unmodified WebGazer.js and compares them to the Tobii data. Then, it writes the differences to .csv files per video, and aggregates performance across videos. These data are stored in a folder called 'YYMMDD_mmhhss_AccuracyEvaluation'; make sure to remember which folder corresponds to which execution. The process will take a few hours, because it has to run through all dataset videos in real time.
  2. Edit webgazer_testAccuracy.html to load your modified version of WebGazer.js with the changes you wish to test.
  3. Execute again, and launch your edited http://localhost:8000/webgazer_testAccuracy.html. Compare the two 'runs' in the two folders by parsing the .csv files and computing statistics!

NOTE: Closing webgazer_testAccuracy.html or switching to another tab has caused the site to stop progressing on some browsers. One option to avoid this is to make a small one-tab browser window and leave it in the corner of your workspace.

Think of the application as an example of how to achieve this kind of comparison. Feel free to build upon it and make any changes that you need. You'll need to edit a local copy, of course. For instance, you could compare the error on only a subset of the videos to speed things up and allow faster iteration.
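Aggregating one run's per-video .csv files might look like the sketch below. The column name ('error') and folder contents are assumptions, so adapt them to whatever webgazer_testAccuracy.html actually writes out; the demo uses synthetic files standing in for a run folder:

```python
import pathlib
import tempfile
import pandas as pd

def run_summary(folder):
    """Aggregate per-video error CSVs from one AccuracyEvaluation folder."""
    frames = [pd.read_csv(p) for p in sorted(pathlib.Path(folder).glob("*.csv"))]
    errors = pd.concat(frames)["error"]   # assumed column name
    return {"mean": float(errors.mean()),
            "median": float(errors.median()),
            "n": int(len(errors))}

# Demo with synthetic per-video files standing in for one run's folder.
with tempfile.TemporaryDirectory() as run_dir:
    pd.DataFrame({"error": [10.0, 20.0]}).to_csv(f"{run_dir}/video1.csv", index=False)
    pd.DataFrame({"error": [30.0]}).to_csv(f"{run_dir}/video2.csv", index=False)
    summary = run_summary(run_dir)
    print(summary)  # {'mean': 20.0, 'median': 20.0, 'n': 3}
```

Summarising both run folders this way, then differencing the means (or running a paired test per video), gives the before/after comparison.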


The Brown CS Grid is a collection of computers that can be accessed via any department machine. Please note that these machines are used for CS research, and so please be respectful of the resources when using them. The Grid has both GPUs and CPUs available to use. If you need any Python libraries that do not exist on department machines, either create a virtualenv or install the libraries in your home folder using the --user option for pip.


For those of you completing 1430 as a capstone, we expect you to put a similar amount of additional effort into the final project as into the other projects. Given the wide scope of the projects, it is difficult to pin down exactly what this might be, but feel free to talk to James to discuss any plans.


If you need anything, please ask. Say, if you wish to extract some new data from the Full Dataset but don't know how to do it, or if you think there's a better way to extract frames or Tobii data, or if you need more documentation—please just ask. We don't know quite what you will attempt or need.


Project developed by Alexandra Papoutsaki, Yuze He, Aaron Gokaslan, Jeff Huang, and James Tompkin.