3D gaze vector generation and assessment from eye tracking and motion capture data

A really challenging part of doing eye tracking and motion capture together is making sure the data are synchronized. Luckily, Lab Streaming Layer (LSL) solves this problem by taking arbitrary data inputs (e.g. EEG, eye tracking, motion capture, mouse tracking) and assigning operating-system timestamps to each sample. There are many use cases for this in hand-eye coordination research, and I use LSL extensively in almost all of my work. Here, we use it to generate 3D gaze vectors (think of lasers shooting out of your eyes) that predict where someone is looking in motion-capture space based on their pupil positions. We came up with four different calibration procedures in which a participant fixates on a wand while it moves through space (or the wand remains stationary and the participant moves their head while maintaining fixation). If this sounds interesting, check out the short paper we wrote for the ETRA ActiveEye conference. Recently, I wrote a longer-form version of the paper for Behavior Research Methods—check out the preprint on bioRxiv and the data repository, which has lots of cool videos.
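As a rough sketch of what that synchronization buys you: once every sample carries an LSL timestamp, streams recorded at different rates can be resampled onto a common clock. Everything below (the rates, the signals, the use of linear interpolation) is illustrative, not our actual pipeline.

```python
import numpy as np

# Hypothetical example: two LSL streams arrive at different rates, each with
# operating-system timestamps. To compare them sample-by-sample, resample the
# slower stream onto the faster stream's timeline.

eye_t = np.arange(0.0, 1.0, 1 / 120)     # eye tracker timestamps at 120 Hz
mocap_t = np.arange(0.0, 1.0, 1 / 240)   # motion capture timestamps at 240 Hz
eye_x = np.sin(2 * np.pi * eye_t)        # placeholder pupil-x signal
mocap_x = np.sin(2 * np.pi * mocap_t)    # placeholder marker-x signal

# Linear interpolation of the eye signal onto the mocap clock.
eye_on_mocap_clock = np.interp(mocap_t, eye_t, eye_x)

print(eye_on_mocap_clock.shape)  # one eye sample per mocap sample
```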

An example of the gaze vectors we can generate from synchronized eye and motion capture data. Note that there are actually a bunch of gaze vectors here, and some of them perform quite poorly. The hotter the colour of a gaze vector, the closer it passes to the tip of the wand (i.e. low error); the cooler the colour, the further away (i.e. high error).
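The error behind that colour coding can be sketched as the angle between a gaze vector and the eye-to-wand-tip direction. The function name and all positions here are hypothetical, assuming eye position, gaze direction, and wand tip are all expressed in the same motion-capture coordinate frame.

```python
import numpy as np

def angular_error_deg(eye_pos, gaze_dir, wand_tip):
    """Angle (degrees) between the gaze direction and the eye-to-wand vector."""
    to_wand = wand_tip - eye_pos
    cos_theta = np.dot(gaze_dir, to_wand) / (
        np.linalg.norm(gaze_dir) * np.linalg.norm(to_wand))
    # Clip guards against tiny floating-point overshoot outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

eye = np.array([0.0, 0.0, 0.0])
gaze = np.array([0.0, 0.0, 1.0])   # looking straight ahead
wand = np.array([0.0, 1.0, 1.0])   # wand tip up and ahead of the eye

print(round(angular_error_deg(eye, gaze, wand), 1))  # 45.0
```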

Video game user interface interactions

In February 2020, I started a Mitacs Accelerate internship with BioWare. In case you’re not a gamer, BioWare is a video game company that has created many popular titles, such as the Mass Effect series, Anthem, Dragon Age, and classics like Baldur’s Gate. My project involved measuring how users interact with user interfaces, using eye and mouse tracking. The goal was to extract key performance indicators: values that tell you something about the user’s experience with the UI. I use an online service called Labvanced to create the task and collect data. Below is an example of the kind of data I collect.
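One key performance indicator commonly derived from gaze data is time to first fixation on an area of interest (AOI). This is a minimal sketch with made-up sample data and a hypothetical AOI, not the actual Labvanced pipeline or the KPIs from the project.

```python
def time_to_first_fixation(samples, aoi):
    """samples: (t, x, y) gaze samples; aoi: (x0, y0, x1, y1) rectangle.
    Returns the timestamp of the first sample inside the AOI, or None."""
    x0, y0, x1, y1 = aoi
    for t, x, y in samples:
        if x0 <= x <= x1 and y0 <= y <= y1:
            return t
    return None

gaze = [(0.00, 100, 100), (0.25, 300, 200), (0.50, 640, 420)]
volume_slider = (600, 400, 700, 450)  # hypothetical AOI in screen pixels
print(time_to_first_fixation(gaze, volume_slider))  # 0.5
```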

Typically, I use a monitor-mounted eyetracking solution (such as the excellent Gazepoint GP3-HD), but due to the world dying, we’ve shifted data collection to be entirely online, focusing on webcam-based eyetracking algorithms.

In this example, the participant was told to figure out how to lower the music volume. Red denotes mouse position, green is a Tobii eyetracker, and orange is a webcam-based eyetracking algorithm. We wanted to compare the two, and to me the webcam one looks pretty decent. You really get a sense of where they are paying attention when completing the task.
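A simple way to quantify that comparison, assuming the two gaze traces are time-aligned, is the per-sample Euclidean distance between the webcam estimates and the Tobii "reference". The arrays below are placeholders, not real recordings.

```python
import numpy as np

# Time-aligned (x, y) gaze points in pixels; fabricated for illustration.
tobii = np.array([[100, 100], [110, 105], [400, 300]], dtype=float)
webcam = np.array([[103, 104], [115, 100], [390, 310]], dtype=float)

# Pixel error per sample, then the mean as a single summary score.
per_sample_err = np.linalg.norm(tobii - webcam, axis=1)
print(per_sample_err.mean())
```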

Decision making modeled as a reinforcement learning agent versus humans

I’m really interested in foraging. I find it fascinating that an animal can develop an optimal strategy when foraging for food in the bush, or that when we go to the supermarket to grab some tomatoes, we don’t spend 6 hours rifling through the bin to find the perfect tomato. We can model these types of behaviours using reinforcement learning (RL), implemented in frameworks such as TensorFlow and JAX. I am working with Nathan Wispinski to create models of decision making in RL agents so that we can directly compare them to human participants. Due to the pandemic, we have been recruiting participants through MTurk, again using Labvanced as the backbone for our data collection. We are hoping to demonstrate human-level control of behaviour in the RL agents (instead of the typical mantra of outperforming humans) because we believe this is a useful framework for better understanding human foraging behaviours. See the video below for an example of what a typical foraging trial looks like in humans.
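The classic normative benchmark for this kind of stay-or-leave decision is the marginal value theorem: leave a depleting patch once its instantaneous reward rate drops below the environment's long-run average rate. Here is a toy version of that rule with illustrative numbers; it is not our task or our agents, just the intuition.

```python
def patch_residence_time(initial_rate, decay, average_rate):
    """Steps spent in a patch whose reward rate decays geometrically,
    leaving once the rate falls to the long-run average rate."""
    rate, t = initial_rate, 0
    while rate > average_rate:
        rate *= decay  # patch depletes with each harvest
        t += 1
    return t

# A richer patch is worth staying in longer than a poorer one.
print(patch_residence_time(10.0, 0.8, 2.0))
print(patch_residence_time(4.0, 0.8, 2.0))
```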

Video representation of what a trial looks like. The grey cubes are human participants. They don’t all actually compete at the same time; we’ve overlaid them to show what all of the participants’ strategies look like.

Delayed reaching retinotopy in EEG

A really cool paper demonstrated that performing a delayed action resulted in re-recruitment of early visual cortical areas at the time of execution. Typically, it is thought that dorsal visual information is lost in a matter of a few seconds, but this paper showed that even after a delay of ~18 s (necessary because they were measuring the BOLD response), early visual cortical areas—the same areas that encode crucial information for grasping behaviours—were reactivated. Among these early cortical areas is V1, which we can record using EEG. We designed a task that presents a stimulus in one of four quadrants on a screen; the stimulus then either disappears and reappears several seconds later, or remains on the screen. The goal is to move your hand and touch (on the screen) where the target was. We record hand movements using a few OptiTrack cameras, gaze using a Tobii eyetracker, and EEG using a 256-channel EGI net. Again, this project was done with Nathan Wispinski.
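The analyses for this kind of design ultimately come down to comparing spectral power across time windows. As a bare-bones sketch of the ingredient, here is band power in a window estimated from the FFT, applied to a synthetic 10 Hz "alpha" oscillation rather than real EEG; the sampling rate and band edges are assumptions.

```python
import numpy as np

fs = 250                              # assumed sampling rate (Hz)
t = np.arange(0, 2.0, 1 / fs)
signal = np.sin(2 * np.pi * 10 * t)   # fake 10 Hz oscillation, not real EEG

def band_power(x, fs, lo, hi):
    """Summed FFT power of x between lo and hi Hz."""
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2
    mask = (freqs >= lo) & (freqs <= hi)
    return psd[mask].sum()

alpha = band_power(signal, fs, 8, 12)
beta = band_power(signal, fs, 13, 30)
print(alpha > beta)  # True: the fake signal is pure alpha
```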

Here’s a neat video demonstrating what the task looks like. Top left is the EEG data. Bottom left is a video of Nathan doing the task. Right side of the screen is a recreation of the task. The yellow translucent circle on the screen is where Nathan is looking (because we want to ensure he’s always looking at the cross throughout the task) and the blue dot is the 3D position of his finger. We can do this for any trial.

In general, this is what we expect to find. If we create source voxels throughout the brain (thanks to a handy MRI scan Nathan had done a few years ago), we can parcellate them into rough areas that correspond to retinotopic coordinates. Left: looking at the blue dot would create a pattern of activation in the blue voxels. Right: the expected source power in the regions of interest. When the target is first seen, power should be high. We predict that power should increase again just prior to executing the motor plan.
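The parcellation step can be sketched as labelling each source voxel by the visual-field quadrant its (assumed) retinotopic coordinate falls in. The coordinates, the sign convention, and the function are all illustrative, and this deliberately ignores the cortical inversion of the visual field; it is not the actual pipeline built from Nathan's scan.

```python
def quadrant(x, y):
    """Quadrant label for an assumed retinotopic coordinate (x, y)."""
    if x >= 0 and y >= 0:
        return "upper-right"
    if x < 0 and y >= 0:
        return "upper-left"
    if x < 0:
        return "lower-left"
    return "lower-right"

# Fabricated voxel coordinates, one per quadrant.
voxels = [(1.0, 2.0), (-0.5, 1.0), (-1.0, -1.0), (2.0, -0.3)]
labels = [quadrant(x, y) for x, y in voxels]
print(labels)
```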