What Are You Looking At? Using Eye Tracking Glasses to Monitor Toddler Attention in Natural Learning Situations.
In ACM Symposium on Eye Tracking Research and Applications, pp. 1–4, 2021.
Investigating eye gaze offers a window into the learning mind, as eye movements are linked to attention and cognitive processing (Yarbus, 1967; Hyönä et al., 2003). Because eye tracking is non-invasive and does not require an overt behavioral response, it is a suitable tool to use with infants and toddlers (for a review on the use of eye tracking in infancy research see Gredebäck et al., 2010). Researchers have developed widely applied paradigms in which infants’ looking is used to infer the discrimination, categorization or recognition of different stimuli (Michnick Golinkoff et al., 2013; Oakes, 2010). A prominent example in research on language development is the looking-while-listening procedure (Fernald et al., 2008), in which a child is presented with two images side by side, one of which is labelled. How long it takes the child to orient towards the labelled object is taken as a measure of word recognition. Across these paradigms, researchers assume that gaze direction coincides with attention and that it informs us about what children are interested in, what information they pick up, which predictions they make and which expectations they hold (for a critical review of the conceptual foundations of preferential looking paradigms see Tafreshi et al., 2014).
Most eye tracking studies to date use remote, stationary eye trackers, which provide objective measures of eye gaze in a pre-defined, calibrated space. They mostly present stimulus material such as pictures and videos on screen and monitor infants’ eye movements and/or pupil dilation to assess their perception and understanding of these stimuli. This approach allows for tight experimental control and provides important insights into the specifics of cognitive processing. However, the controlled set-up comes at a cost: it allows researchers to automatically assess how infants explore a predefined visual space, but gaze mapping is limited to this space, and the child is required to keep a certain distance and angle to the eye tracker to allow automatic detection of the pupil. Moreover, when the child grabs an object or points at something, gaze data might be lost because the arm occludes the eye tracker. Such constraints restrict the use of remote eye tracking in behavioral research. In addition, studies usually use highly controlled stimuli that are reduced in complexity compared to the real world, which might limit the ecological validity of the results obtained (for a detailed discussion see Ladouce et al., 2017).
Recent approaches further highlight the embodiment of cognition in development (Laakso, 2011) and the role of social interaction in cognitive processing and learning (Csibra & Gergely, 2009). Against this background, it seems necessary to investigate infants’ processing in more naturalistic and complex settings to fully describe and understand the mechanisms and processes of cognitive development. Yet recording eye gaze from children in natural situations is a difficult endeavor. While recent developments include eye tracking glasses that allow adult participants to move around and interact freely, the commercially available systems do not fit on an infant’s or toddler’s head. Some researchers have therefore turned to small head-mounted cameras (e.g., Smith et al., 2011) to study infant cognition in interaction with objects and people. In this approach, the child’s field of view is recorded, and what the child is looking at is inferred from what is visible in the scene.
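To illustrate the kind of inference this approach supports, the sketch below estimates which object dominates a head-camera frame by relative bounding-box area. It is a minimal, hypothetical example: the detection format and the dominance threshold are assumptions made for illustration, not details of the cited systems.

# A minimal, hypothetical sketch: given per-frame object detections from a
# head-mounted camera, estimate which object dominates the child's field of
# view by relative bounding-box area. The data format and the threshold are
# illustrative assumptions, not details of the systems cited above.

from dataclasses import dataclass

@dataclass
class Detection:
    label: str     # e.g., "toy" or "face"
    x: float       # top-left corner of the bounding box, normalized to [0, 1]
    y: float
    width: float   # box extent, normalized to [0, 1]
    height: float

def dominant_object(detections: list, min_share: float = 0.10):
    """Return the label of the detection covering the largest share of the
    frame, or None if no detection exceeds min_share of the image area."""
    if not detections:
        return None
    largest = max(detections, key=lambda d: d.width * d.height)
    return largest.label if largest.width * largest.height >= min_share else None

# Example frame: the toy fills far more of the view than the caregiver's face.
frame = [Detection("face", 0.70, 0.10, 0.15, 0.20),
         Detection("toy", 0.20, 0.30, 0.50, 0.50)]
print(dominant_object(frame))  # -> "toy"

Note that such a computation operates on the image alone: it says which object fills the view, not which object the child's eyes are directed at.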
Research with head-mounted cameras has, for instance, illustrated that children are most likely to learn a novel label when it is uttered at a time when its referent is dominant in the child’s view (Yu & Smith, 2012). However, head-mounted cameras do not track the child’s gaze. Image-processing algorithms can automatically detect, e.g., objects of interest and faces in the scene and calculate their relative (visual) salience. But if several objects (or faces, or both) are visible in close proximity, it remains unclear what exactly the child is fixating at a specific point in time. Neither can it be determined which part of an object or face the child is looking at, for instance whether the child is attending to the eyes or the mouth (as investigated, e.g., with stationary eye tracking by Lewkowicz & Hansen-Tift, 2012).
Other researchers have therefore developed mobile eye tracking systems that employ components similar to those of adult eye tracking glasses but attach them to light-weight headgear (Franchak et al., 2011). One major challenge remains with these systems, however: How can we determine that a look falls within a specific area of interest? In screen-based systems, areas of interest can be defined in terms of fixed coordinates because the eye tracker and the screen on which stimuli are presented are stationary. In mobile systems, by contrast, the field of view constantly changes with the movement of the participant (or of objects and people in the scene). So far, this mapping problem has been solved through manual coding of scene data, an approach that, like the coding of looking data from video cameras, is time- and resource-intensive.
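To make the mapping problem concrete, the sketch below assigns a gaze sample to an area of interest by a simple containment test. In a screen-based set-up the AOI coordinates are constants; in a mobile set-up they must be re-derived for every scene frame (here assumed to come from manual annotation or an object detector) before the same test can be applied. The data format is a hypothetical illustration, not the output of any cited system.

# A minimal, hypothetical sketch of gaze-to-AOI assignment in a mobile
# set-up. Coordinates are in scene-camera pixels; unlike in a screen-based
# set-up, the AOI boxes must be re-located in every frame because the
# camera moves with the head.

from dataclasses import dataclass

@dataclass
class AOI:
    label: str
    x: float       # top-left corner, in scene-camera pixel coordinates
    y: float
    width: float
    height: float

def map_gaze_to_aoi(gaze_x: float, gaze_y: float, aois: list):
    """Return the label of the first AOI containing the gaze point,
    or None if the gaze falls outside all AOIs."""
    for aoi in aois:
        if (aoi.x <= gaze_x <= aoi.x + aoi.width
                and aoi.y <= gaze_y <= aoi.y + aoi.height):
            return aoi.label
    return None

# Frame t: the toy has moved (or the head has turned), so its AOI carries
# new coordinates in this frame and must be supplied anew for frame t+1.
aois_t = [AOI("toy", 420.0, 180.0, 160.0, 120.0),
          AOI("caregiver_face", 90.0, 60.0, 110.0, 140.0)]
print(map_gaze_to_aoi(480.0, 230.0, aois_t))  # -> "toy"

The containment test itself is trivial; the expense lies in producing the per-frame AOI coordinates, which is precisely the step that manual coding currently performs.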