The paper investigates how participants use gaze, together with deictic expressions and embodied pointing, to accomplish reference and joint attention in naturally occurring social interaction. It assumes that deixis, in its primordial use in face-to-face interaction, is an embodied phenomenon involving both gestural pointing and visual perception, and that it therefore gives rise to recurring gaze practices among the participants. The analysis draws on a model of the interactional organization of deictic reference and joint attention, which serves as a sequential framework for investigating the functions of eye gaze. It focuses on two meta-perceptive practices: gaze following and gaze monitoring. The analysis shows that the use of these practices in naturally occurring social activities is context-dependent, positionally sensitive, tied to participant roles, and temporally fine-tuned to the stream of the participants’ verbal and embodied conduct. The sequential analysis further documents that meta-perceptive gaze practices contribute to the constitution of joint attention as mutually known by the participants. The data for this study were recorded with two pairs of mobile eye-tracking glasses and an external camera. Methodologically, the study is situated within conversation analysis and interactional linguistics, fields that rely on video recordings; it breaks new ground by using a technology applied almost exclusively in experimental frameworks to record ordinary social activities “in the wild.” In striving for ecologically valid and precise eye gaze data, it also contributes to refining concepts developed in experimental paradigms by adapting them to qualitative research in multimodal conversation analysis and interactional linguistics.