Exploring the temporal alignment in chimpanzees’ vocalizations and gestures

Researchers involved: Chiara Zulberti, Katja Liebal, Federica Amici (Compositional Structures in Chimpanzee Gestural Communication), Paula Sánchez-Ramón, Alina Gregori, Pilar Prieto, Frank Kügler (MultIS)

In this study, we want to explore multimodal prominence in terms of gesture-vocalization temporal alignment as a possible feature of chimpanzee communication. In human communication, prominent landmarks of speech and gesture (such as accented syllables and gesture strokes) have been shown to temporally associate with each other (cf. Shattuck-Hufnagel et al., 2007; Loehr, 2012; Esteve-Gibert & Prieto, 2013), and this coordination has communicative effects (Ebert et al., 2011; Kügler & Gregori, 2023). Temporal alignment between prosody and gesture serves to reinforce the emphasized elements of speech, aiding overall language comprehension.

Among non-human animals, species closely related to humans, such as chimpanzees, have also been shown to use combinations of gestures and vocalizations (Wilke et al., 2017; Hobaiter et al., 2017). However, the temporal coordination of these combinations has not been explored, so it remains unclear to what extent chimpanzee multimodal production resembles human temporal alignment, and whether this linguistic phenomenon evolved in the hominin lineage—possibly driven by its linguistic function—or whether its evolutionary origins date further back.

This collaboration therefore aims to fill this gap by exploring the temporal alignment patterns between gestures and vocalizations in chimpanzee communication. Specifically, we aim to answer the following research questions: (1) Does the temporal span of the vocalization overlap with the temporal span of its target gesture? and (2) Which specific part of the vocalization overlaps with the target gesture?
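Research question (1) amounts to an interval-overlap test between two annotated time spans. As a minimal sketch (the `Interval` class, field names, and example timings are illustrative, not the project's actual annotation format):

```python
from dataclasses import dataclass

@dataclass
class Interval:
    """A labeled time span in seconds (illustrative annotation format)."""
    start: float
    end: float

def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two spans share any portion of time (RQ1-style check)."""
    return a.start < b.end and b.start < a.end

def overlap_duration(a: Interval, b: Interval) -> float:
    """Duration of the shared span in seconds; 0.0 if the spans are disjoint."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))

# Hypothetical example: a vocalization at 2.0-6.5 s, a gesture at 5.8-7.1 s
vocalization = Interval(2.0, 6.5)
gesture = Interval(5.8, 7.1)
print(overlaps(vocalization, gesture))                    # True
print(round(overlap_duration(vocalization, gesture), 2))  # 0.7
```

Question (2) would then locate the overlapping stretch relative to the vocalization's internal units (e.g. buildup, climax, letdown) rather than treating the call as a single span.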

To this end, we will analyse observational data from a video dataset collected by CZ comprising 45 hours of recordings of semi-wild chimpanzees during naturally occurring interactions at the Chimfunshi Wildlife Orphanage in Zambia. Gestures occurring in the dataset and their apices of tension have already been annotated as part of CZ's main ViCom project. Within the collaboration, we therefore aim to annotate vocalizations and segment them into smaller acoustic units (syllable-like structures), and then assess the positioning of the target gestures in relation to these vocal landmarks. We define a syllable-like unit as a single consecutive sound demarcated by silence, which can be identified both aurally and through spectrogram analysis.
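The segmentation criterion above—consecutive sound demarcated by silence—can be sketched as a simple threshold pass over a frame-wise amplitude envelope. This is only an illustration of the definition, not the project's annotation pipeline; the function name, threshold, and toy envelope are all hypothetical:

```python
def segment_by_silence(envelope, threshold):
    """Split a frame-wise amplitude envelope into syllable-like units:
    maximal runs of frames above `threshold`, separated by silent frames.
    Returns (start_frame, end_frame_exclusive) pairs.
    Illustrative sketch; real annotation would be done aurally and on
    spectrograms, as described in the text."""
    units = []
    start = None
    for i, value in enumerate(envelope):
        if value > threshold and start is None:
            start = i            # sound onset: a unit begins
        elif value <= threshold and start is not None:
            units.append((start, i))  # silence reached: close the unit
            start = None
    if start is not None:
        units.append((start, len(envelope)))  # signal ends mid-unit
    return units

# Toy envelope with two sound bursts separated by silence
envelope = [0, 0, 5, 6, 0, 0, 4, 4, 4, 0]
print(segment_by_silence(envelope, threshold=1))  # [(2, 4), (6, 9)]
```

Frame indices returned this way could then be converted to seconds and compared against gesture spans.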

We will focus on pant hoot vocalizations. These calls are particularly suitable for the study of prominence because they are composed of multiple acoustically distinct consecutive units and have traditionally been described as unfolding in three phases: buildup, climax, and letdown (Slocombe & Zuberbühler, 2010). They therefore provide promising material for analysing acoustic components and their relation to gesture—especially for investigating acoustic anchors for gestures, given the presence of multiple consecutive units.

By exploring the interface between vocalizations and gestures in chimpanzee communication, this study will advance our knowledge of multimodal strategies in non-human animal species. This endeavour will also contribute to linguistic research by uncovering potential analogies between human and non-human communication systems and by shedding light on the evolutionary origins of temporal alignment.

To achieve these research objectives, the integration of skills from different disciplines is required—an effort uniquely supported by the diversity and collaborative opportunities provided by ViCom’s environment. In this context, we believe the proposed project aligns well with ViCom’s general aims. Moreover, this collaboration will enable the investigation of further research questions on gesture-vocalization interactions in non-human species. An interesting follow-up, for instance, would be assessing acoustic and gestural prominence of the multimodal landmarks in chimpanzee communication using models that have been developed for human prosodic and gestural prominence (see Gregori, Sánchez-Ramón, Prieto & Kügler, 2024 for an application).