Researchers involved: Chiara Zulberti (Compositional structures in chimpanzee gestural communication, University of Leipzig), Šárka Kadavá (FLESH, ZAS Berlin)
Researchers studying (multimodal) communicative behaviour often have to rely on extensive manual annotation of their data. This process can be extremely time-consuming, especially when datasets are scaled up for statistical power. The considerable time investment is due not only to the extraction of the data itself but also to the training of annotators, who must reach a sufficient level of inter-observer reliability. Moreover, manual annotations often fail to comprehensively capture the intricate dynamics of bodily movements and their significance in social interactions.
However, recently developed computer vision tools that track a subject’s body, together with deep neural networks trained to detect objects, may address some of these limitations and facilitate the study of both humans and non-human animals. Computational tools based on pose estimation, such as DeepLabCut (Mathis et al., 2018) or LabGym (Hu et al., 2023), can detect and quantify the motion patterns of observed individuals, enabling researchers to use this information to classify and analyse different types of behaviour (e.g., self-adaptors vs. gestures).
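As an illustration, the snippet below is a minimal sketch of how such pose-estimation output could be obtained with DeepLabCut’s Python API, assuming a project has already been created and a network trained; the config and video paths are placeholders, not files from this project.

```python
import deeplabcut

# Placeholder path to a DeepLabCut project config
# (created earlier via deeplabcut.create_new_project and trained)
config_path = "/path/to/chimp_project/config.yaml"
videos = ["videos/chimp_clip.mp4"]  # placeholder video file

# Run the trained network on new videos; writes per-frame keypoint
# coordinates (x, y, likelihood per bodypart) to HDF5/CSV files
deeplabcut.analyze_videos(config_path, videos, save_as_csv=True)

# Optionally render a labelled video to visually check tracking quality
deeplabcut.create_labeled_video(config_path, videos)
```

The resulting per-frame coordinate files are the raw material for any downstream quantification of movement.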
Researchers interested in human communication use OpenPose (Cao et al., 2017) and similar pose-estimation tools to study human movement during various tasks, including social interactions (e.g., Trujillo et al., 2022). In primates, pose estimation has likewise been applied in several recent studies, for instance to quantify siamangs’ movement during multimodal signalling (Pouw et al., 2023) or to automatically detect stone-handling behaviour in Japanese macaques (Ardoin & Sueur, 2023). Importantly, Schofield et al. (2019) compared the time costs and accuracy of human annotators with those of their trained model for identifying individual chimpanzees, and found that these tools are not only faster but also more reliable than manual annotation. These advances therefore not only make for a more efficient workflow but also open up new research questions and new opportunities for comparative work.
The overall aim of this collaboration is to apply computer vision tools to video recordings of chimpanzees to detect, characterise, and potentially classify their behaviour. We will adapt the computational methods developed in the FLESH project to animal behaviour and test whether manual coding can be complemented with new information (e.g., velocity) that may enable a better understanding of chimpanzee communication dynamics. The first step will be to train a model and evaluate the procedure. Afterwards, Šárka Kadavá and Chiara Zulberti will apply the trained model to an extended dataset of video recordings collected within CZ’s main ViCom project. Finally, both teams will meet in Leipzig to discuss the results and a potential extension of this collaboration to compare movement dynamics between non-human and human primates. Additionally, we plan to present the results of our modelling procedure at conferences and/or publish them as a short report in a journal.
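To make the velocity idea concrete, here is a minimal sketch of how frame-to-frame speed could be derived from pose-estimation output. It assumes a DeepLabCut-style CSV with a three-row header (scorer / bodyparts / coords); the file path, bodypart name, frame rate, and likelihood threshold are illustrative placeholders rather than settings from this project.

```python
import numpy as np
import pandas as pd

def keypoint_speed(csv_path, bodypart, fps=25.0, likelihood_min=0.9):
    """Frame-to-frame speed (pixels per second) of one tracked bodypart."""
    # DeepLabCut-style CSV: three header rows (scorer / bodyparts / coords)
    df = pd.read_csv(csv_path, header=[0, 1, 2], index_col=0)
    # Select this bodypart's x / y / likelihood columns (level 1 = bodyparts)
    part = df.xs(bodypart, axis=1, level=1).copy()
    part.columns = part.columns.get_level_values(-1)
    # Mask low-confidence detections before differentiating
    x = part["x"].where(part["likelihood"] >= likelihood_min)
    y = part["y"].where(part["likelihood"] >= likelihood_min)
    # Euclidean displacement per frame, scaled to per-second units
    return np.hypot(x.diff(), y.diff()) * fps

# Hypothetical usage: mean speed of a wrist keypoint across a clip
speed = keypoint_speed("videos/chimp_clip.csv", "right_wrist", fps=25.0)
print(speed.mean())
```

The same trajectories could feed further kinematic measures (e.g., acceleration via a second difference), though the frame rate, pixel-to-metric scaling, and any smoothing would have to be matched to the actual recordings.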
