Capturing language and communication as it is: A new framework for multimodal data collection

An interdisciplinary team of researchers funded by the German Research Foundation’s Priority Program on “Visual Communication” (ViCom) proposes a decision-oriented workflow to improve data collection, transparency, and reproducibility in the study of language and communication. The team brings together linguists, psycholinguists, psychologists, developmental neuro- and cognitive scientists, primate communication researchers, phoneticians, sign language and gesture specialists, experts in sign language technologies, and computational linguists.

Beyond words and signs

Language and communication are more than just words or signs. When people use language to communicate, they also use hand gestures, facial expressions, eye gaze, and various head and body movements at the same time. Until relatively recently, large parts of the language sciences disregarded this complexity and focused primarily on the spoken or written word. One reason was that scientific methods often struggled to capture this complexity adequately.

A new study published in Advances in Methods and Practices in Psychological Science introduces a comprehensive framework to guide researchers across the language sciences in collecting and analyzing the multimodal data needed to capture the richness of real-world language use. The work, led by Anastasia Bauer (Key Profile Area Skills and Structures in Language and Cognition, University of Cologne), Patrick C. Trettenbrein (University of Göttingen and Max Planck Institute for Human Cognitive & Brain Sciences, Leipzig), Susanne Fuchs (Leibniz Centre for General Linguistics, Berlin), and Martin Schulte-Rüther (University Hospital Heidelberg and University Medical Center Göttingen), provides a structured approach to one of the most pressing challenges in the cognitive science of language and communication, and in the language sciences in general: how to systematically collect and study communication that unfolds across multiple channels at once.

A fundamental shift in the language and communication-related sciences

Over recent decades, research across linguistics, psychology, and neuroscience has increasingly recognized that both human communication and real-world language use are inherently multimodal. Signals in different modalities (speech, manual and nonmanual gestures such as facial expressions, and more) are tightly integrated and jointly convey meaning. “Real-world language use does not happen in isolated channels,” says Patrick C. Trettenbrein, co-lead author of the paper. “If we focus on only one modality, we risk missing essential aspects of how meaning is actually construed during communicative language use.”

These signals include not only vocal-acoustic cues, but also somatosensory and articulatory processes (e.g., lip closure in stop consonants or tongue–palate contact), as well as non-vocal signals such as movements of the hands, head, torso, and face. The coordinated use of such signals across modalities is referred to as multimodality.

At the same time, capturing this richness comes at a cost. Multimodal datasets are complex, expensive, and methodologically challenging to collect. Multimodal data collection also takes place in very different research settings, ranging from classical, typically well-controlled laboratory experiments to the creation and investigation of large-scale multimodal corpora and naturalistic data collection in field studies. Each setting involves its own trade-offs between precision, realism, and scalability, which researchers have to take into account when planning their investigation.

Making complexity manageable in the lab and in real-world interactions

To address these challenges, the interdisciplinary team assembled by Bauer, Trettenbrein, Fuchs, and Schulte-Rüther proposes a flexible decision framework that structures multimodal research into three key stages: (1) defining the research question, population, and design; (2) implementing the study, including technical and ethical considerations; and (3) planning data sharing and reuse. Rather than prescribing a single “best” method, the framework is intended to help researchers make explicit, transparent decisions tailored to their specific research goals. “Multimodal research always involves trade-offs,” explains Anastasia Bauer, lead author of the paper, who coordinated the interdisciplinary team of authors. “Our goal was not to eliminate these trade-offs, but to make them visible and actionable, so that researchers can choose methods that are fit for the specific purpose of their multimodal data collection and research project.”

In the paper, the researchers, all of whom are either directly funded by or involved with the German Research Foundation’s Priority Program on “Visual Communication” (ViCom), illustrate the framework by contrasting three case studies from radically different research settings: a controlled laboratory experiment, the creation of a large-scale sign language corpus, and naturalistic field observations of communication in non-human primates. Together, these examples demonstrate that no single approach can fully capture the richness of multimodal behavior. Highly controlled experiments allow precise measurement but may lack ecological validity, while real-world observations provide naturalistic insight but limit experimental control. “These trade-offs are not necessarily methodological weaknesses,” says Susanne Fuchs (Leibniz Centre for General Linguistics, Berlin), co-senior author of the paper. “Instead, they are a direct consequence of studying language and communication as complex multimodal behaviors.”

Beyond methodological guidance, the framework also addresses broader issues of research transparency, ethics, and data sharing in multimodal research. By encouraging researchers to document their decisions systematically, from the earliest stages of study planning through data collection, data management, and data sharing, it aims to improve reproducibility and enable meaningful reuse of multimodal datasets. “Multimodal datasets are incredibly valuable, but often include sensitive personal data and are difficult to replicate,” explains Martin Schulte-Rüther (University Hospital Heidelberg and University Medical Center Göttingen), senior author of the paper. “By documenting how they are collected, processed, and protected, we can make them more interpretable and more useful for the wider scientific community.”

The multimodal approach is a methodological necessity

In sum, the research team from the German Research Foundation’s Priority Program on “Visual Communication” (ViCom) argues that multimodal data collection is not simply a technical extension of existing methods, but a necessary step toward understanding how language and communication work in real-world settings. As research increasingly moves beyond isolated signals toward integrated systems of behavior, frameworks like this one may play a crucial role in shaping the future of the cognitive science of language and of the language sciences in general.

Original publication:

Bauer, A., Trettenbrein, P. C., Amici, F., Ćwiek, A., Krause, L.-M., Kuder, A., Ladewig, S., Schulder, M., Schumacher, P., Spruijt, D., Zulberti, C., Fuchs, S., & Schulte-Rüther, M. (2026). Data Collection in Multimodal Language and Communication Research: A Flexible Decision Framework. Advances in Methods and Practices in Psychological Science, 9(2). https://doi.org/10.1177/25152459261442338
