Virtual Reality Sustained Multimodal Distributional Semantics for Gestures in Dialogue (GeMDiS)

Project Participants

Project Description

Both corpus-based linguistics and contemporary computational linguistics rely on linguistic resources, many of them large. The expansion of the linguistic subject area to include visual means of communication such as gesticulation has not yet been backed up with corresponding corpora. As a result, “multimodal linguistics” and dialogue theory cannot participate in the established distributional methods of corpus linguistics and computational semantics. The main reason is the difficulty of collecting multimodal data in an appropriate way and at an appropriate scale. Using the latest VR-based recording methods, the GeMDiS project aims to close this data gap. It investigates visual communication by means of machine-based methods and an innovative use of neural and active learning for small data, drawing on the systematic reference dimensions of associativity and contiguity of the features of visual and non-visual communicative signs. GeMDiS is characterised above all by the following two features:

  • Ecological validity: the data are collected in dialogue situations and thus also cover everyday gestures, and interactive gestures in particular. In this respect, GeMDiS differs from collections of partly emblematic hand shapes or gestural charades.
  • True multimodality: the VR-based recording technology captures not only hand-and-arm movements and handshapes but also facial expressions; it is this genuine multimodality that is the hallmark of natural language interaction. In this, GeMDiS already anticipates potential further developments of ViCom.

The corpus created in this way is made available to the research community in accordance with the FAIR principles. The results of GeMDiS feed into social human-machine interaction, contribute to research on gesture families, and provide a basis for exploratory corpus analysis and further annotation. Furthermore, the project investigates to what extent the results can help formal semantics with the input problem of meaning representation. In short: to compute a multimodal meaning compositionally, it is first necessary to associate the linguistic and the non-vocal parts of an utterance with meanings, something that so far happens only intuitively. In the last phase of the project, a VR avatar will be developed into a playback medium for the previously recorded multimodal behaviour. This serves as a visual evaluation of the methodology. The avatar can also be used as an experimental platform, e.g. in cooperation with other projects.