Compositional Structures in Chimpanzee Gestural Communication

Project Participants

Project Description

In linguistics, scholars have traditionally differentiated between two typical properties of human language: combinatoriality and compositionality. Although these terms are not used consistently in the literature, combinatoriality generally refers to the ability to combine meaningless sounds into meaningful morphemes and words, whereas compositionality usually refers to the ability to recombine already meaningful elements (i.e. morphemes and words) into new elements with novel meanings (Hockett 1960; de Boer et al. 2012). Together, combinatoriality and compositionality confer both productivity and flexibility on human languages, making them potentially open-ended despite being based on a limited number of initial components (Chomsky 1981; Fitch 2010; Jackendoff 2011; Werning et al. 2012). Combinatoriality and compositionality are currently considered universal properties of natural human languages. They characterize not only spoken language but also sign language and other forms of non-verbal communication, including emotion expression (e.g. Sandler & Lillo-Martin 2006; Cavicchio et al. 2018).

For several decades, scholars have regarded compositionality as one of the hallmarks of human language, setting human communication apart from that of other animals (Hurford 2011). In recent years, however, experimental evidence has shown that several species not only use complex repertoires to communicate with conspecifics, but also have communication systems with properties long considered to be uniquely human. These studies, for instance, have provided abundant evidence of combinatoriality in the vocal communication systems of several non-human species (e.g. white-handed gibbons, Hylobates lar: Clarke et al. 2006; chestnut-crowned babblers, Pomatostomus ruficeps: Engesser et al. 2015; see Townsend et al. 2018, for a review). Evidence of compositionality, however, is unsurprisingly scarcer. Important exceptions include Campbell’s monkeys (Cercopithecus campbelli) and putty-nosed monkeys (Cercopithecus nictitans). Campbell’s monkeys can combine meaningful vocalizations into context-specific sequences (Ouattara et al. 2009a) and modify the meaning of their alarm calls by adding a suffix (Ouattara et al. 2009b). Putty-nosed monkeys can combine specific alarm calls into new vocal sequences that have a novel meaning (Arnold & Zuberbühler 2006, 2008). Among birds, evidence of compositionality has been found in pied babblers (Turdoides bicolor; Engesser et al. 2016) and in Japanese great tits (Parus minor), which respond differently to single notes than to their combinations (Suzuki et al. 2016).

For non-vocal signals, such as gestures and facial expressions, evidence of combinations with a novel meaning is scarce. Non-human primates (hereafter, primates), for instance, have large gestural repertoires (Liebal et al. 2013; Hobaiter & Byrne 2011). However, the few studies that have assessed compositionality in primate gestural systems have found no evidence of it (e.g. western gorillas, Gorilla gorilla gorilla: Genty & Byrne 2010; Tanner & Perlman 2017; Sumatran orangutans, Pongo pygmaeus abelii: Tempelmann & Liebal 2012; chimpanzees, Pan troglodytes: Liebal et al. 2004; Hobaiter & Byrne 2011). In particular, although these studies have drawn different conclusions about the emergence of gesture combinations, none of them has found evidence that gestures are combined into longer sequences to create novel meanings. Captive chimpanzees, for example, frequently produce gesture sequences. However, most of these sequences consist of simple repetitions of the same gesture, which likely serve to increase recipients’ responsiveness (Liebal et al. 2004). Wild chimpanzees also use long and largely redundant sequences of rapid-fire, variable gestures, but shift towards producing single gestures as they grow older (Hobaiter & Byrne 2011). Sumatran orangutans, surprisingly, usually continue to gesture regardless of whether the recipient responds (Tempelmann & Liebal 2012). Moreover, across species, there seem to be no significant differences between the meaning of a gesture produced as part of a sequence and the meaning of the same gesture produced alone, suggesting no compositionality in great ape gestural communication (Liebal et al. 2004). Across these studies, however, gesture “meaning” has usually been inferred from the context in which gestures and gesture sequences were used, rather than from the response they elicited from the recipient (see Hobaiter & Byrne 2014; Liebal & Oña 2018).
It is therefore possible that a better operational definition of “meaning”, for example one based on recipients’ responses, would lead to different results. Compared with gestures, there is virtually no research on the sequential combination of facial expressions, although such combinations are sometimes described as “blended displays”, which share characteristics with other prototypical facial expressions (Parr et al. 2005).

Primate communication is multimodal (Slocombe et al. 2011; Liebal et al. accepted), and compositionality might also arise through the recombination of elements across modalities. To date, however, very few studies have investigated whether species other than humans compositionally recombine elements across different modalities (i.e. gestures, vocalizations and facial expressions) to create new meaningful elements. Moreover, the few studies that have used such a multimodal approach did not specifically target compositionality. Both Hobaiter and colleagues (2017) and Wilke and colleagues (2017), for instance, focused on combinations of gestures and vocalizations, and examined recipients’ responses to these combinations as compared with their single components. Oña and colleagues (2019) also investigated multimodal combinations in chimpanzees, but focused on combinations of gestures and facial expressions in a semi-wild setting. They found that different combinations elicited different responses: adding a specific facial expression to a gesture changed the likelihood that a specific response would occur (Oña et al. 2019). In particular, combining this facial expression with one gesture type increased the likelihood of affiliative behaviour. By contrast, combining it with a different gesture type eliminated the bias toward an affiliative response. This study did not conclude that the facial and gestural components have specific meanings, or that combining them creates new meanings. Nevertheless, it suggests that facial expressions may serve the important function of modifying subsequent or co-occurring gestures. In other words, facial expressions might change the context in which a gesture is used and therefore influence the recipient’s response and the outcome of social interactions (Oña et al. 2019).

In line with these findings, we suggest considering meta-communication as a special case of compositionality, given that this term typically refers to signals (“secondary communication”) that alter the meaning of other behavioural elements (“primary communication”; Bateson 1955; see Mitchell 1991, for a detailed discussion about the use of this term). Meta-communicative signals, for instance, are those that confer a playful meaning on two kinds of behaviour: behaviours that otherwise belong to the aggressive repertoire of a species, such as hitting or wrestling, and behaviours that are generally used in other functional contexts (e.g. Bateson 1956; Fagen 1981; Bekoff & Allen 1998; Pellis & Pellis 2009). As with compositional combinations, then, signals and behavioural elements acquire a different meaning when combined than when produced independently. In primates, a variety of signals might serve a meta-communicative function, including facial expressions, body gestures and vocalizations (Bekoff 1972, 1995; Yanagi & Berman 2014). However, most evidence of a meta-communicative function has been found for the play face (or relaxed open-mouth face), a facial expression that primates often use in the context of play (e.g. van Hooff 1972; Palagi 2008; Spinka et al. 2016). Play faces, in particular, appear to clarify the playful meaning of behaviours that might otherwise appear agonistic (e.g. Pellis & Pellis 1996; Bekoff & Allen 1998; Palagi 2008, 2009), likely preventing escalation into aggression among players (Waller & Dunbar 2005). Indeed, play faces occur more frequently during rough or contact play, such as wrestling (e.g. Fedigan 1972; Chevalier-Skolnikoff 1974; Palagi 2007; Palagi & Paoli 2007; Demuru et al. 2015). They are also often associated with longer play bouts (e.g. Palagi 2007; Waller & Cherry 2012; Spinka et al. 2016) and with a larger number of players (e.g. Palagi 2008).

In the literature, there is also no consensus on how close in time two elements must be to be considered part of the same combination. For most authors, for instance, meta-communicative signals need not be produced immediately before the behavioural elements whose meaning they modify; they may also be produced simultaneously or immediately afterwards (e.g. Fedigan 1972; Schwartzmann 1979; Pellis & Pellis 1996; Palagi 2008; Spinka et al. 2016; Beresin & Farley-Rambo 2018). Play faces may therefore occur repeatedly during a play bout, continuously asserting the playful intent of behavioural patterns that occur before, during or after the play face. In this way, they maintain and prolong social play rather than initiate it (e.g. Fagen 1981; Pellis & Pellis 1996; Palagi 2007, 2008; Waller & Cherry 2012; Yanagi & Berman 2014; Wright et al. 2018). In vocal communication, by contrast, the elements of a combination cannot be produced simultaneously. Instead, they must occur in close temporal proximity in order to acquire a new meaning compositionally. They must also be ordered in specific ways, as only certain sequences of elements trigger a novel response (e.g. Ouattara et al. 2009a; Suzuki et al. 2016).

In primate communication, context might also play a crucial role, for instance by conferring different meanings on the same signal. In chimpanzees, the bared-teeth face modulated recipients’ responses to different arm gestures, but only during affiliative events (Oña et al. 2019). It is therefore possible that individuals attribute different meanings to the same signals depending on the context in which they are used. To date, however, context has rarely been considered as a possible modifier of signal meaning. Moreover, the study of compositionality is often limited to very few contexts. The meta-communicative function of certain signals, for instance, has typically been investigated in the play context. In play, individuals exchange unpredictable behavioural patterns that may easily escalate into overt aggression, so clearly conveying the playful meaning through specific signals may be highly adaptive (e.g. Bateson 1955; Burghardt 2005; Pellis & Pellis 2009; Spinka et al. 2016). However, signals may also be combined compositionally in other contexts (e.g. aggression or feeding), well beyond play. There, they may alter the meaning of behavioural elements that would otherwise trigger different responses. Studies on compositionality should therefore systematically account for the various contexts in which single elements and their combinations are produced.

Finally, there is currently no research on whether such combinations are planned, voluntarily produced means of communication, in which signallers intentionally combine elements to convey new meanings. Intentionality is a key feature of human language, and intentional production is an inherent part of the definition of gesture in comparative communication research, although this aspect has received much less attention in research on facial expressions and vocalizations (Liebal et al. 2013). Consequently, gestures are often regarded as intentionally produced means of communication that are flexibly adjusted to the recipient’s behaviour. Vocalizations and facial expressions, in contrast, are often presented as merely involuntary, spontaneous expressions of emotional states (e.g. Chevalier-Skolnikoff 1974; Tomasello 2008; Scopa & Palagi 2016; Beltran Frances et al. 2020). Importantly, researchers have proposed different criteria to identify intentional communication. There is still no agreement, however, on which and how many of these criteria need to be met for a signal to be considered intentional, nor is there consistency in how these criteria are applied across modalities. These criteria include the social use of signals (i.e. signals are produced only in the presence of a recipient), sensitivity to the recipient’s attentional state (in the case of visual signals), and persistence and/or elaboration (i.e. signals are repeated until the recipient responds, and may even be elaborated to elicit a response). To date, however, these criteria have not been used to study compositionality. To our knowledge, intentionality is not usually considered a necessary prerequisite for compositionality, either in the human literature or in the literature on other species (e.g. Arnold & Zuberbühler 2006, 2008; Ouattara et al. 2009a, 2009b; Engesser et al. 2016; Suzuki et al. 2016).
Therefore, we still do not know whether primates combine different signals across modalities to create new meanings, and whether these combinations are characterized by a set of markers indicating their intentional use.