CNRS international magazine

Lip reading at its best

In Grenoble, in the heart of the Alps, researchers from a variety of disciplines are exploring human speech—a phenomenon at the interface of intimate and social domains whose complexity is often forgotten. An excursion into the scientific world of language.

Researchers at the Spoken Communication Institute (ICP)1 in Grenoble are passionate when they talk about their research: they have been working on the enigma of human speech for over twenty years. Their laboratory looks at speech from every possible angle, from the study of the vocal tract to the analysis of emotions, from phonetics to the study of cerebral processes. Not to mention the different applications developed by the ICP, such as robotics and telecommunications, as well as the treatment of speech disorders. Are they juggling too many topics at the same time? Not in the least, according to physicist Jean-Luc Schwartz, the director of the ICP. “Looking into all these areas is the only way to achieve our dream, that is, to someday understand how speech 'works'.”

© C. Savariaux/CNRS Photothèque

The apparatus on the left is able to measure all speech-related articulatory events with precision. It is called an articulograph.

The first obstacle in this seemingly simple quest was to uncover how air moves through our body to produce a sound and then a word. To understand this process, a team from the ICP, in collaboration with the University of Eindhoven (Netherlands), created an artificial vocal tract running from the lungs (a 1 m³ box which supplies compressed air) to the vocal cords (two small pieces of metal). “This prototype, which is the only one like it in the world, may look rather simple at first, but in fact it allows us to reproduce a mechanical speech gesture, based on the human model, which can be modified and reproduced forever,” says Xavier Pelorson, who directs the Acoustics team at the ICP. “Thanks to this machine, we have demonstrated the still largely unexplained phenomena of swirls and turbulence at the larynx exit.” The researchers have also managed to reproduce the oscillation of the vocal cords that produces the sound of a voice.

Another major achievement has been the measurement of vibrations of the cords using a laser. The medical field has already started to use this knowledge in the treatment of voice disorders. Somewhat more unexpected is the connection with sleep apnea. “In this disorder, breathing stops due to partial obstruction of the upper airways,” explains Annemie Van Hirtum, a researcher with the Acoustics team. With the ICP device it is possible to study the disorder in detail. Collaboration between the TIMC laboratory and Grenoble's teaching hospital will hopefully lead to the development of software to help surgeons operate on such cases; the current success rate is only 50%.2

But medicine is only one of the many domains concerned. “Traditional methods of voice synthesis give excellent results for male spoken voices, but not for those of women or children, nor for singing,” observes Xavier Pelorson. “In such cases, physical modeling is probably the best option.” Of course, there is a long way to go before we hear a song leave the “mouth” of the prototype. “Specifically, we need to analyze the contact mechanisms of the vocal cords,” says Nicolas Ruty, a PhD student at the ICP. “They are responsible for generating the high frequencies which are so characteristic of the human voice,” he adds.

Speech and music are obviously closely linked. And Hélène Loevenbruck, a linguist at the ICP, knows all about the melody that shapes our utterances. This is known as prosody, a term covering phenomena such as intonation, stress, rhythm, and phrasing. “One sentence can have four or five different meanings. It's the prosody which generally allows us to determine the right one,” she comments. Each language has a specific prosody, learned from birth. This is used to prioritize information and, more surprisingly, to decide whose turn it is to speak. “Margaret Thatcher, for example, always used falling intonation at the end of her sentences to make them more substantial,” says Loevenbruck, who does a wonderful imitation of the famous Iron Lady. “But she was always being interrupted because her interlocutors thought she'd finished speaking at the end of every sentence!”

On an everyday basis, Loevenbruck works on focus, in other words, how an element in a statement is emphasized. Intonation, articulation, and the level of the voice are all used to make others understand. “One of our studies showed that people perceive focus as key information and understand it extremely well.” Which perhaps accounts for why this skill is developed early in life. “A baby points his finger at an object and says 'more',” Loevenbruck explains. “Gradually, the finger is replaced by the vocal apparatus, but both types of 'pointing' are similar.” This was further demonstrated when the ICP researchers recently showed that the same brain areas are activated in both cases. In adults, the process also involves a number of facial movements. At the ICP, all these “mini gestures” are analyzed using an electromagnetic “articulograph.”

A few doors down, a team of researchers works on “speaking machines.” “By 'speaking' we mean two things,” says Gérard Bailly, who leads the team. First of all sound, obviously, but also everything you can observe in the faces of the two speakers. A dominant factor in a face-to-face discussion is mutual attention. “When I speak to you, for example, I watch your eyes to see whether you are following what I say, whether you understand me, and what type of effect my words have on you,” the researcher explains. “But when I listen to you, my glance moves between your eyes and your lips.” No, this is not because your interlocutors want to warn you about a piece of lettuce wedged between your teeth; they are just trying to confirm what they have heard. “We all know how to lip-read,” Schwartz points out. “And we use this skill in every conversation.”

Scientists here are now trying to create virtual characters capable of this type of exchange, and their demonstration to visitors is impressive. On the screen, a woman's face answers your questions, all the while keeping her eyes fixed on you. The secret lies in laser sensors positioned all around the screen which can locate the back of your eyes. To create this virtual agent, the scientists covered the face of a real woman with almost 400 microspheres and then measured the movements for all possible combinations of two phonemes, in other words nearly a thousand. The lips are tracked precisely: the ICP has developed a highly effective method to study and model lip movements (see pictures).

And that's not all. The ICP is currently working on creating a character for the Arte TV channel.3 Its role will be to replace the current teletext system for the deaf and hard of hearing. The virtual presenter will not use sign language but a form of “cued speech,” where “hand gestures act as cues to complement the movements of the lips,” Schwartz explains. To model these gestures, the same method was used: our modern Pygmalions covered a woman's face and hands with microspheres and got her to read around 230 sentences which covered virtually all sounds, using cued speech. Currently they are trying to equip their Galatea with a virtual tongue that you would almost swear you can see speaking. In 2002 the researchers developed one of the most powerful functional tongue models in the world.

But they have an even more ambitious goal. “Our aim is to have two people communicate at a distance using virtual clones which echo their discussion,” Bailly says with a smile. To do that, the ICP is preparing a very special experiment room where human subjects will be locked in with a virtual character on a screen. Behind the observation windows, researchers will be able to draw valuable conclusions to improve their computer clones.

And no doubt they will share the results with their 60-odd ICP colleagues over a drink between experiments. The coffee room at the entrance to the laboratory is constantly buzzing with the sound of lively discussions. One thing is sure: in this bustling research center, speech will be around for a good many years to come.

Matthieu Ravaud

When the brain speaks

For linguistics, progress in brain imaging has been a real godsend. Observing areas of the brain has made it possible to understand many of the mechanisms related to speech. Jean-Luc Schwartz gives us an example. “If I say to you: 'life, life, life, life, life, life,' it's highly likely that, at some point, you'll hear 'fly, fly, fly'.” ICP researchers recently highlighted the cerebral networks involved in this shift in meaning, a phenomenon well known to specialists, who have named it “the verbal transformation effect.”1 Their discovery is part of a long-term ongoing project. Indeed, a considerable amount of research still remains to be done on the link between the systems of production and perception of speech.


1. Sato M. et al., NeuroImage 23: 1143-51, 2004.

Notes:

1. Institut de la communication parlée. Joint unit: CNRS / Institut National Polytechnique de Grenoble (INPG) / Stendhal University of Grenoble.
2. Techniques de l'imagerie, de la modélisation et de la cognition (TIMC: Imaging, modeling, and cognition techniques). Joint lab: CNRS / Université Grenoble-I.
3. In collaboration with the Thales group.

Contacts:

ICP, Grenoble
Gérard Bailly
Hélène Loevenbruck
Xavier Pelorson
Jean-Luc Schwartz

