Ultrasound of tongue position and movements to analyse speech and provide real-time visual feedback in the speech therapy clinic. 

A team of speech scientists and speech and language therapists have analysed the articulation of speech and applied their findings to improve the speech communication of children with persistent speech disorders. The research team have pioneered the use of ultrasound technology to view in real time the movements of the tongue inside the mouth during speech. This has allowed children to see where their tongue is positioned during speech and to master the production of key speech sounds. The research intervention has ultimately transformed their speech communication and improved their quality of life.  

Some children who are unable to articulate the differences between consonants (for example, when they are trying to produce sounds like T and K, or SH and S) do not respond to conventional speech and language therapy. This is because they are unable to make use of the feedback they receive about how their errors sound to others. They may also be unable to learn new and unfamiliar articulations. As a result, a number of older children and young people have well-entrenched speech errors and, despite many years of clinical intervention, have made little or no progress in correcting them.

When children don’t understand how to correct their speech sounds, conventional therapy can become uninteresting and demoralising and, in time, unproductive. Our research has shown that such children can benefit from seeing what the tongue is actually doing inside the mouth during speech, rather than relying solely on what they hear. The visual feedback helps them learn to repeat sounds that their therapist makes. Specifically, we have found that the child can benefit from additional feedback that relates more directly to their own articulations. If a child and their therapist can together see dynamic images on a screen that are created directly from the child’s own articulators in real time, accompanied by accurate feedback on how these new articulations sound, then they can overcome these persistent speech disorders. In simple terms, the child and their therapist use real-time visual imagery to work out what is going wrong inside the mouth, the child can see from their therapist where the tongue should be placed to produce the sound correctly, and both use feedback on how each attempt sounds. Together, they are able to make breakthroughs in the way sounds and words are pronounced. This is achieved through the use of advanced ultrasound technology and clinical protocols developed at Queen Margaret University.

What our research has done: 
In collaboration with a commercial software and hardware engineering company (Articulate Instruments Ltd, led by Professor Alan Wrench and based at Queen Margaret University), our research has used stabilized ultrasound scanners to create high-resolution movies and still images of the shape, location and movement of the tongue inside the mouth during speech. By ensuring that these images are accurately synchronized with the speech sounds that are produced, we can explore previously inaccessible details of speech production. These systems can easily record speech samples for immediate playback in clinic or in teaching, as well as for detailed analysis. We have compared clinical samples against typical speech, and developed measures that let us objectively track improvements through a course of treatment. This approach helps the speech therapist detect inaudible aspects of correct-sounding as well as incorrect-sounding articulations, providing a more accurate basis for useful feedback on children’s progress towards improvement.

In clinical collaborations led by Dr Joanne Cleland (formerly of Queen Margaret University and now of the University of Strathclyde), we created clinical protocols for a wide range of speech sound disorders in which the placement and shape of the tongue are critical to the production of correct-sounding speech. Specially designed wordlists, clinical tasks, probes and processes build on previous understanding of the importance of articulatory practice, feedback and assessment in the speech clinic.

(1) During the Ultraphonix project, two-thirds of the children who were suitable for ultrasound-based therapy to treat apparently intractable speech disorders (errors that had been persistent and unresolved) improved by a “clinically significant” degree. This was measured in terms of a reduction in perceptible errors in the consonants (or vowels) treated, in new words that had not been practised in treatment. Further, the parents rated their children’s speech as improving from being “sometimes understood” to “usually understood”.

(2) Our treatment model was generally successful across a wider range of disorders than in any other articulatory intervention research, and required only one-hour therapy sessions in a 10-12 week block.

(3) Our research into typical variation in speech production has been crucial in providing theoretical understanding of how speech varies between different languages, and of the social and geographical variation in the accents of a single language. The methods we have developed contribute to research addressing questions such as “what is an accent?”, “why are some unfamiliar speech sounds hard to learn?” and “how do people alter the way they speak?” in any of the languages of the world.

(4) A wide range of other disciplines also need to understand how the movements of the vocal tract generate speech, from surgeons to speech technologists, voice coaches, language teachers and animators. We provided the initial content for UltraSuite - an open curated repository for machine learning engineers. Our “Open Science” contribution comprised over 13 hours of speech from 76 children, of which the clinical speech component (from before and during treatment) is still the largest such articulatory dataset worldwide. 

(5) Seeing Speech is a free-to-use website used for phonetics courses, self-learning, and continuing professional development for speech and language therapists. Clinicians use the audiovisual examples on the site to develop the child’s and parents’ knowledge of how particular sounds are made, enhancing therapeutic intervention and providing motivation. Indeed, 10% of the site’s more than 250,000 users worldwide are speech and language therapists (alongside students, the public and the second language learning community). One of the skilled model talkers used was Professor Janet Beck of Queen Margaret University, and the site’s animations are based exclusively on her speech production. Dr Eleanor Lawson is leading a new project at QMU (2021-2023) which will further extend the site with examples of child speech and models of disordered speech sounds.

Academics involved in leading/contributing to this research:

  • Professor James M Scobbie, Director of the Clinical Audiology, Speech and Language Research Centre (CASL) at Queen Margaret University 
  • Professor Alan A Wrench, CEO of Articulate Instruments Ltd based at Queen Margaret University 
  • Dr Joanne Cleland, formerly Research Fellow and Speech and Language Therapist at Queen Margaret University and now Reader at the University of Strathclyde
  • Professor Jane Stuart-Smith, Dr Eleanor Lawson, Professor Janet Beck, Dr Zoe Roxburgh, Dr Nigel Hewlett, Dr Natalia Zharkova, Professor Steve Renals and Dr Carolyn Hawkes

Funding for engineering and clinical research came from the Engineering and Physical Sciences Research Council (ULTRAX EP/I027696/1) and the Chief Scientist Office (Ultraphonix ETM/402), with ongoing support from Queen Margaret University spin-out company Articulate Instruments Ltd. Seeing Speech was funded by the Carnegie Trust for the Universities of Scotland and the Arts and Humanities Research Council. Research into typical speech variation and development was funded by the Economic and Social Research Council.