CASL Ultrasound Tongue Imaging group

At QM we understand the value of high-quality data on speech production and articulation, applying data from a number of techniques to a wide range of theoretical problems. In 2002, Dr Nigel Hewlett, Laboratory Director, initiated a long-term programme of research and development into the use of ultrasound for tongue imaging. With advice from Prof Maureen Stone, support from the Scottish Higher Education Funding Council through SRIF2 and SRIF3 funding (Science Research Investment Fund), and in collaboration with Dr Alan Wrench of Articulate Instruments Ltd, our laboratory was born.

Potentially, ultrasound makes for an ideal recording mechanism. It is safe, provides good resolution images in real time and is portable.

In practice, the technique, like any other, is both amazingly powerful and revealing, and frustrating and difficult.

Below we provide some basic information about our lab, selected output, and information for research subjects.

News and Outputs

Externally-funded Grants

  • Dr Natalia Zharkova, Dr Nigel Hewlett, Dr Robin Lickley and Prof Fiona Gibbon (University College Cork, Republic of Ireland) "Coarticulation and tongue differentiation in children between three and thirteen years old". ESRC grant (ES/K002597/1) of £303,764 (£379,234 fEC). 
  • Prof. James Scobbie, Dr Eleanor Lawson and Dr Jane Stuart-Smith "Seeing the Links in the Speaker-Hearer Chain: an investigation of the transmission of articulatory variation using Ultrasound tongue imaging". ESRC grant of £160,000 (~£238,000 fEC).
  • Prof. Steve Renals (PI, University of Edinburgh), Prof. James M. Scobbie, Dr Joanne Cleland and Dr Korin Richmond. "Ultrax: Real-time tongue tracking for speech therapy using ultrasound". EPSRC grant of £586,154.
  • Dr Natalia Zharkova, Dr Robin Lickley and Dr Nigel Hewlett "Lingual coarticulation in preadolescents and adults: an ultrasound study". ESRC grant (RES-000-22-4075) of £80,000 (~£100,000 fEC). Oct 2010-Sep 2011.
  • Lilian Kuhn held a CAPES visiting scholarship (Brazil) 2010
  • Xaver Koch held an Erasmus visiting placement (Germany) 2010
  • Sonja Schaeffler, James M Scobbie and Ineke Mennen (Bangor) "Open-Mouthed or Stiff Upper Lip? Exploring Language-Specific Articulatory Settings in English-German Bilinguals." ESRC Research grant (RES-000-22-3032, £99,580 fEC). 01/09/2008-30/04/2010.
  • Janine Lilienthal (Germany) held a Marie Curie EdSST scheme research fellowship 2008-2009
  • Natalia Zharkova, Nigel Hewlett and William Hardcastle "An ultrasound study of lingual coarticulation in children and adults". ESRC Research Grant (RES-000-22-2833, £97,919.66 fEC). 01/05/2008-30/04/2009.
  • Sonja Schaeffler and James M Scobbie "Fine phonetic detail in the specification of grammar: exploring differences in articulatory settings across languages". British Academy Research Grant RG-48460, May-Aug 2008. (£5,600).
  • Natalia Zharkova "Analysing Speech Variability with Ultrasound and EPG". ESRC post-doctoral fellowship PTA-026-27-1268, Dec 2006-Nov 2007. (£74,108 fEC)

Publications (for an up-to-date publication record, see QMU eResearch)

Doctoral Theses

  • Kocjančič, Tanja (2010) Ultrasound and acoustic analysis of lingual movement in teenagers with Childhood Apraxia of Speech, control adults and typically developing children. PhD, QMU. 
  • Zharkova, Natalia (2007) An investigation of coarticulation resistance in speech production using ultrasound. PhD, QMU.

Conference presentations by year

Selected Conference presentations 2008

Selected Conference presentations 2007

  • Natalia Zharkova "Differences in coarticulation resistance between vowels and consonants: an ultrasound study" Perception and perception-production interaction. Part II* : Brain and Speech Grenoble, France, 17 Sept 2007
  • Tanja Kocjančič "The influence of syllable onset complexity on tongue movements: An ultrasound study" Perception and perception-production interaction. Part II* : Brain and Speech Grenoble, France, 17 Sept 2007
  • Alan Wrench, James M. Scobbie and Marietta van der Linden "Evaluation of a helmet to hold an ultrasound probe" Ultrafest IV, New York University, September 28-29, 2007
  • Alan Wrench "Articulate Assistant Advanced: Ultrasound Module" Ultrafest IV, New York University, September 28-29, 2007
  • Natalia Zharkova "Coarticulation resistance quantification in several Scottish English phonemes" Ultrafest IV, New York University, September 28-29, 2007
  • Tanja Kocjančič "Measuring tongue movements in complex onsets: an ultrasound study" Ultrafest IV, New York University, September 28-29, 2007
  • James M Scobbie, Jane Stuart-Smith, Eleanor Lawson "Ultrasound tongue imaging for sociophonetics" Ultrafest IV, New York University, September 28-29, 2007
  • Jane Stuart-Smith, James M. Scobbie & Eleanor Lawson "Articulatory insights into language variation and change: Preliminary findings from an ultrasound study of derhoticization in Scottish English" NWAV 36 (New Ways of Analyzing Variation) Pennsylvania, USA, October 11-14, 2007

Selected Conference papers 2006

  • James Scobbie and Jane Stuart-Smith “Where the [l] [r] they? An ultrasound tongue imaging study of Scottish English liquids”. British Association of Academic Phoneticians (BAAP). Edinburgh, 10-12 April 2006
  • Natalia Zharkova and Nigel Hewlett "Studying coarticulation resistance with ultrasound" 5th International Conference on Speech Motor Control, Nijmegen, 7th-10th June 2006.
  • James M Scobbie, Koen Sebregts and Jane Stuart-Smith "From subtle to gross variation: an ultrasound tongue imaging study of Dutch and Scottish English /r/." Tenth Conference on Laboratory Phonology (LabPhon 10), Paris, 29 June - 1 July 2006
  • Alan A. Wrench and James M. Scobbie "Spatio-temporal inaccuracies of video-based ultrasound images of the tongue" 7th International Seminar on Speech Production (ISSP), Ubatuba, Brazil, 13-15 December 2006.

Selected Conference papers 2005

  • Fiona Gibbon and Maria Wolters “A New Application of Ultrasound to Image Tongue Behaviour in Cleft Palate Speech” (poster) Craniofacial Society of Great Britain and Ireland Annual Scientific Conference, Swansea, Wales, 13-15 April, 2005
  • James Scobbie and Koen Sebregts "Getting at variation with ultrasound: examples from Scottish and Dutch /r/" Third Ultrasound Workshop, Tucson Arizona, USA, 14-16 April, 2005
  • Natalia Zharkova "An ultrasound study of the trough effect in VhV sequences" Third Ultrasound Workshop, Tucson Arizona, USA, 14-16 April, 2005
  • Yolanda Vázquez Álvarez "The Trough Effect: Can we predict tongue lowering from acoustic data alone?" Third Ultrasound Workshop, Tucson Arizona, USA, 14-16 April, 2005
  • Yolanda Vázquez Álvarez and Nigel Hewlett “The trough effect: an ultrasound study” 3rd Conference on Experimental Phonetics. University of Santiago de Compostela, Spain, 24-26 October, 2005.

Selected Conference papers 2004

  • Yolanda Vázquez Álvarez, Nigel Hewlett and Natalia Zharkova "An ultrasound study of the trough effect" British Association of Academic Phoneticians Colloquium, University of Cambridge, Cambridge, UK. 24-26 March 2004
  • James M Scobbie "Labiodental /r/ from alveolar /r/: lip or lingual vs. lip & lingual" Ultrapolooza (2nd Ultrasound Round Table) The University of British Columbia, Vancouver, 22nd-23rd April 2004.
  • Yolanda Vázquez Álvarez, Nigel Hewlett and Natalia Zharkova "An ultrasound study of co-articulation & the "Trough Effect" in symmetrical VCV syllables: A report of work in progress" Ultrapolooza (2nd Ultrasound Round Table) The University of British Columbia, Vancouver, 22nd-23rd April 2004.
  • Alan A Wrench "Matching, merging and means: Spline productivity tools for ultrasound" Ultrapolooza (2nd Ultrasound Round Table) The University of British Columbia, Vancouver, 22nd-23rd April 2004.


Ultrasound Tongue Imaging research facilities at Craighall are based in our new laboratories. To make a booking or for other details of use, please contact Steve Cowen.

We use software and hardware from Articulate Instruments Ltd, a commercial company run by Dr Alan Wrench, with whom we have a close relationship for research and development. The equipment and software are available for commercial purchase and are fully supported, and CE-marked systems for clinical use are also available.

Key features:

  • high-speed machine (100 frames per second) with synchronised audio
  • four ultrasound machines for teaching and video-output (60fps deinterlaced)
  • two headsets to support the probe
  • dedicated data capture and analysis within the same software package to ensure temporal synchronisation (multiple copies of "AAA")
  • comprehensive analysis tools including tongue-curve fitting, annotation, curve averaging, and export
  • 20 terabytes of dedicated storage
  • multi-channel setup with support for simultaneous EPG, video and motion capture.

The Articulate Instruments website can be consulted for up-to-date documentation. See also this 2008 poster for some methodological protocols.

We also support EMA research via our joint EPSRC-funded facility, the Edinburgh Speech Production Facility, which comprises two Carstens AG-500 3D machines with EPG. It is housed at the new Informatics Building, University of Edinburgh, and led by Dr Alice Turk.


History - ultrasound development to 2007 at Corstorphine

Phase 1 (2002-2006): information about video-based Ultrasound Tongue Imaging, Electropalatography, and our stabilisation helmet.

Phase 2 (2006-2010): information about corpora, fieldwork: <coming soon>

Phase 3 (2010-): information about 100fps dedicated digital UTI, lip video, bite plane normalisation, fan averaging <coming soon>


Informal recording. Hand-held use of the probe in any venue with no stabilisation is ideal for rapid qualitative screening of variables such as vocalisation of /l/, glottal replacement vs. glottal reinforcement of /t/, strong derhoticisation, covert /r/, and tongue shape of /r/. Broad categories can be assigned on a visual and/or auditory basis. Small amounts of data can be efficiently gathered from large numbers of speakers. It is difficult or impossible to maintain probe alignment across more than a small number of tokens, though some participants are more skilled than others. The probe rides on the skin, whereas in our stabilised mode with headset, the probe is more fixed relative to the cranial space. Images should be recorded with portable hard-disk digitising equipment with an audio channel, so the speaker can be categorised live and also later [more information on this shortly]. The images are compressed and the audio synchronisation is only approximate, but this approach is invaluable for providing raw tongue shape data, and the equipment is far more portable. A laptop-based UTI system would also be appropriate.

Ethical issues. Ethical approval has been granted to keep data gathered in a museum or science centre, and consent can be indicated by participants (or by parents) signing a line on a very simple release form [link to appear], even if a small amount of demographic information (age, sex, postcode) is gathered. We did not have access to hard disk recorders while scoping was underway, so in practice we did not record any of this data, nor actually analyse the speech of individuals. We will note here that ethical approval for the more formal data collection in laboratory or school was not problematic, but because the combination of video images of the tongue and audio speech data can be considered “biometric”, like a blood sample, processes of gaining ethical approval might therefore be far more stringent than linguists are used to. It is necessary to explain why such data is to be kept in perpetuity as a speech corpus, and to more explicitly gain fully-informed consent from subjects. Our consent forms and subject information sheets are available online.

Hardware and software. The headset from Articulate Instruments Ltd is stable (Scobbie et al 2008, [pdf]) but complex to use, even with experience, so about 10-15 minutes were required to check the fit. Bearing in mind how much speakers move their head when talking in a relaxed vernacular way, we highly recommend use of such a method rather than making speakers sit rigidly still, particularly when collecting discourse data. We are only able to record a single speaker using AAA hardware and software, even in discourse conditions (though it would be possible to record both using a digital video recorder). In addition to standard use of a prompt list, a facility to capture samples of speech continuously and automatically (we recommend samples of 15 seconds) enables researchers to leave the room during recordings, and a delay facility captures accurate response times to the appearance of a prompt. If discourse/continuous speech is recorded, different channels of a silent flash-memory professional recorder can be used so that the entire discourse is available as an audio recording.

Speaker behaviour. Inter-speaker dynamics seem more important as a factor in the loquacity and relaxation of participants than the particular choice of recording equipment (Lawson et al 2008 [pdf soon]). To encourage vernacular speech, some practice may be necessary. Same-sociolect friendship pairs can be brought to the laboratory to mutually support their vernacular behaviour during experimental monologue speech, either through being present as a peer to monitor, or just through companionship before the experiment. If participants are left alone with no intervention, they may stand up, wander around, and fiddle with equipment, loosening screws etc. Some data loss may occur. If speakers know they are being listened to live, their behaviour tends to be more guarded than if they know they can say what they like. It does not seem to matter so much that someone will listen later. A live closed-circuit video camera can be used in case the subjects want to signal for help due to equipment failure. It makes sense to actively monitor image quality and acoustic levels in the control room, as well as possible speaker discomfort (via touching the head or helmet), without listening. If there is a problem, the experimenter can knock on the door, enter, and ask how the participants are feeling. Participants were explicitly shown that the camera was not recording them and also that we were not able to hear what they were saying. Informants seemed to cope well with the recording scenario, in fact positively relishing all the attention. They were courteous and enthusiastic. We think the video link leads to improved behaviour without undermining confidence. Even with video surveillance and a laboratory setting, teenage speaker pairs in discourse are able to “fool around” mildly and act in a normal fashion.

Setting. Informal “quiet” settings in schools are unlikely to provide adequate quality acoustic data, especially if noisy recording equipment (PC, laptop, DAT recorder, ultrasound machine) is present. Technical problems are more likely to occur, and to be more severe. More sensitive technical equipment is only appropriate for laboratory use, but even “portable” equipment is easier to use and gives higher quality phonetic results in a more controlled setting. At QMU, participants were left alone in one acoustically damped room, while all equipment was situated in a neighbouring room, drastically reducing noise.

Subject pairing. Friendship pairs can travel to a laboratory setting. Same sex groups of four may appear efficient, but for an articulatory study this is likely to be overly stressful for the research team.

Data loss. With UTI, some people simply cannot provide good images, and the subject pool should be enlarged, particularly for adult speakers, to provide for wastage. Child participants are more likely to provide good quality images, but on the other hand are also less likely to tolerate a head support (chair, or, as in our case, headset) for long periods. Data loss also occurs due to operator error. Something new is always likely to go wrong, and this must be borne in mind.

Duration of recording. Articulate Assistant Advanced captures speech and ultrasound (and EPG, and video, and other channels) direct to disk. This lets us gather data at a rate of approximately 100 citation form words in 10 minutes. We have found that the headset becomes uncomfortable after about 20 to 30 minutes.
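The figures above imply a rough ceiling on how much material fits in one session. The following is a minimal planning sketch in Python, assuming the quoted rate of about 100 citation-form words per 10 minutes holds throughout a session; the function names are hypothetical and are not part of AAA.

```python
def max_tokens(session_minutes, words_per_10_min=100):
    """Estimate how many citation-form tokens fit in a session.

    The default rate (about 100 words per 10 minutes) is the approximate
    figure quoted above; real throughput will vary by speaker and materials.
    """
    return session_minutes * words_per_10_min // 10


def max_word_types(session_minutes, repetitions, words_per_10_min=100):
    """Estimate how many distinct words can be recorded with the given
    number of repetitions each, within the session length."""
    return max_tokens(session_minutes, words_per_10_min) // repetitions


if __name__ == "__main__":
    # A 20-minute session, at the lower end of the headset comfort window.
    print(max_tokens(20))
    print(max_word_types(20, repetitions=5))
```

Planning around the lower end of the 20 to 30 minute comfort window leaves some margin for headset fitting checks and re-recorded tokens.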

Prompt materials. AAA software lets us collect wordlists, with multiple tokens in a randomised order, with text and/or images. These can be displayed on the screen after recording begins. Spontaneous discourse can also be recorded, with a "dummy" prompt list just to identify a series of chunks. More than one repetition of each real word should be gathered.
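As an illustration of a wordlist with multiple tokens in randomised order, here is a minimal Python sketch that builds such a list outside the recording software. It does not reproduce how AAA randomises prompts; the function name, the example words and the seed parameter are all assumptions made for illustration.

```python
import random


def build_prompt_list(words, repetitions=3, seed=None):
    """Return every word repeated `repetitions` times, in shuffled order.

    A fixed `seed` makes the order reproducible, so the same list can be
    regenerated later when checking annotations against the prompts.
    """
    rng = random.Random(seed)
    prompts = [word for word in words for _ in range(repetitions)]
    rng.shuffle(prompts)
    return prompts


if __name__ == "__main__":
    # Hypothetical example words, three tokens each.
    for prompt in build_prompt_list(["tip", "top", "tap"], repetitions=3, seed=1):
        print(prompt)
```

Shuffling the full token list, rather than shuffling each block of repetitions separately, allows the same word to appear twice in a row. Whether that is acceptable depends on the study design.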

Sampling rate. The approximately 30Hz NTSC video sampling rate does not give a sufficiently good temporal resolution to analyse fast-moving or short-duration events. However, if the internal scan rate of the probe is set to greater than 60Hz, and the video images are deinterlaced, an effective sampling rate of approximately 60Hz is achieved. Note that though NTSC is specified at 29.97Hz, each ultrasound machine may be rather different. Their outputs are not broadcast specified! Our ultrasound machine with a higher temporal resolution (100Hz) works by exporting raw scan data rather than the TV image. Such a machine is both less portable and less user-friendly due to the long lag in saving to disk after each token is captured, but high-speed synchronised data is possible. The images are visually very clean. A comparison between cineloop and video output found that the video is not far behind, at this stage, in spatial and temporal quality if enough care is taken during recording (Wrench and Scobbie, 2008).
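To show why deinterlacing doubles the effective sampling rate, here is a minimal Python sketch that splits one interlaced frame into its two fields and gives each field its own time stamp. It assumes a frame is a list of pixel rows and that the field on the even-numbered rows was captured first; field order and the true frame rate depend on the equipment, and the function names are hypothetical.

```python
def split_fields(frame):
    """Split an interlaced frame (a list of pixel rows) into two fields.

    Rows 0, 2, 4, ... form one field and rows 1, 3, 5, ... the other.
    Each field has half the vertical resolution of the full frame.
    """
    return frame[0::2], frame[1::2]


def field_times(frame_index, frame_rate=29.97, even_field_first=True):
    """Return (even_field_time, odd_field_time) in seconds.

    The two fields of a frame are half a frame period apart, so the
    effective sampling rate after deinterlacing is 2 * frame_rate.
    Which field comes first is an assumption to check per machine.
    """
    frame_period = 1.0 / frame_rate
    start = frame_index * frame_period
    first, second = start, start + frame_period / 2
    return (first, second) if even_field_first else (second, first)


if __name__ == "__main__":
    frame = [[0, 0], [1, 1], [2, 2], [3, 3]]  # four rows of dummy pixels
    even_field, odd_field = split_fields(frame)
    print(even_field, odd_field)
    print(field_times(1))
```

Because the frame rate of a given ultrasound machine may differ from the nominal 29.97Hz, it is passed in as a parameter rather than fixed.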


Current members (in alphabetical order)

Dr Joanne Cleland

Steve Cowen

Prof William J Hardcastle

Dr Nigel Hewlett

Dr Eleanor Lawson

Dr Robin Lickley

Dr Sonja Schaeffler

Prof James M Scobbie

Dr Sara Wood

Dr Alan Wrench (and also see

Dr Natalia Zharkova


Past members (in alphabetical order)

Dr Tony Buhr

Dr Tanja Kocjančič

Xaver Koch

Lilian Kuhn

Janine Lilienthal


Methodological issues

  • AAA software for data capture and analysis
  • Testing the effects of ultrasound equipment on speaker behaviour
  • A head-set to hold the probe steady
  • Simultaneous ultrasound and electropalatography
  • High-speed ultrasound
  • Simultaneous ultrasound and VICON motion capture

Linguistic and phonetic research

Information for Subjects

There are a number of different research projects using Ultrasound Tongue Imaging at QMU, so please select the project that interests you, below.

School Children (age 12-14) tongue measurement
(data collection complete at present) (Dr Jim Scobbie, Dr Eleanor Lawson)

German-English bilinguals (Dr Sonja Schaeffler, Dr Jim Scobbie)


Ultrasound is a wonderful thing - you can see your tongue moving in real time inside your mouth while you talk, sing, whistle, blow your trumpet or beatbox with Maad Skillz!

For research, it is potentially a very valuable tool. If you are visiting this site as a potential research subject, please see the subject information above for general information, and go from there to a particular project page.