Research Project - Ultra

CASL Ultrasound Tongue Imaging group

At QM we understand the value of high-quality data on speech production and articulation, applying data from a number of techniques to a wide range of theoretical problems. In 2002, Dr Nigel Hewlett, Laboratory Director, initiated a long-term programme of research and development into the use of ultrasound for tongue imaging. With advice from Prof Maureen Stone, support from the Scottish Higher Education Funding Council through SRIF2 and SRIF3 funding (Science Research Investment Fund), and in collaboration with Dr Alan Wrench of Articulate Instruments Ltd, our laboratory was born.

Potentially, ultrasound makes for an ideal recording mechanism. It is safe, provides good resolution images in real time and is portable.

In practice, the technique, like any other, is both amazingly powerful and revealing, and frustrating and difficult.

Below we provide some basic information about our lab, selected output, and information for research subjects.


Ultrasound Tongue Imaging research facilities at Craighall are based in our new laboratories. To make a booking or for other details of use, please contact Steve Cowen.

We use software and hardware from Articulate Instruments Ltd, a commercial company run by Dr Alan Wrench, with whom we have a close relationship for research and development. The equipment and software are available for commercial purchase and are fully supported; CE-marked systems for clinical use are also available.

Key features:

  • 100 frames per second high-speed machine with synchronised audio
  • four ultrasound machines for teaching and video-output (60fps deinterlaced)
  • two headsets to support the probe
  • dedicated data capture and analysis within the same software package to ensure temporal synchronisation (multiple copies of "AAA")
  • comprehensive analysis tool including tongue-curve fitting, annotation, curve averaging, and export
  • 20 terabytes of dedicated storage
  • multi-channel setup with support for simultaneous EPG, video and motion capture.

We also support EMA research via our joint EPSRC-funded facility, the Edinburgh Speech Production Facility, which comprises two Carstens AG-500 3D machines with EPG, is housed at the new Informatics Building, University of Edinburgh, and is led by Dr Alice Turk.


History - ultrasound development to 2007 at Corstorphine

Phase 1 (2002-2006): information about video-based Ultrasound Tongue Imaging, Electropalatography, and our stabilisation helmet.

Phase 2 (2006-2010): information about corpora, fieldwork: <coming soon>

Phase 3 (2010-): information about 100fps dedicated digital UTI, lip video, bite plane normalisation, fan averaging <coming soon>


Informal recording. Hand-held use of the probe in any venue with no stabilisation is ideal for rapid qualitative screening of variables such as vocalisation of /l/, glottal replacement vs. glottal reinforcement of /t/, strong derhoticisation, covert /r/, and tongue shape of /r/. Broad categories can be assigned on a visual and/or auditory basis. Small amounts of data can be efficiently gathered from large numbers of speakers. It is difficult or impossible to maintain probe alignment across more than a small number of tokens, though some participants are more skilled than others. The probe rides on the skin, whereas in our stabilised mode with headset, the probe is more fixed to the cranial space. Images should be recorded with portable hard-disk digitising equipment with an audio channel, so the speaker can be categorised live and also later [more information on this shortly]. The images are compressed and the audio synchronisation is only approximate, but this method is invaluable for providing raw tongue shape data, and the equipment is far more portable. A laptop-based UTI system would also be appropriate.

Ethical issues. Ethical approval has been granted to keep data gathered in a museum or science centre, and consent can be indicated by participants (or by parents) signing a line on a very simple release form [link to appear], even if a small amount of demographic information (age, sex, postcode) is gathered. We did not have access to hard disk recorders while scoping was underway, so in practice we did not record any of this data, nor actually analyse the speech of individuals. We will note here that ethical approval for the more formal data collection in laboratory or school was not problematic. However, because the combination of video images of the tongue and audio speech data can be considered “biometric”, like a blood sample, the process of gaining ethical approval might be far more stringent than linguists are used to. It is necessary to explain why such data is to be kept in perpetuity as a speech corpus, and to gain fully-informed consent from subjects more explicitly. Our consent forms and subject information sheets are available online.

Hardware and software. The headset from Articulate Instruments Ltd is stable (Scobbie et al 2008) but complex to use, even with experience, so about 10-15 minutes were required to check the fit. Bearing in mind how much speakers move their heads when talking in a relaxed vernacular way, we highly recommend use of such a method rather than making speakers sit rigidly still, particularly when collecting discourse data. We are only able to record a single speaker using AAA hardware and software, even in discourse conditions (though it would be possible to record both using a digital video recorder). In addition to standard use of a prompt list, a facility to capture samples of speech continuously and automatically (we recommend samples of 15 seconds) enables researchers to leave the room during recordings, and a delay facility captures accurate response times to the appearance of a prompt. If discourse/continuous speech is recorded, different channels of a silent flash-memory professional recorder can be used so that the entire discourse is available as an audio recording.

Speaker behaviour. Inter-speaker dynamics seem more important as a factor in the loquacity and relaxation of participants than the particular choice of recording equipment (Lawson et al 2008 [pdf soon]). To encourage vernacular speech, some practice may be necessary. Same-sociolect friendship pairs can be brought to the laboratory to mutually support their vernacular behaviour during experimental monologue speech, either through being present as a peer to monitor, or just through companionship before the experiment. If participants are left alone with no intervention, they may stand up, wander around, and fiddle with equipment, loosening screws etc. Some data loss may occur. If speakers know they are being listened to, live, their behaviour tends to be more guarded than if they know they can say what they like. It does not seem to matter so much that someone will listen, later. A live closed circuit video camera can be used in case the subjects want to signal for help due to equipment failure. It makes sense to actively monitor image quality and acoustic levels in the control room, as well as possible speaker discomfort (via touching the head or helmet), without listening. If there is a problem, the experimenter can knock on the door, enter, and ask how the participants are feeling. Participants were explicitly shown that the camera was not recording them and also that we were not able to hear what they were saying. Informants seemed to cope well with the recording scenario, in fact positively relishing all the attention. They were courteous and enthusiastic. We think the video link leads to improved behaviour without undermining confidence. Even with video surveillance and a laboratory setting, in discourse, teenage speaker pairs are able to “fool around” mildly and act in a normal fashion.

Setting. Informal “quiet” settings in schools are unlikely to provide adequate quality acoustic data, especially if noisy recording equipment (PC, laptop, DAT recorder, ultrasound machine) is present. Technical problems are more likely to occur, and to be more severe. More sensitive technical equipment is only appropriate for laboratory use, but even “portable” equipment is easier to use and gives higher quality phonetic results in a more controlled setting. At QMU, participants were left alone in one acoustically damped room, while all equipment was situated in a neighbouring room, drastically reducing noise.

Subject pairing. Friendship pairs can travel to a laboratory setting. Same-sex groups of four may appear efficient, but for an articulatory study this is likely to be overly stressful for the research team.

Data loss. With UTI, some people simply cannot provide good images, and the subject pool should be enlarged, particularly for adult speakers, to provide for wastage. Child participants are more likely to provide good quality images, but on the other hand are also less likely to tolerate a head support (chair, or, as in our case, headset) for long periods. Data loss also occurs due to operator error. Something new is always likely to go wrong, and this must be borne in mind.

Duration of recording. Articulate Assistant Advanced captures speech and ultrasound (and EPG, and video, and other channels) direct to disk. This lets us gather data at a rate of approximately 100 citation form words in 10 minutes. We have found that the headset becomes uncomfortable after about 20 to 30 minutes.
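As a rough planning aid, the figures above (about 100 citation-form words in 10 minutes, and a comfortable headset time of 20 to 30 minutes) can be turned into a simple session estimate. The sketch below is illustrative only: the function names and the example wordlist size are our own assumptions and are not part of AAA.

```python
import math

# From the text: roughly 100 citation-form words in 10 minutes.
WORDS_PER_MINUTE = 100 / 10


def max_tokens(comfort_minutes):
    """Approximate number of tokens recordable before the headset becomes uncomfortable."""
    return int(comfort_minutes * WORDS_PER_MINUTE)


def sessions_needed(n_words, repetitions, comfort_minutes=20.0):
    """Number of recording sessions needed for a wordlist (hypothetical helper)."""
    per_session = max_tokens(comfort_minutes)
    if per_session <= 0:
        raise ValueError("comfort_minutes must allow at least one token")
    return math.ceil(n_words * repetitions / per_session)


if __name__ == "__main__":
    print(max_tokens(20), max_tokens(30))
    print(sessions_needed(80, 3))
```

On these assumptions a single session holds roughly 200 to 300 tokens, so a list of 80 words with three repetitions each would need two sessions at the cautious 20-minute limit. Real sessions will vary with the speaker and the material.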

Prompt materials. AAA software lets us collect wordlists, with multiple tokens in a randomised order, with text and/or images. These can be displayed on the screen after recording begins. Spontaneous discourse can also be recorded, with a "dummy" prompt list just to identify a series of chunks. More than one repetition of each real word should be gathered.
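A randomised wordlist with several tokens per word can be prepared ahead of a session in a few lines. This is a minimal sketch of the general idea, not AAA's own prompt format or randomisation method; the function name, the example words and the seed are hypothetical.

```python
import random
from collections import Counter


def build_prompt_list(words, repetitions=3, seed=None):
    """Return every word `repetitions` times, in a shuffled order.

    A fixed seed makes the order reproducible, which helps when the
    same list has to be regenerated for annotation later.
    """
    rng = random.Random(seed)
    prompts = [word for word in words for _ in range(repetitions)]
    rng.shuffle(prompts)
    return prompts


if __name__ == "__main__":
    prompts = build_prompt_list(["car", "hut", "fur"], repetitions=3, seed=1)
    print(prompts)
    print(Counter(prompts))
```

A plain shuffle can place two tokens of the same word next to each other. If that matters for the study, shuffling within blocks that each contain every word once is a common alternative.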

Sampling rate. The approximately 30Hz NTSC video sampling rate does not give a sufficiently good temporal resolution to analyse fast moving or short duration events. However, if the internal scan rate of the probe is set to greater than 60Hz, and the video images are deinterlaced, an effective sampling rate of approximately 60Hz is achieved. Note that though NTSC is specified at 29.97Hz, each ultrasound machine may be rather different. Their outputs are not broadcast specified! Our ultrasound machine with a higher temporal resolution (100Hz) works by exporting raw scan data rather than the TV image. Such a machine is both less portable and less user-friendly due to the long lag in saving to disk after each token is captured, but high-speed synchronised data is possible. The images are visually very clean. A comparison between cineloop and video output found that the video is not far behind, at this stage, in spatial and temporal quality if enough care is taken during recording (Wrench and Scobbie, 2008).
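To illustrate why deinterlacing doubles the effective sampling rate: each interlaced video frame holds two fields (the even and the odd scan lines) captured at different moments, so splitting them gives two images per frame. The sketch below is not the AAA implementation. It represents a frame as a plain list of rows, and it assumes that the top field comes first and that the rate is the nominal NTSC value; as noted above, the actual rate of a given machine should be checked.

```python
# Nominal NTSC frame rate (about 29.97 Hz); a real machine may differ.
NTSC_FRAME_RATE = 30000 / 1001


def split_fields(frame):
    """Split an interlaced frame (a list of rows) into (even_rows, odd_rows)."""
    return frame[0::2], frame[1::2]


def deinterlace(frames, frame_rate=NTSC_FRAME_RATE, top_field_first=True):
    """Return (time_in_seconds, field) pairs at twice the frame rate.

    `top_field_first` is an assumption about the video source.
    """
    field_rate = 2 * frame_rate
    fields = []
    for i, frame in enumerate(frames):
        even, odd = split_fields(frame)
        first, second = (even, odd) if top_field_first else (odd, even)
        fields.append(((2 * i) / field_rate, first))
        fields.append(((2 * i + 1) / field_rate, second))
    return fields


if __name__ == "__main__":
    frames = [["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "b3"]]
    for t, field in deinterlace(frames):
        print(round(t, 4), field)
```

Each field has half the vertical resolution of the full frame, which is the price paid for the higher temporal resolution.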


Current members (in alphabetical order)

Dr Joanne Cleland

Steve Cowen

Prof William J Hardcastle

Dr Nigel Hewlett

Dr Eleanor Lawson

Dr Robin Lickley

Dr Sonja Schaeffler

Prof James M Scobbie

Dr Sara Wood

Dr Alan Wrench (and also see

Dr Natalia Zharkova


Past members (in alphabetical order)

Dr Tony Buhr

Dr Tanja Kocjančič

Xaver Koch

Lilian Kuhn

Janine Lilienthal


Methodological issues

  • AAA software for data capture and analysis
  • Testing the effects of ultrasound equipment on speaker behaviour
  • A head-set to hold the probe steady
  • Simultaneous ultrasound and electropalatography
  • High-speed ultrasound
  • Simultaneous ultrasound and VICON motion capture

Linguistic and phonetic research

Information for Subjects

There are a number of different research projects using Ultrasound Tongue Imaging at QMU, so please select the project that interests you, below.

School Children (age 12-14) tongue measurement
(data collection complete at present) (Dr Jim Scobbie, Dr Eleanor Lawson)

German-English bilinguals (Dr Sonja Schaeffler, Dr Jim Scobbie)


Ultrasound is a wonderful thing - you can see your tongue moving in real time inside your mouth while you talk, sing, whistle, blow your trumpet or beatbox with Maad Skillz!

For research, it is potentially a very valuable tool. If you are visiting this site as a potential research subject, please see the subject information above for general information, or go from there to a particular project page.