Speech intelligibility relates to the sound intensity modulations in the talker’s voice, as described by Houtgast et al. This principle is used to determine speech intelligibility from the remaining modulation at a receiver position, using a modulated noise source at the talker position. The noise spectrum represents that of a human voice, and modulations at several frequencies represent the spoken words. Figure 1 depicts the principle for a single modulation frequency.
The Speech Transmission Index (STI) is a single-number quantity derived from seven noise octave frequency bands ranging from 125 Hz to 8 kHz, each modulated by 14 frequencies ranging from 0.63 to 12.5 Hz. In other words, the STI is calculated from 7 × 14 = 98 modulation reduction values.
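To make the 7 × 14 structure concrete, the following is a minimal Python sketch of how a matrix of modulation reduction values can be condensed into a single index. It follows the commonly published outline (apparent signal-to-noise ratio, clipping to ±15 dB, normalisation to the range 0 to 1, averaging) but is deliberately simplified: the equal band weights are an assumption made here for illustration, and the masking and redundancy corrections of the full standardised method are omitted. The names `transmission_index` and `simplified_sti` are hypothetical and are not part of DIRAC.

```python
import math

# Octave band centre frequencies (Hz) and one-third octave spaced
# modulation frequencies (Hz): 7 x 14 = 98 modulation reduction values.
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]
MODULATION_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                       3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def transmission_index(m):
    """Map one modulation reduction value m (0..1) to an index in 0..1."""
    # Keep m strictly inside (0, 1) so the logarithm stays finite.
    m = min(max(m, 1e-9), 1.0 - 1e-9)
    # Apparent signal-to-noise ratio in dB, limited to +/-15 dB.
    snr_db = 10.0 * math.log10(m / (1.0 - m))
    snr_db = min(max(snr_db, -15.0), 15.0)
    return (snr_db + 15.0) / 30.0


def simplified_sti(mtf, band_weights=None):
    """Condense a 7 x 14 matrix of modulation reduction values.

    mtf: one row per octave band, one column per modulation frequency.
    band_weights: ASSUMPTION - equal weights unless the caller supplies
    its own; the standardised weighting factors are not reproduced here.
    """
    if band_weights is None:
        band_weights = [1.0 / len(mtf)] * len(mtf)
    # Average the transmission indices within each octave band.
    band_indices = [
        sum(transmission_index(m) for m in row) / len(row) for row in mtf
    ]
    return sum(w * mti for w, mti in zip(band_weights, band_indices))
```

A matrix in which every value equals 0.5 corresponds to an apparent signal-to-noise ratio of 0 dB in every cell, which this sketch maps to the midpoint of the scale.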
To measure STI, you can play a CD with modulated noise at the talker position and measure and analyse the resulting signal at a listener position. With this modulated noise method the source is always independent from the receiver (open loop mode, see insert), which is convenient for long distance measurements. On the other hand, due to the randomness of the source signal, measurements have to be relatively long to get reproducible results (some 30 s on average). Another disadvantage of the modulated noise method is the risk of the receiver misinterpreting background noise level fluctuations as signal modulations, hence overestimating speech intelligibility at low SNR values.
To reduce processing time, the STI was approximated using small subsets of the 98 modulation reduction values. For room acoustics a subset of nine modulation reduction values was defined, on which the so-called Room Acoustics Speech Transmission Index (RASTI) is based.
In the 1980s, Brüel & Kjær developed and introduced the Speech Transmission Meter for RASTI Measurements Type 3361.
Later on, other subsets were defined for parameters that approach the STI very well in particular situations. The STITEL was introduced for analog telecommunication systems, while fairly recently the STIPA was introduced for public address systems.
STI From Impulse Responses
DIRAC measures room acoustic impulse responses from which many parameters are calculated. While an impulse response is defined as the response of a system to an impulsive signal, DIRAC also derives impulse responses indirectly through deconvolution using other stimuli, such as sine sweeps or MLS (Maximum Length Sequence) signals. These non-impulsive deterministic signals allow the use of loudspeaker sources, making the measurements highly reproducible at the shortest measuring times (some 5 s for common speech intelligibility measurements). From the impulse responses, modulation reduction values are calculated according to Schroeder in negligible time, so in DIRAC all speech intelligibility parameters are always available.
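The relation attributed to Schroeder can be illustrated with a short Python sketch: the modulation reduction at modulation frequency F is the magnitude of the Fourier transform of the squared impulse response at F, divided by the total energy of the response. The function name `modulation_reduction` is hypothetical, the input is assumed to be already filtered into one octave band, and the sketch says nothing about how DIRAC implements the calculation internally.

```python
import cmath
import math


def modulation_reduction(h, fs, mod_freq_hz):
    """Modulation reduction m(F) of an impulse response.

    h: impulse response samples (ASSUMPTION: already octave-band filtered)
    fs: sample rate in Hz
    mod_freq_hz: modulation frequency F in Hz
    """
    energy = [x * x for x in h]  # squared impulse response
    total = sum(energy)
    if total == 0.0:
        raise ValueError("impulse response contains no energy")
    # Phase advance per sample at the modulation frequency.
    omega = 2.0 * math.pi * mod_freq_hz / fs
    # Fourier transform of the squared response, evaluated at F only.
    envelope_spectrum = sum(
        e * cmath.exp(-1j * omega * n) for n, e in enumerate(energy)
    )
    return abs(envelope_spectrum) / total
```

For an ideal impulse the function returns 1 at every modulation frequency (no modulation is lost). For an idealised exponential decay with reverberation time T it approaches 1 / sqrt(1 + (2πFT / 13.8)²), so longer reverberation and higher modulation frequencies both reduce the remaining modulation.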
A disadvantage of the deconvolution technique would be the requirement of a closed loop configuration or a limited sample rate mismatch in open loop situations. To overcome this problem, DIRAC 4.0 provides sample rate error correction. This avoids the need for an interconnection between source and receiver, even when using MLS signals. It is also possible to apply pre-averaging to reduce the impulse response noise level.
Long distance speech intelligibility measurements often require an open loop measurement configuration. This is, for instance, the case for railway stations, where the source may be located in an announcer’s booth in one city, while the receiver is a microphone connected to a PC on a platform in another city. With sweep signals, it is indeed possible to use a separate CD if the speed error between CD player and response recording device is not too large. However, sweep signals sound quite obtrusive, which may be a problem if the measurements are planned at night to avoid background noise. In this case, DIRAC 4.0 resolves the conflict by allowing open loop MLS measurements.
DIRAC provides several ways to take the background noise into account. You can measure the speech intelligibility including or excluding the actual background noise. The impact of the background noise can be investigated by mixing a recording of it into the impulse response. Alternatively, you can enter a background noise level spectrum.
In DIRAC 4.0 you can also enter the signal-to-noise ratio per octave frequency band. Although you then have to obtain the voice signal level as well as the background noise level per octave band, this method appears to be most convenient, as the impulse response measurement itself is much easier to perform. Unlike other methods, no level calibrations are required, the source signal level can be maximized and pre-averaging is allowed. This gives the highest INR values, hence the most reliable system impulse response measurement results.
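As an illustration of how a per-band signal-to-noise ratio can be combined with a noise-free impulse response measurement, the sketch below applies the noise correction commonly given in the STI literature: each modulation reduction value is multiplied by 1 / (1 + 10^(−SNR/10)). Whether DIRAC uses exactly this expression is an assumption, and the function names `apply_snr` and `noise_corrected_mtf` are hypothetical.

```python
def apply_snr(m, snr_db):
    """Reduce one modulation reduction value m for stationary noise.

    snr_db: signal-to-noise ratio in dB for the octave band concerned.
    """
    return m / (1.0 + 10.0 ** (-snr_db / 10.0))


def noise_corrected_mtf(mtf, snr_db_per_band):
    """Apply one SNR value per octave band to a matrix of m values.

    mtf: one row per octave band, one column per modulation frequency,
         obtained from a (nearly) noise-free impulse response.
    snr_db_per_band: one SNR value in dB per row of mtf.
    """
    if len(mtf) != len(snr_db_per_band):
        raise ValueError("need exactly one SNR value per octave band")
    return [
        [apply_snr(m, snr_db) for m in row]
        for row, snr_db in zip(mtf, snr_db_per_band)
    ]
```

At an SNR of 0 dB every modulation reduction value is halved, while at high SNR values the correction factor approaches 1 and the values from the impulse response are left practically unchanged.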
A typical speech intelligibility measurement can be carried out as follows:
- A stimulus is played from an audio CD or MP3 file (see insert) at the talker position. E-sweeps are more robust, while MLS signals sound less obtrusive.
- The responses are recorded in DIRAC, either directly for immediate results or through a hand-held sound recorder for higher portability.
- The received SNR octave frequency spectra are obtained by measurements, from standards or through estimations.
- Back at the office, the SNR values are entered in DIRAC and the STI is calculated for each receiver position.