which according to Dudley and Tarnoczy (1950) had variable pitch
and sang 'God Save the Queen'.
But the next real step forward was Dudley's Voder (Dudley et
al. 1939), an offshoot of his Vocoder, described earlier. The
ten filters of the Vocoder synthesizer were widened to cover the
range 0-7500 Hz and a set of manual controls was supplied. The
output amplitudes of the filters were controlled from a keyboard;
a wrist bar selected buzz or hiss excitation and a foot-pedal
controlled F0, the fundamental frequency. There were also special keys to generate automatically
the sequence of closure and release required for stops. A year or
more of training was required before an operator could produce
intelligible speech, and each utterance had to be carefully rehearsed.
The Voder was demonstrated successfully at the 1939 New York World's
Fair and the 1940 San Francisco World's Fair.
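The control scheme just described (ten band gains set from the keyboard, a buzz or hiss source chosen by the wrist bar, and F0 set by the pedal) can be illustrated with a rough sketch. This is not Dudley's circuit: the equal 750 Hz band widths, the 16 kHz sample rate, the additive stand-in for real band-pass filtering, and all the names (`band_index`, `source_partials`, `voder_frame`) are assumptions made only for illustration.

```python
import math
import random

SAMPLE_RATE = 16000              # Hz; assumed, not from the text
N_BANDS = 10                     # ten filters, as in the Voder
TOP_HZ = 7500.0                  # the filters cover 0-7500 Hz
BAND_WIDTH = TOP_HZ / N_BANDS    # equal 750 Hz bands: an assumption


def band_index(freq_hz):
    """Index of the band (0..9) that a frequency falls in."""
    return min(int(freq_hz // BAND_WIDTH), N_BANDS - 1)


def source_partials(voiced, f0, rng, n_noise=200):
    """Excitation as a list of (frequency, phase) sinusoidal components.

    Buzz: harmonics of f0 up to TOP_HZ.
    Hiss: components at random frequencies and phases, a crude
    stand-in for a noise source.
    """
    if voiced:
        n = int(TOP_HZ // f0)
        return [(f0 * k, 0.0) for k in range(1, n + 1)]
    return [(rng.uniform(0.0, TOP_HZ), rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(n_noise)]


def voder_frame(gains, voiced, f0=120.0, dur=0.02, seed=0):
    """One short stretch of output for fixed control settings.

    gains  : ten band amplitudes (the 'keyboard')
    voiced : True for buzz, False for hiss (the 'wrist bar')
    f0     : fundamental frequency in Hz (the 'pedal')
    """
    if len(gains) != N_BANDS:
        raise ValueError("expected one gain per band")
    rng = random.Random(seed)
    partials = source_partials(voiced, f0, rng)
    n_samples = int(round(dur * SAMPLE_RATE))
    out = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE
        # Each component is scaled by the gain of the band containing it.
        out.append(sum(gains[band_index(f)] * math.sin(2.0 * math.pi * f * t + ph)
                       for f, ph in partials))
    return out
```

For example, `voder_frame([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], voiced=True)` passes only the buzz harmonics below 1500 Hz. In this sketch the settings are fixed for the duration of the frame, whereas the operator of the Voder had to vary all of them continuously, which is the skill the text describes.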
There were two important differences between Dudley's approach to
speech synthesis and von Kempelen's. First, Dudley's Voder was an
electrical rather than a mechanical simulation, so that the
acoustic properties of synthesizer components were reasonably
predictable and design changes could be readily made. Second, the
Voder simulated only the acoustic properties of speech, whereas von Kempelen's
machine simulated articulatory properties as well. Dudley's model made it
easier to improve the rendition of particular speech sounds, but
made a weaker claim about the nature of speech. However, Dudley's
system had one important feature in common with von Kempelen's: a
human operator was used, and the rules for synthesis were part of
his skill.
The human operator disappeared from speech synthesis as an indirect
result of R. K. Potter's invention of the sound spectrograph during
World War II (Koenig et al. 1946). After the War, the spectrograph
opened the way to extensive research in acoustic phonetics because
it made it easy to observe the correspondence between speech sounds
and events in the acoustic spectrum, notably formant movements. The
spectrograph also suggested a new way of synthesizing speech:
'playing back' a spectrogram. Potter himself built a playback
synthesizer (Young 1948); Cooper (1950) developed a research
version, the Pattern Playback, which is still in use at Haskins
Laboratories. In the Pattern Playback, an optical representation
of an excitation spectrum with 50 harmonics and F0 at 120 Hz is
shaped by a spectrographic pattern painted on a moving, transparent
acetate belt; and this optical representation is then converted to
an acoustic signal. Thus the synthesis of an utterance is not a
transient performance but is controlled by a pre-planned pattern,
and can be repeated. Moreover, the close correspondence between
the output of the analyzing tool (the spectrograph) and the input
to the synthesizing tool (the Playback) is convenient experimentally
and of great value conceptually.
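The principle of the Playback can be approximated digitally as a minimal sketch: 50 harmonics of a 120 Hz fundamental, each switched on or off over time by a painted pattern. The sample rate, the 10 ms frame length, the 100 Hz half-width of a painted band, and the function names (`harmonic_frequencies`, `paint_frame`, `play_back`) are assumptions introduced here, not details of the Haskins device, which worked optically.

```python
import math

SAMPLE_RATE = 16000   # Hz; assumed, not from the text
F0 = 120.0            # Hz; fundamental of the excitation spectrum
N_HARMONICS = 50      # harmonics at 120, 240, ..., 6000 Hz


def harmonic_frequencies(f0=F0, n=N_HARMONICS):
    """Frequencies of the harmonics available to the pattern."""
    return [f0 * k for k in range(1, n + 1)]


def paint_frame(formants_hz, half_width_hz=100.0):
    """One time-slice of a painted pattern.

    A harmonic is 'painted' (amplitude 1.0) if it lies within
    half_width_hz of any of the given formant centre frequencies.
    """
    return [1.0 if any(abs(f - c) <= half_width_hz for c in formants_hz) else 0.0
            for f in harmonic_frequencies()]


def play_back(pattern, frame_dur=0.01, sample_rate=SAMPLE_RATE):
    """Convert a pattern (a list of frames, each holding one amplitude
    per harmonic) into a list of audio samples."""
    freqs = harmonic_frequencies()
    samples_per_frame = int(round(frame_dur * sample_rate))
    out = []
    for i, frame in enumerate(pattern):
        if len(frame) != len(freqs):
            raise ValueError("each frame needs one amplitude per harmonic")
        active = [(f, a) for f, a in zip(freqs, frame) if a > 0.0]
        for j in range(samples_per_frame):
            # Absolute time keeps each harmonic phase-continuous across frames.
            t = (i * samples_per_frame + j) / sample_rate
            out.append(sum(a * math.sin(2.0 * math.pi * f * t) for f, a in active))
    return out
```

A steady two-formant pattern lasting 100 ms would be `play_back([paint_frame([500.0, 1500.0])] * 10)`. Because the pattern is an ordinary data structure, the same utterance can be regenerated exactly, which is the repeatability the text emphasizes.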
The Haskins investigators used the Playback to study the psychology
of speech perception and to accumulate a body of knowledge about
the 'speech cues' (Liberman et al. 1967). Experienced users of the
Playback, for example the late Pierre Delattre, could readily paint
intelligible utterances; like the operators of von Kempelen's