ABSTRACT




Speaking styles and Phonetic variation

Term paper autumn 2004

Speech Technology level 1

Ulla Bjursäter (ullabj@ling.su.se)

Department of Linguistics, Stockholm university

Human speech is a very dynamic phenomenon with nearly endless forms of variation. Different factors affect the speech signal; this paper aims at giving a short overview of the different speaking styles and phonetic variations that affect human speech production and perception. Various topics are mentioned, such as age and gender, sound symbolism, speaking styles, emotions, universal features, voice quality and prosodic aspects. A conclusion can be made on the importance of integrating all these aspects in research concerning human-computer interaction.

Introduction

When we produce speech, the vocal organs do not move from sound to sound in a series

of separate steps. Instead, speech is a continuously varying process, and sounds

continually influence their neighbours (Crystal, 1997). We do not discern speech as discrete units; all the units are integrated, and we perceive a holistic representation of the

speech signal.

Speech has to be easy to produce but also easy to distinguish, in order to maintain distinctive linguistic/phonetic aspects. When we talk, speech sounds can be assimilated and reduced without major misunderstandings, thanks to redundancy and context. Some

aspects that are of importance are different prosodic aspects and paralinguistic

phenomena. Prosody plays a very important role in human-computer interaction as well

as in the communication between human beings (Hirschberg, 2002). The more we know

of a language, the more we understand, despite great reductions in the speech signal. If

we as listeners have limited knowledge of a language, we depend on the speaker leaving

as much information as possible in the speech signal (Lundström-Holmberg & af

Trampe, 1987).

One of the difficulties in speech perception, and subsequently also in automatic speech recognition, is the great variability of speech production and perception. Important factors for comprehensibility are the speaker's age, gender and anatomy, as well as the speaker's dialect, idiolect and sociolect. Speech can also be affected by emotional state, health and stress. The speaking style matters for intelligibility as well: speakers tend to vary their articulatory output along a continuum from hyper- to hypospeech (Lindblom, 1990). Clear speech is produced in an effort to be highly intelligible, and is relatively easily perceived by the listener. Speech can also be produced on a spontaneous, conversational basis, with reductions, ungrammatical sentences and hesitations that may affect the intelligibility of the utterance. Listeners can have difficulties understanding what is said if they are listening to a non-native language or do not know the person who is speaking. Poor hearing and other age-related problems can also affect comprehension. As people grow older, their speaking habits become more set, and it can be difficult to accept new lexical words or changes in the meaning of familiar ones.

This paper aims at giving a short overview of the different speaking styles and phonetic variations that affect our speech production and perception.

Speaking styles

The way we express ourselves varies from one situation to the next, depending on the context and the speaker's intentions. A speaker might vary his or her speaking style depending on the listener and the situation, using different expressions, pronunciations and tones of voice.

The manner of pronunciation is quite strict when a more formal speaking style is used;

the talker makes an effort to be easily understood by modifying the articulation to make

speech slower and acoustically more distinctive (Kent & Read, 1992). These are also the characteristics of clear speech. These forms are often used when reading out loud. There is, of course, a great difference between reading a formal text aloud and reading fairy tales to children. When reading a formal text aloud in public, the need for comprehension influences both articulation and prosodic aspects, while reading a fairy tale places a different demand on the speaking style, as it addresses a child or a group of children and the read speech may often include passages of spontaneous speech.

There is quite a difference between public speech and spontaneous speech. When

speaking in public, more precise articulations are produced in contrast to spontaneous

speech. When the speaker is more comfortable in a conversation, the speaking style turns

into a more casual one, with simplified words and phrases. This is when we start to

produce reductions, assimilations and coarticulations. Speech rate is increased compared

to clear speech, and so the number of reductions also increases. The reductions depend on where the stress lies in the word: stressed words or syllables are not reduced as much as unstressed ones, and they are usually not completely reduced, as may be the case with unstressed syllables or words (Lundström-Holmberg & af Trampe, 1987).

Humans tend to have a certain way of speaking to their infants. The syntax is simpler and

the prosodic organization seems to be almost universal with a few culturally based

variations, with higher pitch, slower tempo and enhanced intonation as a standard. Even

speaking with a whispering voice is not uncommon when communicating with young

infants (Fernald & Simon, 1984).

We gradually change our speaking style as the child grows and adjust the communication

according to the child's linguistic/developmental level. When the children develop into

youths, they often try to find new ways of expressing themselves as a form of

differentiation from their elders and sometimes also from other groups of people in their

own generation, thus forming their own idiolects and sociolects.

Every speaker has a personal idiolect that differs from people with the same dialect. We

use our voice and speaking style, intentionally or unintentionally, to mark our

personality. Our sociolect is a form of social dialect; it tells people who and what we are in more or less hierarchically ordered social groups, giving information about social position and educational level. There is cultural variation in the norms and rules for accepted behaviour. People usually speak to higher-status people in the respectful way used when speaking to strangers, while lower-status people are addressed in the more intimate, first-name way similar to when speaking to friends. The way we address other people indicates our social distance and social status in relation to the other person (Myers, 2002).

Sound symbolism

There are some speech sounds that are conscious, non-arbitrary and iconized forms of speech. This can be termed "sound symbolism" and is a direct link between speech sound and meaning (Hinton, Nichols & Ohala, 1994). This depicting phenomenon gives indirect associations, and a universal pattern can be detected, with a common base that is realized in different language-specific patterns. Words like "mama" and "papa" may originate from an infant's spontaneous vocalization of CV syllables (Jacobson, 1962). Humans produce imitative animal sounds in an onomatopoetic way (pip pip), imitations of sounds from nature (swisch), and imitations of sounds originating from the human being (gurgle, mumble, babble) (Hinton, Nichols & Ohala, 1994; Traunmüller, 2000).

Emotions

Our voice is greatly affected by our emotional reactions. The emotional status of a

speaker can be revealed by several acoustical aspects. Mozziconacci (1995) studied pitch

variations and different emotions (neutral, joy, boredom, anger, sadness, fear,

indignation) in speech. She found that measurements of this single aspect were not

enough to establish the speaker's emotional status. There are other studies investigating

different aspects in emotional speech, like spectral and temporal changes (Kienast &

Sendlmeier, 2000). Table 1 contains a listing of the acoustical correlates of the four

emotions “happy”, “sad”, “angry” and “afraid” from observations of some prosodic

characteristics of speech, in vowels and fricatives in relation to neutral speech (Murray &

Arnott, 1993).

Table 1: Acoustical correlates of the emotions happy, sad, angry and afraid.

Emotion | Speech rate | Variation of fundamental frequency | Intensity | Vowels | Fricatives
Happy | Varying speech rate | Large F0 variation | Elevated | Raised F1, slightly raised F2 | Spectral balance with much energy at high frequencies
Sad | Reductions and assimilations (low F0 variation) | Low F0 variation | Low | Less peripheral formant frequencies | Lower spectral balance compared to neutral speech
Angry | Increased speech rate (often) | Elevated F0 variation | High (especially at stress) | Raised F1, slightly raised F2 | Spectral balance with much energy at high frequencies
Afraid | Increased speech rate (often) | High pitch, lower F0 variation | High intensity (plus jitter at extreme fear) | Less peripheral formant frequencies | Spectral balance with much energy at high frequencies
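The directions summarized in Table 1 can be collapsed into a small lookup structure of the kind one might use to parameterize an emotional speech synthesizer. This is an illustrative sketch only: the parameter names and numeric offsets are assumptions, not values reported by Murray & Arnott (1993); only the directions (elevated, low, increased) follow the table.

```python
# Illustrative mapping of Table 1's emotion correlates onto
# synthesis-style parameters, relative to neutral speech.
# The numeric values are invented placeholders; only the
# directions (faster/slower, larger/smaller, louder/softer)
# follow the table.
EMOTION_PROSODY = {
    "happy":  {"rate": 1.0, "f0_range": 1.4, "intensity_db": 3.0},
    "sad":    {"rate": 0.9, "f0_range": 0.7, "intensity_db": -3.0},
    "angry":  {"rate": 1.2, "f0_range": 1.3, "intensity_db": 6.0},
    "afraid": {"rate": 1.2, "f0_range": 0.8, "intensity_db": 4.0},
}

def prosody_for(emotion: str) -> dict:
    """Return relative prosody settings for an emotion (neutral = defaults)."""
    neutral = {"rate": 1.0, "f0_range": 1.0, "intensity_db": 0.0}
    return EMOTION_PROSODY.get(emotion.lower(), neutral)

print(prosody_for("angry"))
```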


Acoustic analyses indicate that smiling raises the fundamental and formant frequencies

for all speakers, and also the amplitude and/or duration for some speakers. The elevated

formant frequencies might be a side effect from altering the vocal tract by spreading the

lips and drawing the corners of the mouth backwards, a procedure that shortens the vocal

tract (Tartter, 1980).

According to Künzel (2000), the general increase in stress and nervousness during, for example, a crime tends to raise the speaker's F0 values during the "crime situation".

It is important to mention that there are of course culture- and language-specific

differences in the listener's interpretation of the emotion (Scherer, Banse & Wallbott,

2001). It can be hard to correctly identify different emotions because a variety of

interacting variables may manifest themselves in a complex way.

Prosodic aspects

Prosody has, as mentioned earlier, a very important role in human-computer interaction

as well as in the communication between human beings (Hirschberg, 2002). Prosodic

aspects of a language are a collection of linguistic/phonetic effects, like tonal, temporal

and dynamic aspects. The most significant prosodic effects in a language's intonation

system are provided by the linguistic use of pitch. Different levels of pitch (tones) are

used in particular sequences to convey a wide range of meanings. For instance, the

difference between a falling and a rising pitch pattern can express the contrast between

"stating" and "questioning". Duration is another prosodic parameter. Variations in the

temporal rate at which syllables, words and sentences are produced convey different

kinds of meaning. In several languages, a sentence spoken with increased tempo conveys urgency, while slower speed conveys deliberation or emphasis (Crystal, 1997). Another

significant prosodic aspect is intensity, which is used to convey differences of emotional

aspects, such as the increased volume usually associated with anger. Intensity is also used to express lexical contrasts, as in the prominence differences heard between the syllables of a word. Syllabic intensity is usually referred to as stress, but the term "accent" is often used (accented vs. unaccented), referring to the way prominence manifests in loudness as well as pitch (Crystal, 1997).
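The falling/rising contrast described above can be sketched as two schematic F0 contours; a minimal illustration with assumed Hz values and linear interpolation, not a model of any particular language's intonation system.

```python
def contour(start_hz: float, end_hz: float, n: int = 10) -> list[float]:
    """Linearly interpolated schematic F0 contour with n points.
    Real intonation contours are not linear; this only shows direction."""
    step = (end_hz - start_hz) / (n - 1)
    return [round(start_hz + i * step, 1) for i in range(n)]

# Assumed, illustrative pitch targets:
statement = contour(180.0, 120.0)  # falling contour: "stating"
question = contour(140.0, 220.0)   # rising contour: "questioning"

print(statement[0] > statement[-1])  # True: the contour falls
print(question[0] < question[-1])    # True: the contour rises
```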

Tonal aspects

Intonation, variation in tone, serves a variety of different functions. One obvious

function is to express emotions. Intonation co-varies with other prosodic and

paralinguistic aspects to mark all kinds of emotional expression (Crystal, 1997). An expressive intonation pattern can also be used in a synesthetic way, as in the use of a deep voice and vowel lengthening when speaking of large objects: "It was a bi-i-ig fish" (Hinton, Nichols & Ohala, 1994). (Synesthetic sound symbolism is the process whereby certain speech sounds are chosen to consistently represent visual or tactile properties of objects, such as size and shape (Hinton, Nichols & Ohala, 1994).) Intonation also plays an important role in the marking of grammatical contrast. Pitch contours break up utterances, which facilitates comprehension. Statements and questions, or positive and negative intentions, may be signalled by intonation. Intonation conveys information structure through intonational prominence: there is a big difference in the meaning of "I like fish" depending on whether the prominence lands on "I", "like" or "fish" (Crystal, 1997). A language can also contain minimal pairs that contrast only in word accent. In Swedish, for example, the sentence "Den här tomten är bra" means either "This site is fine" or "This goblin is fine", depending on the accentual pattern of "tomten" (Crystal, 1997). Monotonous intonation can be a sign of delayed language development in children (Nettelbladt, 1997).

Intonation can also be used distinctively in read speech. Textual information is divided into larger stretches of paragraphs, and when a text is read out loud, a distinctive melodic shape may convey this information: when a new item is read, the pitch level rises, only to descend gradually as the reading continues. The use of intonation can also help organize language into units that are easier to recognize and memorize, as when listening to a long sequence of numbers. This is an aspect that may be missing in some cases of language disorder. Intonation can also have a significant function as a marker of personal identity, in an "indexical" function, as it can help to identify people as belonging to different social groups and various occupations (Crystal, 1997).

Temporal aspects

Temporal aspects may also reflect various attitudes and emotions of the speaker

(Lundström-Holmberg & af Trampe, 1987). Temporal aspects in prosody have two functions: quantity and juncture. Several kinds of meaning are conveyed by variations in the temporal rate at which syllables, words and sentences are produced. In final lengthening, the duration of the last part of an utterance is extended as an indication that the utterance is coming to an end. In many languages, variations in segment length are used to make a difference in meaning, such as in Swedish, where the use of quantity creates long and short vowels. There are also long and short consonants, depending on the quantity of the vowel: if the vowel is long, the consonant is short, and vice versa (V:C / VC:). Another temporal function is juncture, which may manifest itself through audible pauses, but more often it is just short closures of the air flow or extensions of certain segments (I scream vs. ice cream) (Crystal, 1997).
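The Swedish complementary quantity pattern (V:C / VC:) amounts to a simple duration trade-off, which can be sketched as follows; the millisecond figures and the "longer segment wins" rule are illustrative assumptions, not measurements.

```python
def quantity_pattern(vowel_ms: float, consonant_ms: float) -> str:
    """Classify a stressed Swedish VC sequence as long-vowel (V:C) or
    long-consonant (VC:) by comparing segment durations.
    The simple 'longer segment wins' rule is an illustrative assumption."""
    return "V:C" if vowel_ms > consonant_ms else "VC:"

print(quantity_pattern(180, 90))   # V:C - long vowel, short consonant
print(quantity_pattern(90, 180))   # VC: - short vowel, long consonant
```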

Intensity

Production, acoustics and perception

Intensity is dependent on variation in vocal effort controlled by the respiratory muscles.

Syllabic and phrase intensity is usually referred to as stress, but the term accent is often

used (Crystal, 1997). In Finnish, main stress is fixed on the first syllable, while in French main stress always falls on the last syllable. Other languages, like Swedish and English, may have stress that falls differently depending on whether the word is a noun ('import, 'pervert) or a verb (im'port, per'vert). Stress may also convey a difference of meaning at phrase level ('sleep in vs. sleep 'in).

Intra-speaker variations in vocal effort create various degrees of loud and soft speech. This affects the production and, subsequently, the acoustics. Articulation changes when intensity is raised. In vowel production, the opening of the jaw increases, and the movements of the lips and tongue compensate with the necessary, more extreme movements. Duration also increases in relation to the openness of the vowel: the more open the jaw, the longer the vowel duration. With consonants, the place and manner of articulation generally remain unchanged, but hypertension in the muscle tonus may occur. The vocal folds tense, and a higher subglottal pressure occurs. This also influences vocal fry, which decreases with increasing intensity (Shulman, 1989). An increased rate of articulation produces shorter consonants and makes stressed vowels longer, so the total segmental duration remains practically unchanged due to duration compensation. Adults take longer pauses, as they need more air in this production of amplified intensity (Shulman, 1989).

Acoustically, increased articulatory effort affects the fundamental frequency and the formant frequencies. The F0 value increases as a function of increased vocal intensity, and the formant frequencies (especially F1) shift upwards, thus preserving a correct phonetic identity. Independent of the vowel's degree of opening, the phonetic vowel identity remains unchanged as long as the tonotopical distance between F1 and F0 (in bark) is constant. Perceptually, the formant positions are evaluated in relation to each other and to F0. The listener also obtains information about whether a consonant is voiced or voiceless from F0. Increased intensity also gives increased spectral emphasis at the higher formant frequencies (Traunmüller, 1988).
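The tonotopic F1-F0 distance mentioned above can be computed with Traunmüller's analytic approximation of the bark scale (an assumption here, since the paper does not give a formula); the F0 and F1 values below are likewise illustrative, chosen only to show the computation for a vowel at normal versus raised vocal effort.

```python
def hz_to_bark(f_hz: float) -> float:
    """Analytic approximation of the bark (critical-band) scale,
    commonly attributed to Traunmüller."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def openness_distance(f1_hz: float, f0_hz: float) -> float:
    """Tonotopic F1-F0 distance in bark: the quantity claimed to stay
    roughly constant when vocal effort raises both F0 and F1."""
    return hz_to_bark(f1_hz) - hz_to_bark(f0_hz)

# Illustrative (assumed) values: an open vowel at normal vs. raised effort.
normal = openness_distance(700.0, 120.0)
loud = openness_distance(800.0, 160.0)
print(round(normal, 2), round(loud, 2))  # similar bark distances
```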

Voice Quality

Paralinguistic features

Apart from the contrasts signalled by tone, tempo and intensity, languages make use of

several distinctive vocal effects, using the range of articulatory possibilities of the vocal

tract. The laryngeal, pharyngeal, oral and nasal cavities can all be used to produce “tone

of voice” which may alter the meaning of the utterance. One of the clearest examples of

paralinguistic aspects is whispered speech, which is used in many languages to add

"conspiratorial" meaning to the utterance (Crystal, 1997).

There are different dimensions of voice quality. Important voice quality factors are the

laryngeal conditions and articulation habits. Voice qualities originate from the larynx,

where the character of the speech material is produced to render different laryngeal

qualities such as vocal fry (creaky), strained voice and breathy voice (Lindblad, 1992).

Vocal fry is caused by strong, irregular, relatively low vocal fold pulses. A breathy voice

occurs when the edges of the vocal folds do not quite close when vibrating. In a falsetto

voice, the sound is produced by long, thin and tense vocal folds. It is very hard to control

intensity and pitch when using falsetto voice (Lindblad, 1992).

Table 2: Various combination possibilities of different phonation types according to Laver (1980).

                   Modal voice  Falsetto  Breathy voice  Whisper  Creaky/Vocal fry  Rough
Modal voice            x           -           +            +            +            +
Falsetto               -           x           -            +            +            +
Breathy voice          +           -           x            -            -            -
Whisper                +           +           -            x            -            -
Creaky/Vocal fry       +           +           -            -            x            -
Rough                  +           +           -            -            -            x


Additionally, the voice gets its characteristic quality from the particular shape of the speaker's vocal tract and its adjacent cavities. A speaker usually has certain habitual settings and gestures when moving the lips, tongue, jaw and velum. An example of articulatory quality is nasal voice. Nasality can be heard in different sociolects (e.g. in some parts of Stockholm) and also in different Swedish dialects (Elert, 1997).

Even though the range of combinations of different types of phonation is vast, certain types of phonation are improbable or even impossible because of physical limitations. Table 2 describes various combinations of different types of phonation. For example, rough and breathy voice usually exist in combination with other types of phonation. A rough, whispering voice usually occurs in combination with modal or falsetto voice. With a breathy voice, whisper or vocal fry can usually only occur in combination with a modal voice (but not in combination with a falsetto voice) (Laver, 1980).
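The combination possibilities in Table 2 can be encoded directly as a symmetric compatibility grid; treating the diagonal ("x") as non-combining is an implementation assumption in this sketch.

```python
# Compatibility of phonation types per Table 2 (Laver, 1980):
# True = the two types can combine, False = they cannot.
# The diagonal ('x' in the table) is encoded as False here.
TYPES = ["modal", "falsetto", "breathy", "whisper", "creaky", "rough"]
GRID = [
    #  modal  falsetto breathy whisper creaky rough
    [False, False, True,  True,  True,  True],   # modal
    [False, False, False, True,  True,  True],   # falsetto
    [True,  False, False, False, False, False],  # breathy
    [True,  True,  False, False, False, False],  # whisper
    [True,  True,  False, False, False, False],  # creaky/vocal fry
    [True,  True,  False, False, False, False],  # rough
]

def can_combine(a: str, b: str) -> bool:
    """Look up whether two phonation types can co-occur."""
    return GRID[TYPES.index(a)][TYPES.index(b)]

print(can_combine("breathy", "modal"))    # True
print(can_combine("whisper", "breathy"))  # False
```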

Age and Gender

Physiology, acoustics and perception

A listener can (almost always) hear the difference between a male and a female voice, at least in adult voices. Even a very short utterance, or a cough or laugh, contains enough information about the vocal tract and vocal folds for the listener to form an instantaneous impression. Organic variations (as in gender differences) caused by differences in the dimensions of the vocal tract affect all the formant frequencies and F0 (Titze, 1989). This generates acoustical differences between male and female speech: men have a fundamental frequency of about 120 Hz, while women have an F0 almost one octave higher, around 220 Hz, as a result of the anatomical differences in vocal fold mass and length between men and women. Men have thicker vocal folds, but the length is more important as an acoustic gender-separating factor; males have longer vocal folds, which affects the fundamental frequency. The length of the vocal tract also differs: men have a vocal tract about 1.5 cm longer (ca. 17 cm) than women (ca. 15.5 cm), which gives women higher formant frequencies (Diehl, Lindblom, Hoemeke & Fahey, 1996).
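The link between vocal tract length and formant frequencies can be illustrated with the textbook uniform-tube (quarter-wavelength) approximation, F_n = (2n - 1)c / 4L. The model and the speed-of-sound constant are idealizations assumed here; only the vocal tract lengths come from the text.

```python
# Uniform-tube ("quarter-wavelength") approximation of neutral-vowel
# formants: F_n = (2n - 1) * c / (4 * L). A textbook idealization,
# not a claim from the paper beyond the vocal tract lengths it cites.
SPEED_OF_SOUND_CM_S = 35_000.0  # approximate speed of sound in warm air

def tube_formants(length_cm: float, n_formants: int = 3) -> list[float]:
    """Formant frequencies (Hz) of a uniform tube closed at one end."""
    return [round((2 * n - 1) * SPEED_OF_SOUND_CM_S / (4 * length_cm), 1)
            for n in range(1, n_formants + 1)]

male = tube_formants(17.0)     # ca. 17 cm vocal tract
female = tube_formants(15.5)   # ca. 15.5 cm vocal tract
print(male)    # [514.7, 1544.1, 2573.5]
print(female)  # [564.5, 1693.5, 2822.6]
```

The shorter tube yields uniformly higher resonances, matching the observation that women's formant frequencies are higher than men's.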

Male articulatory gestures consume more energy than female gestures because of the need for larger tongue movements to reach the uvula, velum or pharyngeal wall from a neutral position. These time-demanding, energy-consuming articulatory gestures do not quite reach the intended articulation target, and thus a smaller vowel space is generated. Female formants are more extreme than male formants; women have a more expanded vowel space with a larger distribution in the F1-F2 space, which gives an increased vowel contrast. This might be some kind of compensation for the higher female fundamental frequency, which gives a slight reduction in distinctiveness, but it can also depend on the flexibility of the female articulation organs, which lets them reach the intended articulation target more easily (Lindblom, 1983).

There are no acoustical differences between boys' and girls' speech before puberty; the difference between adult men and women mostly depends on the pharyngeal prolongation in boys during puberty, which results from a descent of the larynx in the vocal tract (Fitch & Giedd, 1999). Likewise, the gender differences diminish with old age: as male hormones increase in women and female hormones increase in men, the voices also change and sound more similar.


An important factor is the multimodal character of communicative behavior and speech

and language processing. The influences of visual and auditory factors affect perception.

An audiovisual integration of the speech signal occurs and visual images affect the

expectations based on the listener's experience. Johnson, Strand and D'Imperio (1999)

have examined the auditory/visual effect in vowel perception and their results indicate

that listeners tend to integrate abstract information of gender with phonetic information in

speech perception. The listener uses all available clues about what to expect: if you see a female face, you expect female formant frequencies.

Universal features

There are specific features that seem to be more or less universal. For instance, all languages have consonants and vowels, but they have different phoneme inventories. A language's speech sounds are selected to achieve enough dispersion for lexical distinction. Most languages contain the three vowels /i, a, u/ because they give a maximal perceptual distance (Liljencrants & Lindblom, 1972).
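The dispersion argument can be illustrated numerically. The formant values below are rough textbook figures assumed for illustration (not data from Liljencrants & Lindblom, 1972), and the summed Euclidean F1-F2 distance is a crude stand-in for perceptual distance.

```python
import itertools
import math

# Rough, assumed first/second formant values (Hz) for a few vowels;
# illustrative textbook-style figures, not measured data.
VOWELS = {"i": (270, 2290), "a": (730, 1090), "u": (300, 870),
          "e": (530, 1840), "o": (570, 840)}

def dispersion(vowel_set: tuple[str, ...]) -> float:
    """Sum of pairwise Euclidean F1-F2 distances: a crude proxy for
    the perceptual contrast within a vowel inventory."""
    return sum(math.dist(VOWELS[a], VOWELS[b])
               for a, b in itertools.combinations(vowel_set, 2))

# The corner vowels /i, a, u/ spread out more than a less peripheral set:
print(dispersion(("i", "a", "u")) > dispersion(("i", "e", "o")))  # True
```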

Ohala (1983) analyzed the sound patterns of different languages and looked for phonetic universals in an attempt to understand the production of speech. By observing the universal physical, phonetic characteristics of the speech mechanism, especially aerodynamic qualities, Ohala discusses how different languages build their phoneme inventories on different physical and biological conditions. Voiceless stops seem to be more frequent than voiced ones throughout the languages of the world. Voiced bilabial stops, /b/, are more common than voiceless /p/, while there seems to be a preference for voiceless velar stops /k/. Also, voiceless fricatives are preferred over voiced ones.

Languages make use of different categorical distinctions. Lisker and Abramson (1964)

measured VOT (voice onset time) in a cross-language study and found that languages

tend to use VOT as a distinctive contrast in categorizing voiced/voiceless stops. Yet

another distinction is provided by aspiration. Languages have different ways of using

aspiration as a distinctive contrast in the production of stops. This can be noticeable in second-language production, where an incorrect production of aspiration either sounds somewhat unfamiliar to the native listener or can even be a source of misunderstanding.
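A minimal sketch of VOT-based stop categorization in the spirit of Lisker & Abramson (1964); the ~25 ms boundary is an assumed, English-like figure, since their cross-language data show the boundary is language-specific.

```python
# Sketch of voice onset time (VOT) based stop categorization.
# The default 25 ms boundary is an illustrative assumption for an
# English-like short-lag/long-lag contrast; other languages place
# their category boundaries elsewhere.
def classify_stop(vot_ms: float, boundary_ms: float = 25.0) -> str:
    """Label a stop 'voiced' (prevoiced or short-lag VOT) or
    'voiceless' (long-lag VOT) relative to a language-specific boundary."""
    return "voiceless" if vot_ms >= boundary_ms else "voiced"

print(classify_stop(-90))  # voiced (prevoicing)
print(classify_stop(10))   # voiced (short lag)
print(classify_stop(70))   # voiceless (long lag, aspirated)
```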

Naturally, the speech sounds have to fall within limitations of the human speech

apparatus. Lindblom (1983) points out that normal speech only uses a small part of the

potentially available gestures of articulation. Humans try to minimize the energy

consumption of the speech gestures and speakers have a universal tendency to more

hypo- than hyper-articulate, which results in coarticulations and vowel reductions

(Lindblom, 1963). Humans seem to avoid extreme articulations in speech production, but

speech is only economized to the limit of being perceptually appropriate. The speaker

strives for articulatory relief while the listener demands perceptual distance; the sounds

must be different enough for the listener to be able to separate them. Languages tend to develop sound patterns that adapt to the biological constraints of speech production (Lindblom, 1983). "Easy way sounds OK" seems to be a functional way of maintaining a balance between production and perception (Lindblom, 2000).



Concluding remarks

The aim of this paper was to give a short overview of the different speaking styles and phonetic variations that affect our speech production and perception. A conclusion can be made on the importance, and the problems, of integrating all these aspects in research concerning human-computer interaction. People tend to use various forms of speech depending on the situation, from a more formal, hyper-articulated way of speaking to a more casual, spontaneous form of hypo-articulation. The study of how different emotions affect our voice is important in designing human-computer interactive software, as the simulation of emotions in a synthetic voice can be used to indicate "personality", which could influence the intelligibility of the speech and the intended message (Murray & Arnott, 1993). Prosodic aspects play a very important role in human-computer interaction, though software technologies have to provide more sophisticated abilities in both the recognition and the generation of prosodic variation to further the development of current research (Hirschberg, 2002).

References

Crystal, D. (1997) The Cambridge Encyclopedia of Language. (2nd ed.) Cambridge University Press.

Diehl, R.L., Lindblom, B., Hoemeke, K.A. & Fahey, R.P. (1996) On explaining certain male-

female differences in the phonetic realization of vowel categories. Journal of Phonetics

24, 187-208.

Elert, C-C. (1997) Allmän och svensk fonetik. (7th ed.) Norstedts Förlag, Stockholm.

Fernald, A. & Simon, T. (1984) Expanded Intonation Contours in Mothers' Speech to Newborns. Developmental Psychology 20 (1), 104-113.

Fitch, W.T. & Giedd, J. (1999) Morphology and Development of the Human Vocal Tract: A

Study Using Magnetic Resonance Imaging. Journal of the Acoustical Society of America

106 (3), 1511-1522.

Hinton, L., Nichols, J. & Ohala, J. (1994) Introduction: Sound Symbolic Processes. Sound

Symbolism. Hinton, Nichols & Ohala (ed). Cambridge University Press.

Hirschberg, J. (2002) Communication and prosody: Functional aspects of prosody. Speech

Communication 36, 31-43.

Jacobson, R. (1962) "Why Mama and Papa?" in Selected Writings, (1) Phonological Studies, 538-545. The Hague: Mouton.

Johnson, K. Strand, E.A. & D'Imperio, M. (1999) Auditory-visual integration of talker gender in

vowel perception. Journal of Phonetics 27, 359-384.

Kent, R. & Read, C. (1992) The Acoustic Analysis of Speech. Singular Publishing Group Inc. San

Diego, California.

Kienast, M. & Sendlmeier, W.F. (2000) Acoustical analyses of spectral and temporal changes in

emotional speech. Proceedings of ISCA Workshop on Speech and Emotion. Queen's

University, Belfast.

Künzel, H. (2000) Effects of voice disguise on speaking fundamental frequency. Forensic

Linguistics 7, 149-179

Laver, J. (1980) The Phonetic Description of Voice Quality. Cambridge.

Page 10

Liljencrants, J. & Lindblom, B. (1972) Numerical simulation of vowel quality systems: The role of perceptual contrast. Language 48 (4), 839-862.

Lindblad, P. (1992) Rösten. Studentlitteratur, Lund.

Lindblom, B. (1963) Spectrographic Study of Vowel Reduction. Journal of the Acoustical Society

of America 35, 1773-1781.

Lindblom, B. (1983) Economy of Speech Gestures. The Production of Speech. P. MacNeilage

(ed) Springer, New York.

Lindblom, B. (1990) Explaining Phonetic Variation: A Sketch of the H&H Theory. Speech

Production and Speech Modelling. W.J. Hardcastle & A. Marchal (eds). 403-439.

Lindblom, B. (2000) Developmental Origins of Adult Phonology: The Interplay Between

Phonetic Emergents and the Evolutionary Adaptations of Sound Patterns. Phonetica 57,

297-314.

Lisker, L. & Abramson, A. (1964) A Cross-Language Study of Voicing in Initial Stops: Acoustic

Measurements. Word 20 (3), 384-422.

Lundström-Holmberg, E. & af Trampe, P. (1987) Elementär Fonetik. Studentlitteratur.

Mozziconacci, S., (1995) Pitch variations and emotions in speech. ICPhS 95 vol. 1, 178 – 182.

Murray, I.R. & Arnott, J.L. (1993) Toward the simulation of emotion in synthetic speech: A

review of the literature on vocal emotion. Journal of the Acoustical Society of America 93

(2), 1097-1107.

Myers, D.G. (2002) Social Psychology. (7th ed.) McGraw-Hill Higher Education, New York.

Nettelbladt, U. (1997) De svårförståeliga barnen – aktuell forskning om specifik språkstörning.

Från Joller till Läsning och Skrivning. R. Söderbergh (ed). Gleerups, Malmö.

Ohala, J. (1983) The Origin of Sound Patterns in Vocal Tract Constraints. The Production of

Speech. P. MacNeilage (ed) Springer, New York.

Scherer, K., Banse, R. & Wallbott, H. (2001) Emotion Inferences From Vocal Expression

Correlate Across Languages and Cultures. Journal of Cross-Cultural Psychology, 32 (1)

76-92.

Shulman, R. (1989) Articulatory dynamics of loud and normal speech. Journal of the Acoustical

Society of America 85, 295-310.

Tartter, V.C. (1980) Happy talk: Perceptual and acoustic effects of smiling on speech. Perception

and Psychophysics 27 (1) 24-27.

Titze, I.R. (1989) Physiologic and Acoustic Differences between Male and Female Voices.

Journal of the Acoustical Society of America 85, (4) 1699-1707.

Traunmüller, H. (1988) Paralinguistic Variation and Invariance in the Characteristic Frequencies

of Vowels. Phonetica 45, 1-29.

Traunmüller, H. (2000) Sound Symbolism in Deictic Words. Tongues and Texts Unlimited. Aili,

H. & af Trampe, P. (ed) Stockholm. 213-234.

 

