Daniel Reisberg. Encyclopedia of Perception. Editor: E. Bruce Goldstein. Sage Publications, 2010.
Auditory imagery refers to the common experience in which people report that they “hear” a voice, melody, or other sound, in their “mind’s ear,” all in the absence of an actual acoustic stimulus. In some cases, this experience is deliberate (and so someone can, if he or she chooses, try to imagine what his or her mother’s voice sounds like, or what a particular musical performance sounds like). In other cases, the image arises spontaneously, and, indeed, people sometimes complain that they cannot avoid hearing a melody (or a snippet of a melody) over and over—a maddening experience sometimes given the striking label of “having an ear-worm.”
Auditory images are not hallucinations—people experiencing the images can tell that the images are “in their head,” and not a real sound. Nonetheless, the experience of “hearing” the imagined sound does resemble the experience of hearing an actual sound. Subjectively, the imagined sound seems to have a clear pitch, duration, and timbre, just as an actual sound would. Functionally, the image seems to provide direct information about these attributes, suggesting that the image truly does depict the sound, rather than merely describing or referring to the sound. This entry describes imagined pitch, duration, and timbre; differences between sounds and auditory images; enacted auditory images; the neural substrate of auditory imagery; and memory for enormously familiar sounds.
Imagined Pitch and Imagined Duration
Research on auditory imagery has taken several different paths. One line of research has sought to confirm the subjective sense that auditory images do directly represent a sound’s pitch. One series of experiments, for example, asked participants to imagine a specific pitch and then to detect a faint tone that was either the same pitch or a different one. Participants’ performance was better when the tone they were trying to detect was the same pitch as the one they were imagining, suggesting both that the image had accurately represented the pitch and that this imagined pitch primed the processes of actual hearing. A different series of experiments asked participants to imagine a particular melody and then to hum the starting pitch of the melody they were thinking about. Two days later, participants returned to the lab and did the same task. Across this two-day interval, the data showed remarkable consistency in the pitch that each participant hummed for each song, indicating that participants imagined each song at a specific pitch and that they chose the same pitch on both occasions.
A different line of research confirms the subjective sense that images are stretched out in time in just the way actual sounds are. In one study, participants were given specific words from a song’s lyrics (for example, can and by from “The Star-Spangled Banner”), and had to judge whether the pitch of the note accompanying the second word (by, the seventh beat of the phrase) was higher or lower than the pitch of the note accompanying the first word (in this case, the third beat). The data showed that the time needed to make this judgment increased in a regular fashion if the first note was farther from the song’s start, and also if the second note was farther from the first. It would seem, then, that participants perform this task by “playing” the melody “in their heads,” beginning at the melody’s actual start, and the more notes they have to “play” to make their judgment, the more time they need.
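The timing pattern just described can be sketched as a simple model. This is a minimal illustration, not the analysis used in the study: it assumes that judgment time grows linearly with the number of beats between the song's start and the first note, and with the number of beats between the first note and the second. The function name and every parameter value (base_time, start_cost, span_cost) are hypothetical, chosen only to make the arithmetic visible.

```python
def predicted_judgment_time(first_beat, second_beat,
                            base_time=1.0, start_cost=0.1, span_cost=0.1):
    """Return a hypothetical judgment time in seconds.

    Assumes (for illustration only) that time increases linearly with
    the beats "played" before the first note and with the beats
    separating the first note from the second.
    """
    if not 1 <= first_beat < second_beat:
        raise ValueError("expected 1 <= first_beat < second_beat")
    beats_to_first = first_beat - 1           # beats before reaching the first note
    beats_between = second_beat - first_beat  # beats from the first note to the second
    return base_time + start_cost * beats_to_first + span_cost * beats_between


# Beats 3 and 7 correspond to the "can" and "by" example in the text.
baseline = predicted_judgment_time(3, 7)
later_start = predicted_judgment_time(5, 9)  # same span, first note farther from the start
wider_span = predicted_judgment_time(3, 9)   # same first note, second note farther away
print(baseline, later_start, wider_span)
```

Under these assumptions, both moving the first note later in the song and widening the gap between the two notes raise the predicted time, which is the qualitative pattern the study reported.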
Imagined Timbre
Still other research confirms that auditory images accurately depict a sound’s timbre (i.e., the quality of sound, with other properties, such as pitch or loudness, held constant, that distinguishes one sound source from another; for example, the quality that distinguishes a flute and a clarinet each playing the same pitch). For example, in one study, some participants had to rate the similarities of a series of actually presented sounds that differed in timbre. (In one version of the procedure, the sounds were those of orchestral instruments; in another version, the sounds were environmental sounds.) Other participants had to imagine these sounds, and also rated the similarities between the sounds. The statistical summary of these data relied on a procedure termed multidimensional scaling. In this procedure, the individual sounds are represented by distinct points in space, and the similarity ratings are interpreted as “distances” among these points; the analysis asks how the sounds must be positioned to create the complex pattern of “distances,” from one sound to the next, revealed in participants’ pair-wise ratings.
The multidimensional space produced by this analysis for the imagined sounds closely resembled the space produced for the actual sounds. The pair-wise relationships among the sounds are therefore well preserved in imagery, which implies in turn that the timbres were represented in a fashion that is at least consistent and, more strongly, true to the actual sounds.
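The logic of multidimensional scaling can be illustrated with a short sketch. The entry does not say which scaling variant the study used, so the code below assumes classical (Torgerson) scaling, implemented with NumPy. The two dissimilarity matrices are invented for illustration and are not data from the study; the helper names are likewise hypothetical. The sketch places each sound in a two-dimensional space and then asks how well the inter-point distances for heard sounds agree with those for imagined sounds.

```python
import numpy as np


def classical_mds(dist, n_dims=2):
    """Place items in n_dims dimensions from a symmetric dissimilarity matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-center the squared dissimilarities to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]     # largest eigenvalues first
    top_vals = np.clip(eigvals[order], 0.0, None)  # drop negative (non-Euclidean) parts
    return eigvecs[:, order] * np.sqrt(top_vals)


def pairwise_distances(points):
    """Euclidean distance between every pair of rows in points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def configuration_agreement(dist_a, dist_b, n_dims=2):
    """Correlate the inter-point distances of two scaled configurations."""
    da = pairwise_distances(classical_mds(dist_a, n_dims))
    db = pairwise_distances(classical_mds(dist_b, n_dims))
    upper = np.triu_indices(da.shape[0], k=1)      # each pair counted once
    return np.corrcoef(da[upper], db[upper])[0, 1]


# Invented dissimilarities among four instruments (flute, clarinet, trumpet, violin).
heard = np.array([[0.0, 2.0, 5.0, 4.0],
                  [2.0, 0.0, 4.0, 5.0],
                  [5.0, 4.0, 0.0, 3.0],
                  [4.0, 5.0, 3.0, 0.0]])
imagined = np.array([[0.0, 2.5, 5.0, 4.5],
                     [2.5, 0.0, 4.5, 5.0],
                     [5.0, 4.5, 0.0, 3.0],
                     [4.5, 5.0, 3.0, 0.0]])
print(configuration_agreement(heard, imagined))
```

A value near 1 would indicate that the two spaces preserve the same pattern of distances. Comparing distances rather than raw coordinates is a deliberate choice here, because scaled configurations are only defined up to rotation and reflection.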
Differences between Sounds and Auditory Images
Other results, however, suggest some complications in the comparison between imagined sounds and actual sounds. For example, actual sounds necessarily have some specific loudness; findings suggest, however, that auditory images may be indeterminate in their loudness. In addition, auditory images may not be purely “auditory,” because some data suggest that imagination may often represent complex, multimodal events, containing both auditory and visual aspects. As a result, some judgments about auditory images may be guided by information about other aspects of the represented events. Thus, for example, participants imagining familiar voices may create an image that also depicts the familiar face of the person speaking, and judgments about the voice—such as whether it is similar to some other voice—may be shaped by information about the visual qualities of the face.
Auditory images are also distinct from sounds in another way: Sounds exist in a fashion that is independent of any perceiver’s interpretation—how the perceiver parses the sound-stream (i.e., determines where one sound stops and the next begins, and decides which sounds are parts of a larger acoustic event), or interprets its meaning. Auditory images, in contrast, seem to exist only in the context of the perceiver’s interpretation; according to some authors, the image is accompanied by a perceptual reference frame that specifies how the sound is organized and understood. As a result of this reference frame, the meaning of the imaged sound, for the person holding the image, may be quite specific and relatively rigid. In one study, for example, some participants heard the word stress being uttered again and again and again; within a few seconds, their perception shifted, so they heard the sequence first as repetitions of stress, then of rest, then of tress—reflecting apparently spontaneous shifts in how the sound stream was parsed. If these repetitions of stress were imagined, however, the parsing of the stream was apparently specified within the perceptual reference frame that accompanied the image, and so these shifts in interpretation did not occur, and the (imagined) sequence was understood as a rigidly unchanging repetition of the initial word, stress.
Enacted Auditory Images
In some settings, however, participants seem able to escape the limits set by the perceptual reference frame. Specifically, in some tasks, participants seem spontaneously to supplement an imagined stimulus with silent subvocalization (and so, in addition to imagining the sounds of “stress, stress, stress,” they might silently mime the speaking of this sequence). This combination of imagery plus subvocalization produces what some authors call enacted imagery, and enacted images (unlike “pure” images) seem readily re-interpreted (and so enacted images of “stress-stress-stress” are easily re-parsed).
The role of enactment is evident in a variety of other tasks that rely on auditory imagery. In one study, participants were given letter and number strings, and had to decide what these strings would sound like if pronounced aloud. (X-T-C, for example, would be ecstasy; N-L-S-S would be analysis.) Participants could easily do this task if given no other constraints. If, however, participants were blocked from subvocalizing (e.g., by asking them, silently and repeatedly, to mouth “ta-ta-ta” while doing the task), their performance dropped markedly. If participants were in a noisy environment (so that they couldn’t “hear” what they were saying to themselves), performance was also quite low. It appears, then, that participants perform this task, in essence, by talking to themselves, and then “listening” to what they have (silently) pronounced. Put differently, this task, apparently relying on auditory imagery, seems instead to rely on a mix of auditory imagery and motor activity (the enactment), and if either the audition or the motor action is prevented, performance drops.
Neural Substrate of Auditory Imagery
Still another line of research has examined the brain mechanisms that support (and shape) auditory imagery. Both positron emission tomography and functional magnetic resonance imaging studies indicate that auditory imagery for familiar melodies depends on activation in the right auditory association cortex and activation in the frontal cortex. Tasks requiring judgments about musical timbres seem to activate both primary and secondary auditory areas, especially in the right hemisphere. These studies indicate considerable neural overlap between the brain areas needed for auditory imagery and those needed for the actual hearing of overt sounds (a pattern that parallels the findings for visual imagery and vision), underscoring the strong resemblance between imagery and perception. More specifically, imagery and perception resemble each other subjectively (images “feel like” hearing), functionally (e.g., in the information they make prominent), and biologically. In addition, the supplementary motor area (SMA) appears to be involved in the generation of auditory imagery, again suggesting a role for motor codes in auditory imagery.
Memory for Enormously Familiar Sounds
Finally, yet another line of research has examined imagery for materials that are enormously familiar—such as a favorite song that someone has heard countless times. Evidence suggests that when someone imagines this sort of familiar song, the person’s imagination is remarkably faithful to the original: The song is imagined in the same key as the original, and probably in the right octave, and at essentially the right tempo. Apparently, then, the person’s memory for this familiar auditory event is quite accurate, and the reproduction of the melody, in imagery, preserves the acoustic qualities of the original.
Even with these diverse lines of research, however, many questions about auditory imagery remain; indeed, far less is known about auditory imagery than about visual imagery. For example, subjective reports suggest that people differ in how rich or detailed their auditory images are, and in how often they experience spontaneous images. Little is known about these points, however, including the fundamental issue of whether these differences can be taken at face value, or whether, instead, people are quite uniform in their auditory imagery and differ only in how they describe their images. Or, as a different example, little is known about how (or when, or whether) people use their auditory images. There are many indications that visual images are often useful (as aids to memory, or to problem solving); the history of scientific discovery or of artistic innovation suggests many steps forward inspired by (or created through) visual images. Whether a similarly rich set of functions can be documented for auditory imagery remains among the many points in need of further research.