Humans Evolved to Play Music

A gesture as simple as holding the violin is intimately connected to our biology.
Photograph: Jordi Salas/Getty Images

I first held a violin in my late forties. Placing it under my chin, I let go an impious expletive, astonished by the instrument’s connection to mammalian evolution. In my ignorance, I had not realized that violinists not only tuck instruments against their necks, but they also gently press them against their lower jawbones. Twenty‑five years of teaching biology primed me, or perhaps produced a strange bias in me, to experience holding the instrument as a zoological wonder. Under the jaw, only skin covers the bone. The fleshiness of our cheeks and the chewing muscle of the jaw start higher, leaving the bottom edge open. Sound flows through air, of course, but waves also stream from the violin’s body, through the chin rest, directly to the jawbone and thence into our skull and inner ears.

Music from an instrument pressed into our jaw: These sounds take us directly back to the dawn of mammalian hearing and beyond. Violinists and violists transport their bodies—and listeners along with them—into the deep past of our identity as mammals, an atavistic recapitulation of evolution.

The first vertebrate animals to crawl onto land were relatives of the modern lungfish. Over 30 million years, starting 375 million years ago, these animals turned fleshy fins into limbs with digits and air‑sucking bladders into lungs. In water, the inner ear and the lateral line system on fish’s skin detected pressure waves and the motion of water molecules. But on land the lateral line system was useless. Sound waves in air bounced off the solid bodies of animals, instead of flowing into them as they did underwater. 

In water, these animals were immersed in sound. On land, they were mostly deaf. Mostly deaf, but not totally. The first land vertebrates inherited from their fishy forebears inner ears, fluid‑filled sacs or tubes lined with sensitive hair cells for balance and hearing. Unlike the elongate, coiled tubes in our inner ears, these early versions were stubby and populated only with cells sensitive to low‑frequency sounds. Loud sounds in air—the growl of thunder or crash of a falling tree—would have been powerful enough to penetrate the skull and stimulate the inner ear. Quieter sounds—footfalls, wind‑stirred tree movements, the motions of companions—arrived not in air, but up from the ground, through bone. The jaws and finlike legs of these first terrestrial vertebrates served as bony pathways from the outside world to the inner ear.

One bone became particularly useful as a hearing device, the hyomandibular bone, a strut that, in fish, controls the gills and gill flaps. In the first land vertebrates, the bone jutted downward, toward the ground, and ran upward deep into the head, connecting to the bony capsule around the ear. Over time, freed from its role as a regulator of gills, the hyomandibula took on a new role as a conduit for sound, evolving into the stapes, the middle ear bone now found in all land vertebrates (save for a few frogs that secondarily lost the stapes). At first, the stapes was a stout shaft, both conveying groundborne vibrations to the ear and strengthening the skull. Later, it connected to the newly evolved eardrum and became a slender rod. We now hear, in part, with the help of a repurposed fish gill bone.

After the evolution of the stapes, innovations in hearing unfolded independently in multiple vertebrate groups, each taking its own path, but all using some form of eardrum and middle ear bones to transmit sounds in air to the fluid‑filled inner ear. The amphibians, turtles, lizards, and birds each came up with their own arrangements, all using the stapes as a single middle ear bone. Mammals took a more elaborate route. Two bones from the lower jaw migrated to the middle ear and joined the stapes, forming a chain of three bones. This triplet of middle ear bones gives mammals sensitive hearing compared with many other land vertebrates, especially in the high frequencies. For early mammals, palm‑sized creatures living 200 million to 100 million years ago, a sensitivity to high‑pitched sounds would have revealed the presence of singing crickets and the rustles of other small prey, giving them an advantage in the search for food. But before this, in the 150 million years between their emergence onto land and their evolution of the mammalian middle ear, our ancestors remained deaf to the sounds of insects and other high frequencies, just as we, today, cannot hear the calls and songs of “ultrasonic” bats, mice, and singing insects.

The evolutionary transformation of parts of the lower jaw of premammalian reptiles into the modern mammal middle ear is recorded in a sequence of fossilized bones, stony memories from hundreds of millions of years ago. As embryos, we each also relive the journey. During our development, our lower jaw first appears as a string of interconnected small bones. But these bones do not fuse into a single lower jaw as they do in living or ancient reptiles. Instead, the connections among them dissolve. One bone becomes the malleus of the middle ear. Another becomes the incus bone that connects the malleus to the stapes. A third curls into the ring that holds our eardrum. And one elongates into our single lower jawbone.

When I lifted the violin to my neck and felt its touch on my jawbone, my mind filled with imaginings of ancient vertebrates. These ancestors heard through their lower jaws as vibrations flowed from the ground, to jaw and gill bones, to the inner ear. The violin drew me into a reenactment of this pivotal moment in the evolution of hearing, without the indignity of prostrating myself. High art meets deep time? Not in my incapable hands, but certainly in the artistry of accomplished musicians.

Bone conduction of sound gives violinists a different experience of sound than their listeners. Most of the sound flows through air, joining player and audience. But sound waves also flow up through the jaw, turning the bones of the head into resonators that fatten the experience, especially for low notes. These vibrations also run down through the shoulder, into the chest. Playing the violin without such bodily contact—resting it on a spongy cloth against the shoulder and forgoing jaw contact—yields an insipid experience. The instrument feels distant, even though it sounds loudly in our ears.

The experience of music, then, embeds us not only in the ecology and history of the world, but in the particular qualities of the human body. One of these qualities is our special human ability to wield tools and craft ivory, wood, metal, and other earthly materials into instruments. Another is the musicians’ ability to animate these mergers within listeners’ bodies, through sound. Music incarnates us, literally “making us flesh.”

Might the internal, subjective experience of human music also ground us in the earth and unite us with the experiences of other species? Our culture mostly says, no, music is uniquely human. Philosopher of music Andrew Kania tells us, for example, that the vocalizations of “non‑human animals” are “examples of organized sound that are not music.” Further, because singing creatures like birds and whales “do not have the capacity to improvise or invent new melodies or rhythms,” they “should no more count as music than the yowling of cats.” Musicologist Irwin Godt concurs, writing that “the birds and bees may make pretty sounds . . . but despite the effusions of the poets, such sounds are not music by definition. It makes no sense to muddy the waters with non‑human sounds. This is a fundamental axiom.” When I step outside the walls of the performance hall or seminar room, spaces whose “fundamental axiom” is the sensory exclusion of the beyond‑human world, these ideas seem to me hard to defend.

If music is sensitivity and responsiveness to the vibratory energies of the world, then it dates back nearly 4 billion years to the first cells. When sound moves us, we are also united to bacteria and protists. Indeed, the cellular basis of hearing in humans is rooted in the same structures, cilia, possessed by many single‑celled creatures, a fundamental property of much cellular life.

If music is sonic communication from one being to another, using elements that are ordered and repetitive, then music started with the insects, 300 million years ago, then flourished and diversified in other animal groups, especially other arthropods and the vertebrates. From the katydids animating the night air in a city park, to the songbirds that greet the dawn, to the thumping fish and caroling whales of the oceans, to the musical works of humans, animal sound combines themes and variations, reiteration and hierarchical structure. To argue that music is sound organized only by “persons” and not “unthinking Nature,” as philosopher Jerrold Levinson has done, is akin to claiming that tools are material objects modified for particular use only by humans, thereby excluding the artisanal achievements of nonhumans like chimpanzees and crows. If personhood and the ability to think are the criteria by which to judge whether a sound is music, then music is a multiplicity encompassing the many forms of personhood and cognition in the living world. Erecting a human barrier around music in this way is artificial, not a reflection of the diversity of sound making and animal intelligences in the world.

If music is organized sound whose intent is wholly or partly to evoke aesthetic or emotional responses in listeners, as Godt and others claim, then the sounds of nonhuman animals must surely be included. This criterion aims, in part, to separate music from speech or emotional cries, a challenging line to draw even in humans where lyrical prose and poetry erode the division from one side and highly intellectualized forms of music chip away at the other. All animals live within their own subjective experiences of the world. Nervous systems are diverse, and so the aesthetics and emotions that are part of these experiences no doubt take on multifarious textures across the animal kingdom. To deny that other animals have such subjective experiences is to ignore both our intuitions from lived experience (we understand that our pet dog is not a Cartesian machine) and the last 50 years of research into neurobiology, which now can map within the brains of nonhuman animals the sites from which emerge intention, motivation, thought, emotion, and even sensory consciousness. Laboratory and field studies show that nonhuman animals, from insects to birds, integrate sensory information with memory, hormonal states, inherited predispositions, and, in some, cultural preferences, producing changes in their physiology and behavior. We experience this rich confluence as aesthetics, emotion, and thought. All the biological evidence to date suggests that nonhuman animals do the same, each in their own way. For the cat, then, “yowling” is music if it stimulates aesthetic reactions in feline listeners. The subjective responses of other cats are the relevant criteria by which to judge the sound’s musicality.

Sonic evolution without aesthetic experience has little diversifying power. Aesthetic definitions of music, then, are biologically pluralistic, unless we make the unsupported and improbable assumption that experiences of beauty are uniquely human.

If music is sound whose meaning and aesthetic value emerge from culture, and whose form changes through time by innovations that arise from creativity, then we share music with other vocal learners, especially whales and birds. In these species, as in humans, the reaction of individuals to sounds is largely mediated by social learning and culture. When a sparrow hears a mate or rival sing, the bird’s response depends on what it has learned of local sonic customs that have been passed down culturally. When a whale calls, it reveals to others its individual identity, clan affiliation, and, in some species, whether it is up to date on the latest song variants. These responses are aesthetic: subjective evaluation of sensory experience in the context of culture. Often this results in richly textured patterns of sonic variations across the species’ range. Cultural evolution in these species also changes sound through time, at a pace that is swift in some and leisurely in others, depending on their social dynamics. New sonic variations arise through diverse means: selecting sounds best suited to changing social and physical context, mimicking and modifying sounds from other individuals and species, and the invention of entirely new twists on old patterns. These diverse forms of animal music combine tradition and innovation, just as human music does.

If music is sound produced through modification of materials to make instruments and performance spaces in which to listen, then humans are nearly unique. Other animals use materials external to their bodies such as nibbled leaves or shaped burrows to make or amplify sounds, but none make specially modified sound‑producing tools, even the skilled toolmaking primates and birds. Music, then, separates us from other beings in the sophistication of our tools and architecture, but not in other regards. We are, as other musical animals are, sensing, feeling, thinking, and innovating beings, but we make our music with tools in a built environment of unique complexity and specialization.

As human musical sounds flow into us and move us, we are embedded in nested forms of music: the experience of themes and variations within the piece; the tension between novelty and tradition within the musical genre we are hearing; the cultural particularity and interconnectivity of the style of music we’re hearing; and the special form of music in the human species, an art form emerging from and living in relationship with the diversity of music in other species.


From Sounds Wild and Broken by David George Haskell, published by Viking, an imprint of Penguin Publishing Group, a division of Penguin Random House, LLC. Copyright © 2022 by David George Haskell.

