Introduction

Picture a human being before the dawn of language. They are returning to camp one afternoon. Walking along the beach, they stop to listen to the sounds of waves. Maybe they’ve never stopped like this before, or maybe it’s the tenth or the hundredth time, but they decide the waves make a pleasing sound—one they’d like to imitate.

And so, they try producing something like fwwwos, bwwwos, fwosbwos. None of these is quite right and they know it. They keep trying.

Now, jump ahead thousands of years to Homer’s Iliad—traditionally dated to the eighth century BCE—where the priest Chryses, harshly dismissed by Agamemnon, goes “in silence along the shore of the loud-roaring (polufloisboio) sea.” Polu- is a common Greek and English prefix (as in polyglot). But -floisboio is not so common, and much more remarkable: It strikes the ear like, well, a crashing wave—or a lackluster imitation of one.

This adjective from ancient Greek, polufloisbos (here, in the nominative case), is onomatopoeic: a word that somehow imitates or suggests the sound it references. We tend to like such words. Think of sizzle, hiss, or cuckoo.

But they’re not just fun to say. Philosophers from Plato to Kant have wondered whether they offer a clue to the origins of language. The truth, however, may not be so simple.

Scholars across the globe have been engaged for centuries in a lively, and sometimes quite funny, debate over the origins of human language. Important work in this field continues today, drawing on disciplines from anthropology to biology, from psychology to linguistics, and from philosophy to literature.

The issue is complicated. Spoken language doesn’t leave behind many fossils, so it’s difficult to refute any one theory. A polymath-like knowledge of many disciplines is required for scholars to make headway, especially today.

Dorit Bar-On is one of these scholars, a professor of philosophy at the University of Connecticut and director of the Expression, Communication, and the Origins of Meaning research group. In 2020, she received an NEH fellowship to complete a book on the origins of language, drawing on studies of minded creatures other than humans as a foundation for investigating the subject.

Bar-On became interested in linguistics through earlier philosophical work on self-knowledge and what is called “theory of mind,” a problem that plagues scholars in these debates. We’ll return to theory of mind after surveying the spirited history of the language-origins debate.

So how did our ancestors go from, say, an errant cry of pain or pleasure to the robust, organized system of language we know today?

Early Western Theories

In the eighteenth century, German philosopher Johann Gottfried von Herder had a radical idea, one that ran contrary to the popular notion that language was a divine gift.

Herder’s proposition was that vocal imitation—mimicry of the natural environment—could be the spark that, over time, led to fully developed language. Because nearby groups of humans share similar environments, the meaning of these imitations could be intuitively understood among them.

The idea was dismissed because most words are not onomatopoeic. But Herder was actually suggesting a first step, what scholars call a protolanguage—a foundation from which non-onomatopoeic words could later develop.

Another radical theory from the time, which Herder disputed in his Essay on the Origin of Language, suggested human language was derived from cries of pain: “I cannot conceal my astonishment at the fact that philosophers . . . can have arrived at the idea that the origins of human language [are] to be found in . . . emotional cries,” he wrote. “All animals, even fish, express their feelings by sounds; but not even the most highly developed animals have so much as the beginning of true human speech.”

A harsh dismissal, perhaps, but both the onomatopoeic and cry-of-pain theories shared a willingness to consider language as evolved rather than divinely received. Pugnacious rhetoric would be right at home among linguistics scholars, especially when Oxford’s Friedrich Max Müller entered the ring a century later.

A powerful and respected linguist, Müller dismissed both theories out of hand as the “bow-wow” and “pooh-pooh” theories of language origin, establishing a long tradition of name-calling in linguistics scholarship—sometimes, as here, to criticize opponents, and sometimes bestowed by scholars on their own theories to avoid an inevitable nickname of someone else’s choosing.

An expert in Proto-Indo-European—a theorized common ancestor of many languages still spoken today, as well as of Latin and ancient Greek—Müller believed in a single origin point for all modern languages.

“Language is the Rubicon which divides man from beast,” wrote Müller, “and no animal will ever cross it. . . . The science of language will yet enable us . . . to draw a hard and fast line between man and brute.”

At the deepest roots of the linguistic tree, Müller thought we would find one shared language, the defining characteristic of the human soul bestowed by God. Müller’s theory was widely critiqued even by his contemporaries, who, in an act of poetic justice, dubbed it the “ding-dong” theory.

Though Müller, in attacking the so-called “bow-wow” and “pooh-pooh” theories, took aim at what he believed was the influence of Charles Darwin, it was not until later—in 1871, to be exact—that Darwin offered his own account of the origin of language in The Descent of Man.

Like Herder, Darwin envisioned a protolanguage, coinciding with an increase in intelligence—which modern science has confirmed through studies of hominid brain size. But Darwin’s protolanguage was, in contrast, musical and motivated by sex. In his vision, protolinguistic humans who could sing well were appealing to potential mates and frightening to rivals.

Alternative Protolanguages: Musical and Gestural

Today, most scholars accept the likelihood of a protolanguage and have adapted Darwin’s theory in interesting ways. We know from studies of modern languages that, historically, they can lose what is called tonality—where pitch plays a defining role in articulating meaning. And even those who listen casually to music know that songs can convey a depth of emotion—despair, joy, triumph—without words at all.

The Danish linguist Otto Jespersen, who developed a model for musical protolanguage, reflected on this in his 1922 book, Language: Its Nature, Development and Origin. “The mere joy in sonorous combinations . . . no doubt counts for very much,” he wrote.

Jespersen imagined human protolanguage as holistic. A group of musical notes or a short song might become tied to a particular act, like going on a hunt or shucking clams. Slowly, perhaps a specific song could evolve to signify not only being on the hunt but a desire to go hunting, as when food is in short supply.

Yet another theory advocates for gestural protolanguage as our intermediary step. Modern sign language—a fully developed communication system in ways simple gestures are not—provides some of the best evidence both for and against this theory. Sign shares all the same levels as spoken language, because it crafts nuance through movement and shape instead of tone and inflection. We know, then, that gestural communication is effective.

But if sign language shares virtually every advantage with spoken language, and the only difference is medium, why did we need to evolve spoken language at all? Maybe speech could have evolved to allow us to communicate under the cover of darkness. But gestural language has strengths as well, allowing people to talk about someone nearby without their hearing it. For every advantage of spoken language, there comes a disadvantage.

Perhaps emitting a sound was never an advantage at all and was selected for unintentionally.

British scientist Richard Paget conducted some amateur investigations into this subject, which led him to promote in his 1930 book a “mouth gesture” theory: “Originally man expressed his ideas by gesture, but as he gesticulated with his hands, his tongue, lips and jaw unconsciously followed suit in a ridiculous fashion, ‘understudying’ . . . the action of the hands.”

Paget’s idea was little believed even then. American psychologist E. L. Thorndike quipped in reply: “Personally, I do not believe that any human being before Sir Richard Paget ever made any considerable number of gestures with his mouth parts in sympathetic pantomime.”

Some scholars of the twentieth and twenty-first centuries have advocated, nonetheless, for an unconscious transition from gesture to speech, including an intermediary step where limited vocalizations—perhaps onomatopoeic or related to pain or pleasure—accompanied gesture.

The prevalence of both gesture and speech cautions against selecting one to the exclusion of the other. Studies have shown that significant effort is required to not gesture, even in circumstances where gesture is clearly unnecessary, like talking to a friend on the telephone. It seems possible that, while gestural communication served our progenitors well in most scenarios, there were still times when imitative speech was necessary.

One theory, proposed by American linguist Derek Bickerton, imagines an early human hunter coming across an animal far too large for him to kill alone. Returning to his camp, in desperation to signal that an enormous source of meat looms nearby, he mimics the beast’s cry. (Bees and ants are capable of doing something similar by producing pheromones.) In Darwinian terms, such a situation models environmental pressure, and it leads to one more question: Why communicate in the first place?

At its simplest, theory of mind is our ability to grasp that others have a mental state just as we do. It’s typically been seen as an either-or issue: You have full theory of mind, or you have none. Psychologists have created tests for measuring it in children, who, around the age of four, begin to demonstrate awareness of other minds. We need something like theory of mind to desire to speak in the first place, hence the problem it causes in origin-of-language debates.

What theory of mind in adults looks like is familiar enough, says Professor Bar-On.

“You go to a bar, and you have an empty glass in front of you, and you’re deliberately catching the eye of the barwoman, and you tap your empty glass,” Bar-On explains. “The barwoman is going to recognize that you’re drawing her attention to that because you want it filled, and because she recognizes that’s what you want, she’s going to do it. So there is a kind of mutual song-and-dance.

“Your communicative act essentially depends on your relying on what she will be able to infer about your intention. Okay, so we now have—depends on the analysis—three or four levels of intention. That’s called meta-representation; it’s a representation of somebody else’s representation.

“And that’s what you have to have before you can do anything like communicate using language. And here is my worry about this: Look at the structure of the thought that you have to have in order to engage in utterances with speaker meaning. It’s very much the structure of language. The thought is: I want her to recognize that I want her to understand what I’m doing. All this embedding, right?

“It raises the question: How could such thought arise before language? And aren’t we assuming that now we have a psychological Rubicon where before we had a language Rubicon? In order to cross the psychological Rubicon, you have to have this language-like thought. And then our puzzle is exactly the same: How could language-like thought arise where it didn’t exist before?”

Bar-On proposes an alternative way of looking at the problem, one that assumes theory of mind could evolve in parts. This approach draws on research showing that young children are still developing fuller theory of mind past age four, as well as studies of high-functioning persons with autism, who conventionally fail theory-of-mind tests but nevertheless have highly developed language.

If theory of mind can come in degrees and have various components—a theory that more psychologists espouse today—many of our problems are, if not solved, then certainly easier. We can imagine certain types of communication—both gestural and lexical, even musical—that don’t require such a sophisticated level of meta-representation. Mimicking a laugh to signal your own happiness does not require as deep a level of speaker meaning as signaling to get your beer refilled.

Perhaps, as human communication evolved, so too did mindedness; as one grew more complex, more capable of organizational structure, so did the other.

This both-and approach can guide how we think about lexical, musical, and gestural theories of language evolution. “Everybody at one point or another thought about the question, How did language come to be?” Bar-On says. “And so it’s not surprising that there have been all these different myths, all these just-so stories.”

After centuries of philosophical debate and scientific investigation, no single theory has succeeded in solving all the problems posed by the many modes of communication needed to get through a day.

But Bar-On remains optimistic, including about past theories of bow-wowing and pooh-poohing our way to spoken language. “My hunch would be that they all have something to offer. . . . Each suggests one possible element in the toolkit of our ancestors.”

As we imagine the many distinct tasks performed by our progenitors on a daily basis—hunting large prey, instructing and caring for young, even just pausing to listen to waves—we can imagine equally many ways to gesture, sing, and mimic our way toward sharing those experiences with others. And though the fossils are hard to find, scholars like Bar-On continue charting new paths, weaving disciplines together, striking out on this mysterious Rubicon.

Originally published to the public domain by Humanities, the Magazine of the NEH 42:2 (Spring 2021).