Exploring Genetic Causation in Biology

By John McLaughlin / 07.02.2015
PhD Candidate in Developmental Biology
Hunter College
City University of New York

In both popular culture and the technical literature in biology, the word “genetic” is ubiquitous. Despite its common usage and universal recognition, discussions of this concept usually take its meaning for granted. We have the vague sense that it relates to DNA, genes, heredity, and inheritance, but what does it mean precisely to describe a process, trait, disease, or property as “genetic”? Since the completion of the Human Genome Project, a latent hope seems to linger among the general public that genetic causes will be uncovered for the vast majority of human ailments. This hope has been fueled by the often confusing use of biological terminology in the popular media, especially when reporting (often in a sensational manner) on a new study or development in the biomedical sciences [1].

Because the concept plays such a central role in modern biology, I believe it deserves a thorough and nuanced discussion of its theoretical and methodological underpinnings [2]. My aim here is to briefly explore the idea of genetic causation from a biologist’s point of view, as well as the occurrence of misleading metaphors, such as genetic “determination” or “programming,” in the biology literature.

Let’s begin with some orienting information. In a broad sense, heredity is the passing of traits from parents to offspring. Well before biologists understood the material basis of heredity to be DNA, several of its central concepts, such as genotype and phenotype, were solidly in place [3]. I will define these terms with broad strokes. The sum total of an organism’s observable traits is referred to as its phenotype [4]. In the laboratory, phenotype can refer to simple physical characteristics such as size, eye color, weight, and life span, but may also encompass more complex attributes such as behavior. An organism’s genotype is the composite total of all its gene variants [5], each of which is referred to as an “allele.”

The original Mendelian conception of the gene as an indivisible (and in its initial formulation, abstract) unit of heredity has been heavily revised and supplemented by the molecular biology revolution of the last half of the 20th century. The working definition of the gene has grown fuzzier over the decades, especially in light of new empirical findings such as the existence of a huge amount of non-coding and regulatory DNA (i.e., DNA that does not encode RNA and/or protein, but controls the expression of nearby genes). The “Central Dogma” of molecular biology [6] — a gene encodes a messenger RNA, which encodes a polypeptide — is now known to be a vast oversimplification that misses many important details. In addition, we are now aware of how enormously complex the process of development is, even for relatively simple animals. Thus, the classical atomistic notion of a gene, which directly “encodes” a phenotypic trait in a straightforward manner, has certainly been demolished. However, Mendelian concepts such as allele, dominance, and recessivity are still central pillars of developmental genetics, and are deployed in conjunction with biologists’ modern molecular knowledge of gene function.

This brings us to the concept of heritability. What is it and how is it related to genetics and heredity? In contemporary ecology and evolutionary biology, “heritability” is formally defined as the proportion of variance in a given phenotypic trait (e.g. height or tail length) that is attributable to genotypic variance [7]. An important fact to note about this definition is that it describes a population level phenomenon. Therefore, heritability is a property of a trait in a specific population, in a specific environment. For this reason, heritability equations [8] are not equipped to infer genetic causation that may be operating in individual organisms, let alone say anything about specific genetic mechanisms at play during development. The concept of heritability is often employed in genome-wide association studies, which attempt to link phenotypic variation to variation in genotypes on a population level [9].
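
To make the population-level character of this definition concrete, here is a minimal sketch in Python. It assumes a deliberately simple toy model that is not taken from any of the cited sources: each phenotype is the sum of an independent genetic contribution and an environmental contribution, with no gene-environment interaction. The function name `simulate_heritability` and all of its parameters are hypothetical. Under these assumptions, broad-sense heritability is the ratio Var(G) / Var(P).

```python
import random
import statistics


def simulate_heritability(env_sd, genetic_sd=1.0, n=20000, seed=0):
    """Estimate Var(G) / Var(P) in a simulated population.

    Toy assumption: P = G + E, with G and E drawn independently from
    normal distributions (no gene-environment interaction or covariance).
    """
    rng = random.Random(seed)
    genetic = [rng.gauss(0.0, genetic_sd) for _ in range(n)]
    environmental = [rng.gauss(0.0, env_sd) for _ in range(n)]
    phenotype = [g + e for g, e in zip(genetic, environmental)]
    return statistics.pvariance(genetic) / statistics.pvariance(phenotype)


# Same genetic variance, two different environments.
uniform_env = simulate_heritability(env_sd=0.5)   # theory: 1 / (1 + 0.25) = 0.8
variable_env = simulate_heritability(env_sd=2.0)  # theory: 1 / (1 + 4.0) = 0.2

print(f"uniform environment:  heritability is roughly {uniform_env:.2f}")
print(f"variable environment: heritability is roughly {variable_env:.2f}")
```

The genetic variance is identical in both runs, yet the estimate drops sharply when the environment becomes more variable. Nothing about how genes act in any individual has changed, which is why a heritability figure cannot be read as a statement about genetic causation in individual organisms.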

Now, to approach the central questions. In my biased position as a developmental biologist in training, I understand genetic causation as mainly a question of how genes, other cellular materials, and the organism’s environment dynamically interact as co-causes of development. One guiding interest of developmental biologists is to determine how particular genotypes reliably produce a specific phenotype. I may have already erred by saying that a genotype “produces” a phenotype. Does this imply that genotype X will always produce phenotype X, regardless of the organism’s environment? And/or does it imply that genotype X is sufficient to produce phenotype X?

These questions involve distinct conceptual and empirical issues: namely, what does it mean for a trait to be genetically caused, and, assuming that this is a coherent concept, how can one experimentally demonstrate genetic causation? Naturally, the empirical problems occupy most of biologists’ attention.

Model organism research is ideal for uncovering genetic causation, because both organismal genotype and environment can be (fairly) easily manipulated. This manipulability is key because it allows scientists to experiment with counterfactual scenarios [10], which are considered by biologists to be vital for establishing causal relationships. For example, if two fruit flies, one “normal” (also known as wild-type) and one bearing mutation X [11], are raised in identical environments, a biologist might conclude that any phenotypic differences between the two are caused (directly or indirectly) by the presence of mutation X.

Can the conclusion be made this readily? A more nuanced and interesting set of experiments might place each of these flies in a range of ecologically/developmentally relevant environments. Flies in nature must tolerate a wide range of environmental conditions during their life cycle, in contrast to the optimized, static conditions usually employed in research with model systems. Part of the challenge should be in understanding how these conditions dynamically interact with the genotype and other developmental factors in producing the organism. For instance, if mutation X results in its associated mutant phenotype within a range of environmental conditions relevant to the fruit fly, it can more plausibly be judged to be a cause of the phenotype. Because development is the result of a multitude of interacting co-causal processes, the concept of genetic causation is more nuanced than it might appear at first glance.
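
The logic of comparing the two flies across several environments can be sketched as a toy "reaction norm" in Python. Everything here is invented for illustration: the trait (wing length), the numbers, the temperature threshold, and the function names `wing_length` and `mutation_effect` are hypothetical and are not measurements from any real mutation.

```python
def wing_length(genotype, temperature_c):
    """Hypothetical reaction norm: phenotype as a function of genotype and environment.

    All values are made up for illustration; they are not measured data.
    """
    # Toy assumption: both genotypes shrink slightly as temperature rises.
    baseline = 2.0 - 0.02 * (temperature_c - 18)
    # Toy assumption: mutation X only affects the trait at 25 C and above.
    if genotype == "mutant_x" and temperature_c >= 25:
        return baseline - 0.5
    return baseline


def mutation_effect(temperature_c):
    """Counterfactual contrast: mutant minus wild-type in the same environment."""
    return wing_length("mutant_x", temperature_c) - wing_length("wild_type", temperature_c)


# Compare the two genotypes across a range of rearing temperatures.
effects = {t: round(mutation_effect(t), 2) for t in (18, 22, 25, 29)}
print(effects)
```

In this toy model the mutant and wild-type flies are indistinguishable at 18 and 22 degrees and differ only at 25 degrees and above. An experiment run at a single temperature would therefore either miss the effect of mutation X entirely or overstate how generally it holds.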

Although simpler animals, such as flies, are interesting in their own right (at least I think so), a large amount of public and scientific attention is focused on the much more complex question of how our human genetic endowments affect both the development of and variation among individuals. Several human disorders, such as Huntington’s disease, sickle cell anemia, and cystic fibrosis [12], have a well-understood genetic basis, tied to specific mutations in one or a few genes. This knowledge resulted from years of difficult experimental work in studying the causally relevant genes, the functions of their gene products within the cell, their interactions with other cellular components during development, and their familial patterns of inheritance. With respect to genetic causation, these are obvious examples which fall on the extreme end (i.e., strong determinism) of the spectrum.

The problems, as I see them, begin to arise when the concept of genetic causation is applied well beyond the limits of our experimental understanding, and employed as an explanatory thesis in the behavioral or social sciences. An excellent work on this subject is “The Ontogeny of Information” [13] authored by Susan Oyama, a philosopher of science, psychologist, and founder of Developmental Systems Theory. This book was meant as a response to both the excesses of the sociobiology movement of the 1970s, and the reliance on design, blueprint, and programming metaphors in developmental biology and the behavioral sciences. Oyama describes these metaphors as part of the “cognitive-causal” model of the gene, a framework in which causal agency is assigned to the genes during development.

Although these arguments were raised several decades ago, genetic programming and blueprint metaphors are still commonly used by biologists today, both in their capacity as scientists and in communication with the public. Why are these descriptions of the gene so problematic? Within this conceptual framework, genes are implied to be autonomous information carriers, serving as blueprints from which all other cellular functions are derived. In reality, the cell consists of many components which are not “determined” by the genome in a strict sense. One example is cellular organelles, which are vitally important for cell function but are directly inherited from parental cells rather than encoded in DNA. Another biological fact countering these metaphorical descriptions is the exploding field of epigenetics [14], two main interests of which are the effects of environmental factors on the regulation of the genome, independently of DNA sequence, and various mechanisms of extragenetic inheritance (e.g., via RNA molecules or other cellular components besides DNA).

Why are genes uniquely viewed as carriers of information, when other components of the cell or organism often equally serve this function? Oyama makes an excellent point when she comments on how explanations in the behavioral sciences have become the province of biology when genes are thought to govern the outcome: “We have, for whatever reasons, a peculiar relationship between the behavioral and the biological sciences, a relationship in which some portions of the ‘higher levels’ are considered really the province of the lower ones. Some behavior, feelings, or institutions are ‘genetically determined,’ and therefore biological, while the rest are the proper material for the behavioral scientists. It is as though a chemist were to say that some compounds were really physical while others were (merely) chemical, or a physiologist, that some biochemical processes were chemical and others physiological.”

The conceptual problems surrounding genetics will become even more relevant in scientific and public life, as the biomedical sciences continue to emphasize “Big Data” such as human genome sequences, along with their health care implications. Although mostly neglected by academic biologists, I see these as philosophical issues ripe for conceptual clarification, as well as interesting insights into the historical development of the field itself.


[1] One recent example is this piece in The Telegraph, commenting on a study that examined genetic variants correlated with infidelity in females. The headline reads: “Cheating on your other half can be inherited.”

[2] The issues related to genetics, heredity, and reductionism are a broad topic in the philosophy of biology, but here are just a few books that I have found helpful and illuminating on the subject: Genetics and Reductionism. Sahotra Sarkar. Cambridge University Press, 1998. / Genetics and Philosophy: An Introduction. Paul Griffiths and Karola Stotz. Cambridge University Press, 2013. / The Structure of Biological Science. Alexander Rosenberg. Cambridge University Press, 1985.

[3] For example, the terms genotype and phenotype were both introduced in a 1911 paper by the Danish biologist Wilhelm Johannsen: The Genotype Conception of Heredity. The American Naturalist. 1911. Vol. 45, No. 531.

[4] Phenotype, wiki entry.

[5] Genotype, wiki entry.

[6] Central dogma of molecular biology, wiki entry.

[7] Heritability, wiki entry. See also: Heritability: a handy guide to what it means, what it doesn’t mean, and that giant meta-analysis of twin studies, by J. Kaplan, Scientia Salon, 1 June 2015.

[8] A good discussion of heritability, including the differences between narrow and broad sense heritability and their domains of application, is found in Genetics and Reductionism. Sahotra Sarkar, Cambridge University Press, 1998.

[9] This paper gives a basic summary of the concept and its modern uses: Heritability in the genomics era — concepts and misconceptions. Visscher et al. 2008. Nature Reviews Genetics.

[10] By counterfactual scenario, I mean a statement of the form “If A had not occurred, C would not have occurred.” This quotation is from the SEP article on “Counterfactual Theories of Causation.”

[11] Let’s assume the presence of this mutation is the only genetic difference between these two flies.

[12] The genetic basis of each of these diseases is briefly summarized in their respective wiki articles: Huntington’s disease, cystic fibrosis, and sickle cell anemia.

[13] The Ontogeny of Information: Developmental Systems and Evolution, by S. Oyama, Cambridge University Press, 1985.

[14] Epigenetics, wiki entry. On the problems with the use of misleading or simplistic metaphors in biology, see: Why Machine-Information Metaphors are Bad for Science and Science Education, by M. Pigliucci and M. Boudry, Science and Education 20: 453-471, 2011.