Quantitative Genetics

The study of how interaction between genetic programming ("Nature") and environmental pressures ("Nurture") produces a range of phenotypes is known as quantitative genetics.

Key Ideas

  • In natural populations, phenotypic variation is usually quantitative (continuous), not qualitative (discrete)

  • It is next to impossible to apply Mendelian rules to the study of such phenomena. Instead, statistical methods are needed to analyze quantitative variation, which can arise from variable penetrance and/or expressivity, from epistasis and pleiotropy, and from environmental influence.

  • The task of the quantitative geneticist is to determine how genes/environment interact to produce a given trait distribution in a population. A trait produced by both environmental and genetic variables is known as a multifactorial trait.

  • Genetic variation in a continuous character may be due to

  • The estimated ratio of [genetic variation:environmental variation] is not equal to a measure of those things' relative contribution to a phenotype.

  • Important caveat: Such estimates made on a population are good only for the population under study. They cannot be applied across a broader range of populations.
    Traits controlled by multiple loci, each of which contributes equally to the phenotype, exhibit

    One genotype may give rise to several different phenotypes (depending on expressivity, penetrance, etc.), and several different genotypes may produce exactly the same phenotype.

    Usually, continuous traits are affected by many loci (they're polygenic), and so environmental effects on these loci create an even broader array of phenotypes.



    Questions in Quantitative Genetics


    A Quick Review of Very Basic Statistics

    The Quantitative Geneticist uses statistics to examine the interaction of environment and phenotype, so let's not forget our basic statistical knowledge.

  • Monogenic traits usually are expressed as distinct phenotypic classes (qualitative or discrete numerical data) with no overlap. These are best expressed as a frequency distribution using discrete intervals (a histogram), as shown here:

  • Polygenic traits are usually expressed on a continuum (continuous data), for which a statistical distribution is best graphically expressed as a normal distribution of possible theoretical values of the measurements. This is a statistical distribution or frequency distribution.

    The Quantitative Geneticist is interested in determining how much variation in a phenotype (Vp) is due to genetics (Vg) and how much is due to environment (Ve).

    Vp = Ve + Vg

    ...but what are Ve and Vg? To be able to deal with this simple equation, one must be able to measure environmental and genetic contribution to phenotype.


    Recall the common parameters: mean and mode (measures of central tendency), and variance and standard deviation (measures of scatter around the central value). Recall also that these values are known as parameters only when they are calculated from measurements of every individual in the population of interest. Traditionally, parameters are represented with letters of the Greek alphabet.

    When these parameters are estimated by measuring a subset of the population (a sample), they are known as statistics. Traditionally, statistics are represented with letters of the Roman alphabet corresponding to the Greek letter for the actual parameter.

    For example, the standard deviation for an entire population would be written as the Greek letter σ (sigma), whereas the statistical, measured standard deviation would be written simply as "s".


    You remember:

    MEAN = Σ(x)/n

    in which x represents individual measurements taken
    and n represents the total number of measurements.

    Measures of how data points distribute around the mean include variance (s²) and standard deviation (the square root of variance).

  • Variance is calculated as

    s² = Σ(x - x̄)² / (n - 1)

    In which:

  • x = individual measurements
  • x̄ (x with a bar on top) = mean of those measurements
  • n = sample size

  • Standard deviation is calculated as

    s = √[Σ(x - x̄)² / (n - 1)]

  • Standard Error of the Mean (SE) is calculated as

    SE = s / √n

    ...and is used if the investigator is analyzing multiple means generated by a series of repeated experiments, each of which generates a mean value.
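    If you would rather let a computer do the arithmetic, here is a minimal sketch of the four quantities above. The seed weights are made-up numbers for illustration only, and the variance uses the n - 1 denominator that is conventional for a sample (an assumption, since it is the sample statistic s² that is being estimated).

```python
import math

def mean(values):
    """Arithmetic mean: sum of the measurements divided by their count."""
    return sum(values) / len(values)

def sample_variance(values):
    """Sample variance s^2: summed squared deviations over (n - 1)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

def sample_sd(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

def standard_error(values):
    """Standard error of the mean: s divided by the square root of n."""
    return sample_sd(values) / math.sqrt(len(values))

# Hypothetical seed weights (mg); not data from any real experiment.
weights = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("mean     =", mean(weights))
print("variance =", round(sample_variance(weights), 4))
print("sd       =", round(sample_sd(weights), 4))
print("SE       =", round(standard_error(weights), 4))
```

    Python's built-in statistics module offers mean, variance, and stdev functions that should give the same answers; the hand-written versions are shown only so each step of the formula is visible.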


    Variance and Standard Deviation allow precise specification of a normal distribution, as shown here:

    The lower the variance, the narrower the bell-shaped normal curve.
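    To see why a lower variance gives a narrower curve, one can evaluate the normal curve directly. The sketch below uses the standard normal density formula; the mean of 0 and the two standard deviations are arbitrary values chosen for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    """Height of the normal curve at x for a given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical trait centered on 0, drawn with two different spreads.
for sd in (1.0, 0.5):
    at_mean = normal_pdf(0.0, 0.0, sd)
    at_two = normal_pdf(2.0, 0.0, sd)
    print(f"sd={sd}: height at the mean = {at_mean:.4f}, height at x=2 = {at_two:.6f}")
```

    With the smaller standard deviation the curve is taller at the mean and much lower two units away from it, which is what "narrower" means here.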


    A correlation is a relationship between two variables, usually with respect to one of the aforementioned parameters.

    For example:

    In the fossil mammal Phenacodus primaevus, the longer an individual's first molar is, the longer the second molar will be, although this relationship is imprecise.

    In the King Snake, Lampropeltis polyzona, tail length increases as body length increases. However, there is no correlation between tail length and number of caudal scales.

    In other words, correlation is a measure of the "precision" with which two variables change together, but does not imply a cause and effect relationship.

    The measure of two such variables' relationship to each other is expressed as a correlation coefficient, an index that can range from -1.0 to 1.0.

    To determine the correlation coefficient, we must first know the COVARIANCE of two variables. That is, how do the two variables (we'll designate them as x and y) simultaneously deviate from their means? This is calculated as

    cov(x, y) = Σ(x - x̄)(y - ȳ) / (n - 1)

    In which

  • x and y are the individual values you are examining
  • x̄ and ȳ (x and y with bars on top) are the means of each of those values
  • n is your sample size
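    Here is a minimal sketch of covariance and the correlation coefficient built from it. It assumes the usual sample definitions: covariance with an n - 1 denominator, and the correlation coefficient r as the covariance divided by the product of the two standard deviations. The molar lengths are invented numbers, not measurements of any real fossil.

```python
import math

def covariance(xs, ys):
    """Sample covariance: summed products of paired deviations over (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Correlation coefficient r = cov(x, y) / (s_x * s_y); ranges from -1 to 1."""
    sd_x = math.sqrt(covariance(xs, xs))  # cov(x, x) is the variance of x
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical first and second molar lengths (mm) for five individuals.
first_molar = [1.0, 2.0, 3.0, 4.0, 5.0]
second_molar = [2.0, 4.0, 5.0, 4.0, 5.0]

print("covariance =", covariance(first_molar, second_molar))
print("r          =", round(correlation(first_molar, second_molar), 4))
```

    An r near 1.0 or -1.0 means the points fall close to a straight line; an r near 0 means the two variables do not change together in any linear way.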

    For Example, the Quantitative Geneticist might wish to ask, "What is the relationship between DDT resistance in Anopheles (a mosquito) and the presence in Anopheles of DDT resistance alleles?"

    (Set up this way, it is the mosquito's DDT resistance that may vary with the number of DDT resistance alleles--not the other way around.)

    Here's a graphic look at some plotted correlations:

    As stated before, the Correlation Coefficient indicates the precision with which two variables are related. It does NOT indicate a cause-and-effect relationship, and it offers no predictive value.


    Regression Analysis

    If one wishes to predict the value of one variable (the response variable) by assigning the value of a related variable (the predictor), then regression analysis--not correlation--is used.

    The relationship between two variables is then expressed in the form of a regression line:

    A series of data points is graphed (in this example, it's father's height (x) versus son's height (y)) so that a best fit line can be computed, the slope of which represents the most accurate relationship between x and y.

    Remember the equation of a line:

    y = mx + b

    In which:

  • m is the slope
  • b is the y intercept

    SLOPE indicates how much change in y is expected due to a change in x.

    A regression equation contains estimates of one or more unknown regression parameters (constants), that quantitatively link the dependent and independent variables. The parameters are estimated from actual measurements (statistics) of the dependent and independent variables.

    Regression analysis is used for predictions and testing hypotheses about the suspected relationship between two variables. Uses of regression include prediction (including forecasting of time-series data), modeling of causal relationships, and testing scientific hypotheses about relationships between variables.

    EXAMPLE:

    Re-wording the correlation question about Anopheles and DDT resistance above, the researcher would ask, "Can the degree of DDT resistance in an Anopheles mosquito be predicted by the number of DDT resistance alleles it carries?"
    The variables would be the same:
    Independent variable (x) = number of DDT resistance alleles
    Dependent variable (y) = degree of DDT resistance in the mosquito
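    A best-fit line for data like these can be sketched with ordinary least squares, which is one common way (assumed here) of choosing m and b in y = mx + b: the slope is the summed products of paired deviations divided by the summed squared deviations of x, and the intercept makes the line pass through the two means. The allele counts and resistance scores below are invented for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return slope, intercept

# Hypothetical data: number of DDT resistance alleles (x) and an
# arbitrary resistance score (y) for five mosquitoes.
alleles = [0, 1, 2, 3, 4]
resistance = [1.0, 3.0, 4.0, 6.0, 6.0]

m, b = least_squares_line(alleles, resistance)
print("slope m     =", round(m, 4))
print("intercept b =", round(b, 4))
print("predicted resistance with 2 alleles =", round(m * 2 + b, 4))
```

    The slope is the expected change in resistance score for each additional resistance allele, which is exactly the kind of prediction a correlation coefficient alone does not give you.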



    ANOVA

    The quantitative geneticist is often faced with data that require more detailed analysis than a simple Chi-square or t-test will provide. If s/he wishes to know whether there is a significant difference between multiple means, then the appropriate test to use is the Analysis Of VAriance, or ANOVA.

    For example, if one wanted to test whether there is a relationship between the size of individual Raphanus brassica plants and their proximity to the local nuclear power plant, ANOVA is the way to go.

    Multiple means could be analyzed in pairwise fashion via the t-test, but as the number of means grows, the possible number of pairings also grows, and so does the possible contribution of random chance when one separates all possible pairings. ANOVA combines all the means into a single group, with all data contributing to a single statistic (F) for which there is only one P value to assign for rejection of the null hypothesis.
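    The single F statistic mentioned above can be sketched for the simplest case, a one-way ANOVA: F is the variation among group means (mean square between) divided by the variation within groups (mean square within). The plant sizes and the three distance groups are invented for illustration, and the code stops at F; turning F into a P value requires the F distribution, which a statistics package can supply.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA on a list of groups."""
    k = len(groups)                                  # number of groups
    n_total = sum(len(g) for g in groups)            # total observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (gm - grand_mean) ** 2
                     for g, gm in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - gm) ** 2
                    for g, gm in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical plant sizes (cm) at three distances from the power plant.
near = [1.0, 2.0, 3.0]
middle = [2.0, 3.0, 4.0]
far = [5.0, 6.0, 7.0]

print("F =", one_way_anova_f([near, middle, far]))
```

    A large F means the group means differ by more than the scatter within groups would lead you to expect, and only that one statistic needs a P value.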



    Polygenic Inheritance and Environmental Influence on Phenotype

    This was first demonstrated in 1909 by W. Johannsen, who studied the relationship of seed weight of a parental population to seed weight of their offspring. He found:

    What did this tell us? That bean weight is controlled by several loci, but that environment also plays a role in final phenotype.


    Individuals with a given genotype may be expected to show a discrete phenotype, but in natural populations, the phenotype more often describes a frequency distribution due to other factors affecting the phenotype. For example, if a locus is segregating a dominant and a recessive allele, the phenotypic distributions of the three possible genotypes might look something like this:

    and when one adds all these phenotypes together, the typical normal distribution of the trait in the population appears:

    For many years, continuous variation like this was assumed to be mainly due to the interaction of genes controlling the phenotype (multiple factor hypothesis). But it turns out that when the same plants above are raised in rigidly controlled environments, the normal curve gives way to a trimodal curve:

    Environmental influence on phenotype has been removed.

    But recall Johannsen's experiments, and know that even a few loci with varying effect can produce a distribution that is difficult or impossible to distinguish from the curve produced by many interacting loci, each with a very small effect on phenotype.


    Heritability

    Professional agriculturists know that they cannot subject their crops/herds to unlimited inbreeding. Too much homozygosity at multiple gene loci almost always results in deleterious alleles being expressed, reducing vigor and yield.

    Heritability is a measure of the degree to which the variance in phenotype distribution is due to genetic causes.

    How does one get the greatest degree of selection (for desired traits) with the lowest risk of inbreeding depression? Calculate a heritability estimate: a value that predicts to what extent an artificial selection effort will be successful.

    H = (Yo - Ym) / (Yp - Ym)

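    In which:

  • Yo = mean yield of the offspring of the selected parents
  • Ym = mean yield of the original population
  • Yp = mean yield of the parents selected for breeding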

    Also,

  • Heritability is the gain in yield divided by the amount of selection that has occurred.
  • If Yo = Ym, there has been no gain, and heritability of the trait in question is zero.

    (Since this can be calculated only after the breeding has occurred, it is often referred to as realized heritability.)
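    A quick worked sketch of realized heritability, reading Yo as the mean of the offspring, Ym as the mean of the original population, and Yp as the mean of the selected parents (this reading of the subscripts is an assumption). The yields are invented numbers.

```python
def realized_heritability(offspring_mean, population_mean, selected_parent_mean):
    """H = (Yo - Ym) / (Yp - Ym): gain in yield divided by amount of selection."""
    gain = offspring_mean - population_mean
    selection = selected_parent_mean - population_mean
    if selection == 0:
        raise ValueError("no selection was applied (Yp equals Ym)")
    return gain / selection

# Hypothetical yields in arbitrary units.
Ym = 50.0  # mean of the original population
Yp = 60.0  # mean of the parents chosen for breeding
Yo = 56.0  # mean of their offspring

print("H =", realized_heritability(Yo, Ym, Yp))
```

    Here the parents were 10 units above the population mean and their offspring kept 6 of those 10 units, so H comes out at 0.6. If the offspring had fallen all the way back to 50, H would be zero.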

    Quantitative geneticists consider realized heritability to be an estimate of TRUE HERITABILITY:

  • narrow sense
  • broad sense

    To understand the difference, we have to partition the variance:

    VPH = VG + VE

    VG can be further broken down into its components, in which:

  • VA = additive genetic variance
  • VD = dominance variance
  • VI = interaction (epistatic) variance

    The original equation can thus be rewritten as:

    VPH = [VA + VD + VI] + VE


    Heritability in the broad sense is equal to

    HB = VG/VPH

    This is the heritability due to all genetic factors, including additive polygenes.

    Heritability in the narrow sense is equal to

    HN = VA/VPH

    Narrow sense heritability is of greatest interest to breeders, who wish to consider how to manipulate additive genes to obtain the greatest yield, and to geneticists, who wish to understand the genetic components of phenotypic expression.
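    The two ratios are easy to compare once the variance components are in hand. The sketch below simply applies HB = VG/VPH and HN = VA/VPH to made-up variance components; in practice, estimating those components is the hard part.

```python
def heritabilities(v_a, v_d, v_i, v_e):
    """Return (broad-sense, narrow-sense) heritability from variance components."""
    v_g = v_a + v_d + v_i   # total genetic variance
    v_ph = v_g + v_e        # total phenotypic variance
    broad = v_g / v_ph      # HB = VG / VPH
    narrow = v_a / v_ph     # HN = VA / VPH
    return broad, narrow

# Hypothetical variance components in arbitrary squared units.
HB, HN = heritabilities(v_a=4.0, v_d=2.0, v_i=1.0, v_e=3.0)
print("broad-sense HB  =", HB)
print("narrow-sense HN =", HN)
```

    Narrow-sense heritability can never exceed broad-sense heritability, because VA is only one part of VG.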


    Measuring Heritability

    This must be done with great caution, as many factors can confound the investigator's ability to discern which components of phenotype are due to genetic factors.


    Norm of Reaction and Phenotypic Distribution

    A basic tenet of Quantitative Genetics is the Multiple Factor Hypothesis: Large numbers of genes, each having a small effect individually, segregate and recombine to produce continuous variation of a particular trait.
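    To see how quickly discrete classes blur toward a continuum, consider a deliberately simplified case (all of it assumption): n unlinked loci, two alleles each, every "contributing" allele adding the same amount to the phenotype, no environmental effect, and an F2 generation from parents heterozygous at every locus. Each of the 2n allele slots then holds a contributing allele with probability 1/2, so the phenotype classes follow a binomial distribution.

```python
from math import comb

def f2_class_frequencies(n_loci):
    """Expected F2 frequency of each phenotype class (0 to 2n contributing alleles)."""
    slots = 2 * n_loci  # two allele slots per locus
    return [comb(slots, k) / 4 ** n_loci for k in range(slots + 1)]

for n in (1, 2, 5):
    freqs = f2_class_frequencies(n)
    print(f"{n} loci -> {len(freqs)} phenotype classes")
    for k, f in enumerate(freqs):
        # Crude text histogram: one '#' per 2% of the population.
        print(f"  {k:2d} contributing alleles: {'#' * round(f * 50)}")
```

    One locus gives the familiar 1:2:1 pattern. By five loci there are already eleven classes in a bell-shaped arrangement, and any environmental scatter would smear them into a smooth curve.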


    The Norm of Reaction is a pattern of phenotypes produced by a given genotype, under a variety of environmental conditions.
    Yes, you've heard it before: phenotype is a product of both genotype and environment.

    If, for a given genotype, a series of known "micro-environments" can predictably result in a particular phenotype, then

  • (a) a distribution of environments will be reflected biologically as
  • (b) a distribution of phenotypes. The way in which (a) is transformed into (b) is expressed by a function known as the Norm of Reaction:

    In our example, plant height (in cm) is correlated with environmental temperature (°C).

    The frequency of distribution of developmental environments is reflected as a frequency distribution of plant phenotypes, as determined by the norm of reaction.

    The shape of the norm of reaction curve reflects how the environmental condition distribution is distorted on the phenotype axis.

    In our example, norm of reaction falls rapidly at low temperatures, but flattens out at higher temperatures.

    In plain English, this means that plant phenotype varies greatly with small changes in temperature at low temperatures. As temperature increases, the plants' phenotypic response is less dramatic (at higher temperatures, a larger temperature change can occur without a concurrent large change in plant phenotype).
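    The same idea can be sketched numerically. The function below is a purely hypothetical norm of reaction (the exponential shape and all of its constants are invented, not taken from the example above) that falls steeply at low temperatures and flattens at high ones. Feeding it a set of developmental temperatures yields the corresponding set of phenotypes.

```python
import math

def plant_height_cm(temperature_c):
    """Hypothetical norm of reaction: steep at low temperature, flat at high."""
    return 20.0 + 60.0 * math.exp(-0.15 * temperature_c)

def phenotype_distribution(temperatures_c):
    """Map each developmental environment to the phenotype it produces."""
    return [plant_height_cm(t) for t in temperatures_c]

# The same 5-degree change, applied at a low and at a high temperature.
low_change = plant_height_cm(5.0) - plant_height_cm(10.0)
high_change = plant_height_cm(25.0) - plant_height_cm(30.0)

print(f"height change from  5 to 10 degrees: {low_change:.2f} cm")
print(f"height change from 25 to 30 degrees: {high_change:.2f} cm")
print([round(h, 1) for h in phenotype_distribution([5, 10, 15, 20, 25, 30])])
```

    Evenly spaced environments come out as unevenly spaced phenotypes: spread out where the curve is steep, bunched together where it is flat. That is the "distortion" of the environmental distribution on the phenotype axis.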

    This can get complicated quickly when one adds more than one genotype and more than one environmental factor:

    Let's return to the idea of familiality versus heritability. If environment affects phenotype, how do we know if a phenotypic trait is affected at all by genotype? Because developmental processes governed by genes lie at the base of every character.

    For example, the morphological structures that make Homo sapiens capable of speech depend on the development of the brain, vocal cords, mouth, and tongue. These are under genetic control. However, variation in speech (languages) is almost entirely environmental.

    And Cow will never speak, except on Cartoon Network.


    If genes are involved in the development of a trait, then biological relatives should resemble each other in that trait more than non-relatives do--but ONLY if relatives are no more likely to share common environments than non-relatives. (This is rarely the case.)

    A familial trait is one shared by members of a biological family, for whatever reason.

    A heritable trait is one shared by individuals because of shared genotype.


    It is relatively (har) simple to determine familiality vs. heritability in controlled populations, but very difficult in wild populations of any organism--including humans.

    Because human families so often share a similar environment, the distribution of genetic vs. environmental effect on phenotype is often uninterpretable.

    Studies of monozygotic and dizygotic human twins shed some light on the issue, but even these are not entirely without confounding factors.

    Many behaviorally expressed traits in Homo sapiens are politically charged.

    ...often exhibit familiality. But not only are most probably polygenic, they also could exhibit variable penetrance and expressivity due to environmental and other factors.

    Always remember that correlation is not regression: a relationship between two variables does not imply cause and effect. So far, no significant predictability has been shown for any of these traits.

    Norm of Reaction studies can be of use here. However, they show only small differences among naturally occurring genotypes, and those differences are not consistent over a wide range of environments. This means that "superior" genotypes--at least in agricultural species--are "superior" only under certain environmental conditions.


    The Take Home Message: If human behaviors are, to some degree, under genetic influence, variation in those behaviors is unlikely to favor one genotype over another, given a range of environments. This means that even traits considered "undesirable" in one context may be adaptive in another. And this could help explain why such traits still exist in human (and possibly other) populations, despite their (possibly temporary) "undesirability".

    In a social context, "undesirable" is determined by the societal mores of the time, and these may evolve. Social acceptability of a trait may have little to do with whether such a trait is adaptive/maladaptive/neutral in other contexts. (Can you think of examples?)

    Thus, the term "superior" applied to a behavior is not only subjective, but also has little to do with which genetically-affected behaviors are adaptive, maladaptive, or neutral. Changing environmental context must be considered in order to make any sense of the evolution and maintenance of such complex characters.
