


Genetic Recombination in Eukaryotes

  • Recall the nature of meiosis, and how chromosomes assort independently.
  • Also recall that there are many genes on each chromosome.
  • The genes on any given chromosome are likely to be inherited together...
  • Unless they are separated during crossing over.

    Mendel's Law of Independent Assortment: The various factors controlling each physical trait assort independently into the newly formed sex cells.

    Well, sort of.

    If they are located on the same chromosome, the genes are said to be LINKED.


    Also recall that a DIHYBRID CROSS is a cross between two dihybrid individuals (e.g., AaBb x AaBb).

    A TRIHYBRID CROSS is a cross between two trihybrid individuals (e.g., AaBbCc x AaBbCc).


    ...and so on. You already have a table that allows you to determine the expected numbers of various things from multihybrid crosses, but here's the table, in case you've forgotten:

    Quantity (n = number of heterozygous gene pairs)                      Expected value

    Number of F1 gamete types                                             2^n
    Proportion of F2 homozygous recessives                                1/(2^n)^2
    Number of different F2 phenotypes (complete dominance)                2^n
    Number of different F2 genotypes (or phenotypes, if no dominance)     3^n
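
    As a quick check on the table, the four quantities can be computed for any number of gene pairs. This is only a sketch: the function name and dictionary keys are made up for illustration, and n is assumed to be the number of heterozygous gene pairs in the cross.

```python
from fractions import Fraction

def multihybrid_expectations(n):
    """Return the four table quantities for n heterozygous gene pairs (assumed meaning of n)."""
    gamete_types = 2 ** n
    return {
        "F1 gamete types": gamete_types,
        "F2 homozygous recessive proportion": Fraction(1, gamete_types ** 2),
        "F2 phenotypes (complete dominance)": 2 ** n,
        "F2 genotypes": 3 ** n,
    }

# Monohybrid, dihybrid and trihybrid crosses
for n in (1, 2, 3):
    print(n, multihybrid_expectations(n))
```

    For a dihybrid cross (n = 2) this gives 4 gamete types, 1/16 homozygous recessives, 4 phenotypes and 9 genotypes, which matches the familiar 9:3:3:1 Punnett square.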

  • Also recall that a TEST CROSS is a mating between an individual expressing the fully dominant phenotype and an individual expressing the fully recessive phenotype for all characters under consideration.

    Note that it's still called a TEST CROSS, even if you know the genotype of the fully dominant-expressing individual. Because sometimes it's not the genotype you're testing. Sometimes it's the likelihood of crossing over between two genes on the same chromosome.

  • A SELF refers to the union of gametes from a single individual (the ultimate inbreeding!). We may do some of this, too. (Well, we won't, but...)
    In some cases, as we already have intimated, genes may be separated from one another during meiosis, even if they are on the same chromosome. This is known as RECOMBINATION.

    RECOMBINATION is any process occurring in a meiotic cell (meiocyte) that results in the haploid genotype of the daughter cell being different from the haploid genotype of either of the gametes that originally combined to produce the parent cell. Here's a MOVIE of alternative alignments.

    • A cell produced by a recombination event is called RECOMBINANT
    • A cell that has not undergone recombination (and thus has the same haploid genotype as the original gametes that combined to make the parent cell) is called PARENTAL

    Note that in the above cases, the recombinants have been produced without any crossing over.


    However, CROSSING OVER is another way to produce cells with a haploid genotype different from that of the original parent cell.

    Since crossing over can occur at multiple points anywhere between two homologs, the closer together two genes are on a given chromosome, the more closely linked they are (i.e., the fewer potential crossover points there are between them).


    With the mechanics of crossing over in mind, we can now move on to a re-enactment of one way to determine the distance between two genes on the same chromosome:

    LINKAGE AND MAPPING

    First, a few definitions:
      linked genes - genes located on the same chromosome
      linkage group - all gene loci on a single chromosome pair
      genetic map - representation of the location of gene loci on a particular chromosome

    A bit of history:

    Linkage was first noticed in sweet peas by William Bateson, E.R. Saunders and Richard C. Punnett:
    Two traits, flower color (purple or red) and pollen shape (long or round) did not exhibit the expected pattern of 9:3:3:1 offspring phenotypes for a dihybrid cross, as predicted by Mendel's Second Law (Independent Assortment). But they couldn't explain why.
    BIG NOTE: One problem was that there were SOME offspring that showed up as purple/round and red/long...but very few! The two loci, it turns out, were not completely linked.


    T.H. Morgan, The Drosophila King, further demonstrated that certain traits were inherited together SIGNIFICANTLY more often than predicted by Independent Assortment. His first trials involved eye color (red or white) and wing size (regular or miniature).

    It wasn't until 1931 that Harriet Creighton and Barbara McClintock (The Queen of Corn) made the connection between crossing over and true exchange of gene loci between homologs.

    They observed obvious physical exchanges in corn because a translocated piece of chromosome 8 had become attached to chromosome 9 (See Box 5-1 in your text for a full account of this phenomenon).
    Side notes:
    (An obvious, physical structure that can be used to locate and identify genetic exchange events is called a CYTOLOGICAL MARKER).
    (Genes which allow easy identification of loci on a chromosome (due to the appearance of obvious phenotypic traits in the individuals in which they occur) are known as GENE MARKERS).


    Similar results were reported for Drosophila by Curt Stern: the phenomenon of genetic exchange during crossing over was NOT unique to corn.
    At this point, it was still not known whether the exchange took place before or after the chromosomes had replicated during the S phase that precedes meiosis.
    • Did crossing over take place between two chromosomes (homologs)?
    • ...or between four chromatids (two sets of sister chromatids, each corresponding to a homolog, but able to cross over independently of the other sister)?

    The answer came from Neurospora crassa, the pink (orange?) bread mold: CROSSING OVER TAKES PLACE DURING THE TETRAD PHASE.


    GENETIC MAPPING is the process by which relative positions of genes on chromosomes are determined.
    Initially, this is done via LINKAGE ANALYSIS, which is what we'll look at today. More refined mapping is done with molecular techniques--especially with humans and other organisms that don't take well to experimental matings. For our demonstrations, we'll use our old, beloved Drosophila.


    Before we embark on the matings, let's get a few things straight:

  • In controlled matings of this type, "P generation" is generally used to designate the TRUE BREEDING parents (one fully dominant, one fully recessive) mated to obtain an F1 of DIHYBRID offspring.
  • If genes are not linked, then when the F1 are mated to one another (or selfed), the F2 is expected to exhibit the 9:3:3:1 ratio of phenotypes predicted by Mendel's Law of Independent Assortment.


    Let's have a look at how one might try to determine the "distance" between two gene loci on a single chromosome. (This is based on the work of T. H. Morgan.)

    The System:

    • Wild type Drosophila melanogaster have an unstriped thorax and a particular wing vein pattern.

    • Drosophila melanogaster homozygous recessive for a mutation at the bn gene locus have a striped thorax:

    • Drosophila melanogaster homozygous recessive for a mutation at the det gene locus have an unusual wing vein pattern:

    • In our Drosophila cultures, we have true-breeding strains for both loci:

      • Wild type thorax, mutant veins: bn+bn+ detdet

      • Mutant thorax; wild type veins: bnbn det+det+

      (NOTE: WILD TYPE ALLELE IS OFTEN WRITTEN SIMPLY AS "+")

    • possible gametes from P1:
      • bn+bn+ detdet will yield only bn+ det gametes

      • bnbn det+det+ will yield only bn det+ gametes

    Hence, the F1 will be 100% bn+bn det+det

  • Expected phenotypes in the F2: 9 + + : 3 + det : 3 bn + : 1 bn det

  • But a TEST CROSS is more revealing...

    • bn+bn det+det x bnbn detdet

    • expected: 1:1:1:1 of all four possible phenotypes

    • HOWEVER, the actual results of this cross are as follows...
      3 wild type : 512 + det : 483 bn + : 2 bn det

    • The most common phenotypes in the offspring group are PARENTALS, the same as those in the P Generation. We refer to these as parental, or nonrecombinant, phenotypes.

    • The scarce, fully wild type and fully mutant phenotypes are called nonparental, or recombinant phenotypes. Their combination of thorax and vein morphology is DIFFERENT FROM THE P generation's.

    The fact that the ratio of offspring is so significantly different from the expected is evidence that these two loci (bn and det) are linked--located on the same chromosome.

    The ratio of parental to nonparental phenotypes is useful. As a general rule, the closer together two loci are on a given chromosome, the more likely it is that they will NOT be separated and reshuffled during crossing over.

    The % of offspring in each of the four potential phenotypic categories in the test cross can be used as an index of the likelihood of crossover between the two loci.

    In our example above, we found that 99.5% of the offspring exhibited parental phenotype, and 0.5% exhibited recombinant phenotype. This very low frequency of recombinants suggests that bn and det are very close together.

    BY CONVENTION: Every 1% of recombinant offspring in a given cohort is said to represent one MAP UNIT of "distance" between the two loci in question.

    (one map unit is also known as a centimorgan, in honor of T.H. Morgan.)

    (NOTE: map distance is not a discrete distance: it is a relative "distance" expressing the likelihood of crossover between two loci.)
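
    The arithmetic behind the 0.5% figure can be sketched as follows, using the test cross counts given above. The function name is hypothetical; it simply divides the recombinant offspring by the total and converts the result to map units.

```python
def recombination_frequency(parental_counts, recombinant_counts):
    """Fraction of test cross offspring that show a recombinant phenotype."""
    total = sum(parental_counts) + sum(recombinant_counts)
    return sum(recombinant_counts) / total

# bn/det test cross: 512 and 483 parental offspring, 3 and 2 recombinant offspring
rf = recombination_frequency(parental_counts=[512, 483], recombinant_counts=[3, 2])
map_units = rf * 100  # by convention, 1% recombinants = 1 map unit

print(f"{map_units:.1f} map units between bn and det")
```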



    We have demonstrated that the loci are linked, but we still have no idea HOW CLOSE they are together.
    To determine this, we need to stick a third locus between the two of interest, and analyze a THREE POINT CROSS.

    LET'S USE SOME NEW PHENOTYPES, JUST FOR FUN:

    • wild type fly = brown body (b+)


    • mutant = black body (b)



    • wild type fly = red eyes (pr+)
    • mutant = purple eyes (pr)



    • wild type = straight wings (c+)
    • mutant = curved wings (c)

    At the outset, we don't know whether these loci are on the same or different c'somes.


    A brief digression:

    There's special shorthand to designate linkage or non-linkage of loci.

    • If we don't know where the loci are, relative to each other, we write the genotype like so:

      bb prpr cc

    • If all loci are known to be on separate c'somes, the genotype is written:

      b/b pr/pr c/c

    • If all three loci are on the same c'some, the genotype is written:

      b pr c/ b pr c

    FOR EXAMPLE: b/b pr+ c/pr c

    ... means that

      1. pr and c are linked

      2. b is not linked to pr and c

      3. the fly is black bodied, red-eyed, and has curved wings.


    If the three loci are completely unlinked, a Punnett square of the test cross yields an expected phenotypic ratio of 1/8 of every possible phenotype:
      1. completely wild type, all three traits ("wild type")
      2. wild type body & eyes; mutant wing (we'll call this "curved")
      3. wild type body; mutant eyes & wings ("purple, curved")
      4. wild type eyes & wings; mutant body ("black")
      5. wild type wings; mutant body & eyes ("black, purple")
      6. wild type body & wings; mutant eyes ("purple")
      7. wild type eyes; mutant body & wings ("black, curved")
      8. completely mutant body, eyes & wings ("black, purple, curved")

    LET'S SAY YOU ACTUALLY DID THIS! YOU BRED A KNOWN HETEROZYGOTE TO A FULLY MUTANT FLY, AND LET THEM REALLY GO AT IT (okay, you probably used a bunch of flies...) TO PRODUCE 15,000 ADORABLE LITTLE MAGGOTS! Once the larvae pupated and emerged as adults, you found the following numbers of each phenotypic class...

     

    Phenotype                   Number of offspring

    wild type                     5701
    curved                        1412
    purple, curved                 388
    black                          367
    black, purple                 1383
    purple                          60
    black, curved                   72
    black, purple, curved         5617

    TOTAL                       15,000

    This doesn't look like 1/8 (1875) of each phenotypic class, does it?

    And if you were to perform a statistical test, you would find that the numbers above do indeed differ significantly from the expected!


  • THE LOCI ARE LINKED--BUT NOT COMPLETELY (i.e., some crossing over has occurred between them)
  • If the loci were completely linked (on the same c'some), we would expect to get 50% of each parental type in the offspring, and no recombinants at all.


    Next step: group offspring by RECIPROCAL CLASS:

      1. wild type vs. fully mutant (parentals/nonrecombinants)
      2. black vs. purple, curved (recombinant - crossover between b & pr)
      3. black, purple vs. curved (recombinant - crossover between pr & c)
      4. purple vs. black, curved (recombinant - double crossover! (b & pr + pr & c))
    And figure out the number of offspring in each RECIPROCAL CLASS. (For now, let's assume that the order on the chromosome is b pr c.)

     

    Reciprocal class                        Offspring                Interpretation

    wild type + black, purple, curved       5701 + 5617 = 11,318     no crossing over among our loci
    black + purple, curved                  367 + 388 = 755          crossover between b & pr
    black, purple + curved                  1383 + 1412 = 2795       crossover between pr & c
    purple + black, curved                  60 + 72 = 132            double crossover: one between b & pr, and another between pr & c
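
    The grouping into reciprocal classes can also be done mechanically: each phenotype is paired with its complement (the phenotype that is mutant exactly where the first is wild type). The sketch below assumes each phenotype is represented as the set of its mutant traits; the names are illustrative only.

```python
ALL_TRAITS = frozenset({"black", "purple", "curved"})

# Offspring counts from the three-point test cross, keyed by the set of mutant traits
counts = {
    frozenset(): 5701,                        # wild type
    frozenset({"curved"}): 1412,
    frozenset({"purple", "curved"}): 388,
    frozenset({"black"}): 367,
    frozenset({"black", "purple"}): 1383,
    frozenset({"purple"}): 60,
    frozenset({"black", "curved"}): 72,
    ALL_TRAITS: 5617,                         # fully mutant
}

def reciprocal_classes(counts):
    """Pair each phenotype with its complement and sum the two counts."""
    classes = {}
    for phenotype, count in counts.items():
        partner = ALL_TRAITS - phenotype
        key = frozenset({phenotype, partner})
        classes[key] = classes.get(key, 0) + count
    return classes

classes = reciprocal_classes(counts)
for pair, total in sorted(classes.items(), key=lambda item: -item[1]):
    labels = sorted(", ".join(sorted(p)) or "wild type" for p in pair)
    print(" + ".join(labels), "=", total)
```

    The largest class is the parental pair and the smallest is the double crossover pair, which is the pattern used later to work out gene order.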

    Next:

  • Count the total number of crossovers in each class (i.e., between each pair of loci)
  • Divide by the total number of offspring (15,000 in our case)
  • This gives you the proportion of offspring in each crossover class.
  • Each % of offspring corresponds to one map unit between the two loci in that class.
  • Like so...

      1. b & pr: 388 + 367 + 72 + 60 = 887
        887/15000 = 5.9%
        (5.9 map units between b and pr)

      2. pr & c: 1383 + 1412 + 72 + 60 = 2927

        2927/15000 = 19.5%
        (19.5 map units between pr and c)

      3. b & c: 388 + 367 + 1412 + 1383 = 3550

        3550/15000 = 23.7%
        (23.7 map units between b and c)
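
    The three measured map distances can be reproduced with a few lines of arithmetic. This is a sketch using the reciprocal class totals from above and assuming the gene order b pr c; the variable and function names are made up for illustration.

```python
TOTAL = 15000
single_b_pr = 388 + 367      # crossover between b & pr only
single_pr_c = 1383 + 1412    # crossover between pr & c only
double = 72 + 60             # one crossover in each region

def map_units(recombinants, total=TOTAL):
    """Percent recombinant offspring, i.e. map units."""
    return 100 * recombinants / total

b_pr = map_units(single_b_pr + double)        # 887 recombinants
pr_c = map_units(single_pr_c + double)        # 2927 recombinants
# Double crossovers restore the parental combination of b and c,
# so they are not counted as recombinant for this pair.
b_c = map_units(single_b_pr + single_pr_c)    # 3550 recombinants

print(round(b_pr, 1), round(pr_c, 1), round(b_c, 1))
```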



  • The above data are called measured map distances, and each is derived from a two-point crossover.

  • More accurate are actual map distances. These are obtained by summing the measured distances across the smaller intervals that lie between the two loci in question.

  • The actual map distance between b and c is the sum of the measured map distances between
    b & pr and pr & c:

    5.9 + 19.5 = 25.4

    Hey! This is different from the measured map distance between b and c, which we have already calculated as 23.7 map units. What's going on?

    Unlike the measured map distance, the actual map distance calculation takes into account all crossover events, including DOUBLE CROSSOVERS. The measured map distance calculation for b & c missed the double crossovers!
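
    One way to see where the missing map units went: each double crossover fly carries two exchanges between b and c, yet it looks parental for those two loci. Counting those flies twice recovers the summed distance. A minimal sketch, with illustrative variable names:

```python
TOTAL = 15000
single_b_pr = 367 + 388      # black + purple, curved
single_pr_c = 1383 + 1412    # black, purple + curved
double = 60 + 72             # purple + black, curved

# Measured b-c distance ignores the double crossovers entirely
measured_b_c = 100 * (single_b_pr + single_pr_c) / TOTAL

# Each double crossover fly represents two exchanges between b and c
actual_b_c = 100 * (single_b_pr + single_pr_c + 2 * double) / TOTAL

print(f"measured: {measured_b_c:.1f}, actual: {actual_b_c:.1f}")
```

    The actual value equals the sum of the b & pr and pr & c distances, because each of those two calculations already included the double crossovers once.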


    Over the years of doing these calculations, investigators noticed a pattern emerging, which can be graphed as the MAP FUNCTION.


    How can you tell the order of the loci on the chromosome? The easiest way is to look at which reciprocal class has the smallest proportion of offspring. That's the class of offspring that inherited chromosomes with a double crossover (a rarer event than a single crossover between the loci you happen to be looking at).

    In our case, the smallest class was the purple vs. black/curved. This means that the two loci ending up on the same chromosome (black and curved) are the OUTSIDE LOCI, and the one all by itself (purple) is the INSIDE MARKER.
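
    The same reasoning can be written as a small comparison: line up a parental chromosome against a double crossover chromosome and find the locus that stands apart from the other two. This is a sketch under the assumption that exactly three loci are scored; the function name and the dictionary representation are hypothetical.

```python
def middle_locus(parental, double_crossover):
    """Return the locus that behaves differently from the other two.

    Both arguments map locus name -> allele carried on one chromosome.
    """
    differing = [locus for locus in parental
                 if parental[locus] != double_crossover[locus]]
    if len(differing) == 2:
        # Compared against the opposite parental type: the middle locus
        # is the one that did NOT change.
        differing = [locus for locus in parental if locus not in differing]
    if len(differing) != 1:
        raise ValueError("expected a three-locus double crossover class")
    return differing[0]

wild_type_parental = {"b": "+", "pr": "+", "c": "+"}
purple = {"b": "+", "pr": "pr", "c": "+"}          # one double crossover type
black_curved = {"b": "b", "pr": "+", "c": "c"}     # its reciprocal

print(middle_locus(wild_type_parental, purple))
print(middle_locus(wild_type_parental, black_curved))
```

    Both double crossover types point to pr as the inside marker, in agreement with the assumed order b pr c.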


    Next Question: Were the various crossover events independent of one another?

    ...Meaning, if a crossover happens in one of our two regions, does it affect the likelihood of a crossover happening in the other region?

    You can determine this by asking: What is the EXPECTED number of double crossovers, judging from the single crossovers that took place between b & pr and pr & c?

    • If one assumes that crossovers between the two regions are independent,
      • There was a 5.9% crossover frequency between b & pr (P = 0.059) and
      • There was a 19.5% crossover frequency between pr & c (P = 0.195)

      To calculate the Probability of a DOUBLE CROSSOVER, multiply (remember the Product Rule!) the Probabilities of the two independent events:

      0.059 x 0.195 = 0.0115 (1.15%)

      Therefore, the expected number of double crossovers in a cohort of 15,000 flies should be:

      0.0115 x 15000 = 172.5 flies

      • Expected = 172.5
      • Observed = 132 (60 + 72)
      • ...which means that we saw fewer double crossovers than expected from our Product Rule projection.


    The COEFFICIENT OF COINCIDENCE (c.c.) is an expression of the likelihood and nature of one crossover's interference with another.

    c.c. = Observed/Expected = 132/172.5 = 0.77

    ...Which means that 77% of the expected double crossovers actually occurred.

    • If c.c. < 1.0, then one crossover exerts positive interference on the other. This means that the occurrence of the first crossover reduces the likelihood of the second crossover.

    • If c.c. > 1.0, then one crossover exerts negative interference on the other. This means that the occurrence of the first crossover increases the likelihood of the second crossover.

    The DEGREE OF INTERFERENCE (i) is expressed as

    1 - c.c.

    • If c.c. is greater than one, i will be negative, meaning that interference is in the "negative" direction (one crossover promotes, rather than inhibits, the other)

    • If c.c. is less than one, i will be positive, meaning that interference is in the "positive" direction (there is interference, and one crossover lowers the probability of the other).
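
    The coefficient of coincidence and the degree of interference can be computed together. The sketch below uses the crossover frequencies and counts from the example; the function name is made up. Note that carrying unrounded intermediate values gives a c.c. slightly below the hand-rounded figure in the text, so the result lands between 0.76 and 0.77 depending on where you round.

```python
def coefficient_of_coincidence(rf_1, rf_2, observed_doubles, total):
    """Observed double crossovers divided by the number expected
    if crossovers in the two regions were independent (Product Rule)."""
    expected_doubles = rf_1 * rf_2 * total
    return observed_doubles / expected_doubles

cc = coefficient_of_coincidence(0.059, 0.195, observed_doubles=132, total=15000)
interference = 1 - cc

print(f"c.c. = {cc:.3f}, i = {interference:.3f}")
if interference > 0:
    print("positive interference: one crossover makes the other less likely")
```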

    SOMATIC (MITOTIC) CROSSING OVER

    Ever seen a human iris with part blue and part brown pigmentation? This can be due to SOMATIC (= MITOTIC) CROSSING OVER.

      During development, homologs in a rapidly mitosing blastomere come into close apposition, and accidentally switch segments.

      If the organism is heterozygous (for something like eye color), one resulting blastomere will now be BB, and the other will be bb.

      The part of the iris that develops from the bb blastomere will be blue!

    Studies of mitotic crossing over are important. Some organisms (such as fungi and bacteria) which do not have regular sexual reproductive cycles actually may get genetic variation via mitotic crossing over.

    Also, somatic crossing over may be one way in which deleterious alleles (such as faulty tumor suppressors) may be expressed in the body, even if the organism's overall genotype is heterozygous.


    HOW DOES CROSSING OVER TAKE PLACE?

    The exact mechanism is not known, but several models have been proposed.

    All models agree that one step in the process involves the formation of HETERODUPLEX DNA. This is DNA composed of one strand from one homolog, and the other strand from the other homolog.

    Genetic recombination can be regarded as a process of breakage and repair between two DNA molecules. The result of crossing over is two gametes containing strands of DNA that are like the original, parental strands and two gametes that contain recombinant DNA that has undergone crossing over.


    STATISTICS IN GENETICS

    One of many testable questions one can ask in Genetics relates to all we've just been studying: Are these genes linked? Are they inherited together more often than would be predicted by chance alone?

    To answer this type of question and others in genetics, we use common laws of Probability and Statistics.

    CALCULATING PROBABILITIES IN GENETICS

    ASKING QUESTIONS IN SCIENCE

    A famous naturalist once said, "Without a hypothesis, a geologist might as well go into a gravel pit and count the stones."

    What does that mean? It means that without some idea of a question you're asking, data are meaningless and useless. And before you set out to collect data, you must devise a QUESTION and provide PREDICTIONS about what the answers might be.

    A HYPOTHESIS is a "tentative proposition which is subject to verification through subsequent investigation....In many cases hypotheses are hunches that the researcher has about the existence of relationships between variables." (Verma and Beard, 1981)

    To provide the most precise set of conditions, a hypothesis should be stated in terms of NO DIFFERENCE. That is, as a NULL HYPOTHESIS.

    For example, if you wish to test whether the progeny of a particular pair of gerbils exhibiting agouti and black fur color will exhibit phenotypes which reflect Mendel's Laws, but you suspect that there is a lethal allele involved with, say, the black mutant, you will STILL state your NULL HYPOTHESIS AS:

    "The gerbil pups produced by Al (heterozygous agouti) and Betty (black) should exhibit phenotypic ratios that DO NOT DIFFER from the 1:1 agouti: black predicted by Mendel's Law of Segregation."

    Your hypothesis of interest, of course, is the ALTERNATE HYPOTHESIS, which states exactly the opposite of the null:

    "The gerbil pups produced by Al (heterozygous agouti) and Betty (black) should exhibit phenotypic ratios that WILL DIFFER from the 1:1 agouti: black predicted by Mendel's Law of Segregation."

    Note that the above alternate hypothesis is TWO TAILED, meaning that it does not imply a direction for the expected data. A ONE-TAILED alternate hypothesis might be stated something like,

    "The gerbil pups produced by Al (heterozygous agouti) and Betty (black) should have more agouti and fewer black pups than what is predicted by Mendel's Law of Segregation."

    OR

    "The gerbil pups produced by Al (heterozygous agouti) and Betty (black) should have more black and fewer agouti pups than what is predicted by Mendel's Law of Segregation."

    Why state in terms of the null? Because there are far too many ways for the alternate hypothesis to appear to be true, and only one way for the null hypothesis to be true. If your data show that there is a difference between the observed and the expected, then the null hypothesis can be UNEQUIVOCALLY REJECTED. (Given adequate sample size and good experimental design!)


    PROBABILITY CALCULATIONS (such as the Sum Rule, Product Rule and Binomial Theorem to be covered below) allow us to define the range of possible results, often in the form of a bell-shaped curve representing the likelihood of each result occurring. In other words, these calculations allow us to determine EXPECTED RESULTS.

    STATISTICAL TESTS (such as Chi Square, Student's t-test and ANOVA, one of which we'll cover today) define confidence limits, which tell us whether or not an observed result is significantly different from what is expected.

    But what constitutes "different from expected?" For that, we must apply statistical tests which have pre-determined (traditional!) confidence limits. There are many different types of statistical tests, and the investigator must carefully select the one that is appropriate for the data collected!


    To be able to apply a statistical test with confidence, the investigator must...

    1. Design a good experiment

      a. appropriate controls
      b. sufficiently large sample size

    2. Understand general statistical values such as

      a. mean
      b. standard deviation
      c. variance

    3. Understand what is meant by Probability

    Probability (P) of an event occurring can be expressed as:

    a/n

    ...in which a = the # of occurrences of the event in question

    ...and n = the total number of times possible for the event to occur.

    Scenario: You are doing a mark-n-recapture of shore crabs in Southern California (Pachygrapsus crassipes). Wild type crabs have a blue-green carapace; occasionally you'll find an albino.

    Question: What is the P that the next crab you capture will be an albino?

    Over the centuries, you have collected bazillions of crabs. From your data, you can now say that

    n = the total number of crab captures you made

    a = total number of albino crabs you have caught over the centuries.

    If in your ramblings you have caught 10,000 crabs, and of these, 4 were albino, then the P that you will capture an albino crab is

    a/n, or 4/10,000 (0.0004)

    (NOTE: The Punnett square is another way to calculate the probability of offspring phenotypes expected.)


    COMBINED PROBABILITIES

    When you're dealing with more than one "event", you must use special care to determine exactly how the two events are related, before you choose the calculation to determine their relative probabilities!

    FOR EXAMPLE:

      event #1: has c possible outcomes

      event #2: has d possible outcomes

      The # of possible outcomes of the 2 events together is equal to c x d (cd).

    Let's say event #1 is: "agouti or black fur" in a gerbil and
    event #2 is "solid or piebald pattern fur" in the same gerbils.

    Each event has two possible outcomes, so the total number of possible outcomes = 2 x 2, or four.

    If you do your Punnett Square, you can convince yourself that this is true: There are four possible gerbil colors that could come out of these "combined events":

    • agouti solid
    • black solid
    • agouti piebald
    • black piebald
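
    The c x d count of combined outcomes can be checked by simply listing every pairing. A minimal sketch using the gerbil example (the variable names are illustrative):

```python
from itertools import product

fur_color = ["agouti", "black"]       # event #1: c = 2 possible outcomes
fur_pattern = ["solid", "piebald"]    # event #2: d = 2 possible outcomes

# Every outcome of event #1 paired with every outcome of event #2
combined = [f"{color} {pattern}" for color, pattern in product(fur_color, fur_pattern)]

print(len(combined), combined)
```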

    This allows us to understand three important Rules in Statistical testing in Genetics...

    • the Sum Rule
    • The Product Rule
    • The Binomial Theorem

    Let's take them one at a time...

    THE SUM RULE
    This is used to calculate the probability of two events occurring if event #1 PRECLUDES event #2. It's an "either/or" situation, such as the roll of a die. Once one face comes up on a roll, no other face can come up on the same roll.

    (Or, to use a genetic example, a child must be either male or female.)

    Question: What is the probability, upon rolling a die, that the roll will yield either a 1 OR a 6?

    P = (a/n)_1 + (a/n)_6

    Either event has a probability of 1/6.

    P = (1/6)_1 + (1/6)_6 = 2/6, or 1/3.

    In prose: One in three rolls will yield either a "1" or a "6".

    You can do the same thing for any genetic events that PRECLUDE (i.e., prevent from happening) each other.
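
    A short sketch of the Sum Rule using exact fractions. The function name is hypothetical, and it assumes the events passed in really are mutually exclusive; it does not check this for you.

```python
from fractions import Fraction

def sum_rule(*probabilities):
    """Probability that any one of several mutually exclusive events occurs."""
    return sum(probabilities, Fraction(0))

p_one = Fraction(1, 6)   # rolling a 1
p_six = Fraction(1, 6)   # rolling a 6

print(sum_rule(p_one, p_six))   # either a 1 OR a 6 on a single roll
```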

    THE PRODUCT RULE

    This is used to calculate the probability of two events occurring if event #1 and event #2 are INDEPENDENT. It's an "and" situation, such as rolling a die twice, back to back.

    Each roll's result is independent of previous rolls and subsequent rolls. (Or, to use a genetic example, two siblings can be male/male, male/female or female/female)

    Question: What is the probability, upon rolling a die twice, that the first roll will yield a 1 and the second roll will yield a 6?

    P = (a/n)_1 x (a/n)_6

    In our example, either event has a probability of 1/6:

    P = (1/6)_1 x (1/6)_6 = 1/36.

    In prose: On average, one in every 36 two-roll sequences will have a "1" on the first roll and a "6" on the second.

    You can do the same thing for any genetic events that are independent.
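
    The Product Rule can be sketched the same way, and checked by brute force against a list of every possible two-roll sequence. The function name is made up, and the events are assumed to be independent.

```python
from fractions import Fraction
from itertools import product
from math import prod

def product_rule(*probabilities):
    """Probability that several independent events all occur."""
    return prod(probabilities, start=Fraction(1))

p_first_is_one = Fraction(1, 6)
p_second_is_six = Fraction(1, 6)
by_rule = product_rule(p_first_is_one, p_second_is_six)

# Brute-force check: list all two-roll sequences and count the one we want
sequences = list(product(range(1, 7), repeat=2))
by_counting = Fraction(sequences.count((1, 6)), len(sequences))

print(by_rule, by_counting)
```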

    THE BINOMIAL THEOREM

    In this case, two alternate events have independent probabilities.

    Let's say we have two alternate events, X and Y.

    The probability of X is p.

    The probability of Y is q.

    n = the number of trials in which either X or Y can occur.

    s = the number of times event X occurs in your trials

    t = the number of times event Y occurs in your trials

    (Note that, by definition, p + q = 1.0 and s + t = n)

    If you are going to perform a series of trials in which either X or Y can occur, the probability that X will occur s times and Y will occur t times can be calculated by using an expansion of the binomial equation:

    P = [n!/(s!t!)] (p^s)(q^t)

    (recall: ! is the symbol for "factorial": 10! = 1x2x3x4x5x6x7x8x9x10)

    Let's work through a colorful example with gerbils. You're breeding gerbils that have two alleles at the Gene B locus. The dominant allele (B) codes for agouti fur, and the recessive allele (b) codes for black fur.

    Let the games begin....


    Let's say you want to know what the probability is that in a heterozygote cross of two gerbils, the first four pups will be three agouti and one black.

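    P = [n!/(s!t!)] (p^s)(q^t)

    n = # of births (4)
    s = # of agouti pups (3), each with probability p = 3/4 = 0.75

    t = # of black pups (1), each with probability q = 1/4 = 0.25

    Therefore,
    P = [4!/(3!1!)] (0.75)^3 (0.25)^1 = 0.42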
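
    The same calculation as a short sketch. The function name is hypothetical; s and t are the numbers of agouti and black pups, and p and q are their per-birth probabilities from the heterozygote cross.

```python
from math import factorial

def binomial_probability(s, t, p, q):
    """Probability of exactly s occurrences of X and t occurrences of Y in s + t trials."""
    n = s + t
    coefficient = factorial(n) // (factorial(s) * factorial(t))
    return coefficient * p ** s * q ** t

# Three agouti and one black among the first four pups
p_three_agouti_one_black = binomial_probability(s=3, t=1, p=0.75, q=0.25)
print(round(p_three_agouti_one_black, 2))
```

    The coefficient 4!/(3!1!) = 4 counts the four birth orders in which the single black pup could appear.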

    If your observed results fall outside of 95% of the curve, statistical convention states that your results are significantly different from the expected results.

    This means that some factor other than random chance has caused the variation from the expected.

    • Type I error: rejection of a hypothesis that is actually true.

    • Type II error: failure to reject a hypothesis that is not true.

    The type of sample distribution to which you compare your data depends on the type of data you collect:

      1. attribute data - "either/or" data; e.g. presence or absence of a particular trait.

      2. discrete numerical data - correspond to biological observations which are counted as integers.

      (e.g., # of beetles/m^2 in a forest habitat; # of smokers in a population of college students)

      3. continuous numerical data - correspond to biological observations which are measured along a continuum.

      (e.g. - brain size; snout-vent length; stature; blood volume etc.)

    Mean, mode and median describe the middle values of a sample distribution.

    Range, standard deviation and variance describe the dispersion of data points around those middle values.

    These values are known as PARAMETERS when they are known for an entire population.

    When only a sample of the population is measured, the resulting estimate of the parameter is called a STATISTIC.

    A parametric test is used to test the significance of continuous numerical data relative to the expected.

    A non-parametric test is used to test qualitative, discrete (attribute) data relative to the expected.



    Exercise: do a Chi square test on the gerbil breeding in which you got 81 agouti and 19 black from a heterozygote cross (Aa x Aa)


    Once you have calculated the statistic (in our case, the Chi square), you must determine the degrees of freedom in your system. This is a count of the number of independent categories.

    In our example, n (the number of phenotypic categories) = 2. But because agouti and black are not independent (a pup must be one or the other), df is not equal to n.

    For the Chi square test, df = n-1.

    In our example, df = 2 - 1 = 1

    Using the df = 1 row of the table of critical values in your text (note that other tables have higher resolution), you can determine that 1.92 (our Chi square) lies between 0.455 (P = 0.5) and 2.706 (P = 0.1).

    This is usually written as follows:

    0.5 > P > 0.1

    To be considered significantly different from the expected, our P value must lie outside of 95% of the curve, meaning that it must be 0.05 or lower. The value we calculated above is nowhere near the significance level necessary to reject the null hypothesis. We FAIL TO REJECT THE NULL HYPOTHESIS.
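
    The gerbil exercise can be worked through in a few lines. This sketch assumes the usual goodness-of-fit form of the statistic, the sum of (observed - expected)^2 / expected over all categories, and a 3:1 expected ratio for the Aa x Aa cross. The names are illustrative, and 3.841 is the standard tabulated critical value for df = 1 at P = 0.05.

```python
def chi_square(observed, expected):
    """Chi square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [81, 19]                          # agouti, black
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]    # 3:1 from Aa x Aa

chi2 = chi_square(observed, expected)
df = len(observed) - 1                       # df = n - 1

CRITICAL_VALUE_DF1_P05 = 3.841               # tabulated value for df = 1, P = 0.05
decision = "reject Ho" if chi2 >= CRITICAL_VALUE_DF1_P05 else "fail to reject Ho"

print(f"Chi square = {chi2:.2f}, df = {df}: {decision}")
```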

    In extremely rigorous experiments, a significance level of 0.01 is sometimes used.

    In clinical trials (e.g. development of drugs, in the early stages) a more lax significance level of 0.1 is acceptable at the beginning of testing (you don't want to throw out a promising new drug, after so much expense has gone into developing it!).

    Now that you have your statistic, you must calculate the degrees of freedom. (a measure of the number of independent categories in your system)

    For the Chi square test, degrees of freedom is calculated as df = n-1

    Once armed with

    1) your Chi square statistic

    2) your degrees of freedom

    ...you may now migrate to the Table of Critical Chi Square values and look up your significance level!

    If the probability level linked with your Chi square statistic is 0.05 or less, the variation you observed is SIGNIFICANTLY DIFFERENT from the expected.

    If your P is 0.05 or less, reject Ho and fail to reject Ha

    If your P > 0.05, fail to reject Ho

    In some extremely rigorous tests, the significance level is set at 0.01.

    But in some clinical trials, the significance level is set at a more lenient 0.10.