The actual map distance between b and c is the sum of the measured map
distances between b & pr and pr & c:
5.9 + 19.5 = 25.4 map units
Hey! This is different from the measured map distance between b and c,
which we have already calculated as 23.7 map units. What's going on?
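Unlike the measured map distance, the actual map distance calculation takes into account
all crossover events, including DOUBLE CROSSOVERS. A double crossover between b and c
puts the two outside alleles back into their parental combination, so those offspring
are scored as non-recombinant for b & c. The measured map distance calculation
for b & c missed the double crossovers!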
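The arithmetic above can be checked in a few lines. This is only a sketch using the three distances quoted in the text; the variable names are illustrative. The gap between the two figures is the contribution of the double crossovers, each of which is counted twice in the sum (once per interval) but not at all in the direct b & c measurement.

```python
# Map distances quoted in the text, in map units.
b_pr = 5.9           # measured distance between b and pr
pr_c = 19.5          # measured distance between pr and c
b_c_measured = 23.7  # two-point distance measured directly between b and c

# The "actual" b-c distance is the sum of the two adjacent intervals.
actual_b_c = b_pr + pr_c

# What the direct b-c measurement failed to count.
shortfall = actual_b_c - b_c_measured

print(f"actual b-c distance:   {actual_b_c:.1f} map units")
print(f"measured b-c distance: {b_c_measured:.1f} map units")
print(f"missed by the direct measurement: {shortfall:.1f} map units")
```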
Over the years of doing these calculations, investigators noticed a
pattern emerging, which can be graphed as the MAP
FUNCTION.
How can you tell the order of the loci on the chromosome? The easiest way
is to look at which reciprocal class has the smallest proportion of
offspring. That's the class of offspring that inherited chromosomes with a
double crossover (a rarer event than a single crossover between the loci
you happen to be looking at).
In our case, the smallest class was the purple vs. black/curved. This
means that the two loci ending up on the same chromosome (black and curved) are the OUTSIDE
LOCI, and the one all by itself (purple) is the INSIDE MARKER.
Next Question: Were the various crossover events independent of one
another?
...Meaning, if a crossover happens in one of our two regions, does it
affect the likelihood of a crossover happening in the other region?
You can determine this by asking: What is the EXPECTED number of double
crossovers, judging from the single crossovers that took place between b &
pr and pr & c?
The COEFFICIENT OF COINCIDENCE (c.c.) is an expression of the likelihood and nature of
one crossover's interference with another.
c.c. = Observed/Expected = 132/172.5 = 0.77
...Which means that 77% of the expected double crossovers actually occurred.
- If c.c. < 1.0, then one crossover exerts positive interference on the
other. This means that the occurrence of the first crossover reduces
the likelihood of the second crossover.
- If c.c. > 1.0, then one crossover exerts negative interference on the
other. This means that the occurrence of the first crossover increases
the likelihood of the second crossover.
The DEGREE OF INTERFERENCE (i) is expressed as
1 - c.c.
- If c.c. is greater than one, i will be negative, meaning that interference
is in the "negative" direction (one crossover promotes the other).
- If c.c. equals one, i will be zero: there is no interference, and the two
crossovers are independent.
- If c.c. is less than one, i will be positive, meaning that interference is
in the "positive" direction (there is interference, and one crossover
lowers the probability of the other).
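The two definitions above reduce to a pair of one-line calculations. The sketch below uses the observed (132) and expected (172.5) double-crossover counts from the text; the function names are illustrative, not standard.

```python
def coefficient_of_coincidence(observed_dco, expected_dco):
    """c.c. = observed double crossovers / expected double crossovers."""
    return observed_dco / expected_dco


def degree_of_interference(cc):
    """i = 1 - c.c."""
    return 1 - cc


cc = coefficient_of_coincidence(132, 172.5)
i = degree_of_interference(cc)

direction = "positive" if i > 0 else "negative" if i < 0 else "no"
print(f"c.c. = {cc:.2f}, i = {i:.2f} ({direction} interference)")
```

With these counts, i comes out at about 0.23, so interference is positive: one crossover lowers the chance of a second one nearby.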
SOMATIC (MITOTIC) CROSSING OVER
Ever seen a human iris with part blue and part brown pigmentation? This can be
due to SOMATIC (= MITOTIC) CROSSING OVER.
During development, homologs in a rapidly mitosing blastomere come into close
apposition, and accidentally switch segments.
If the organism is heterozygous (for something like eye color), one resulting
blastomere will now be BB, and the other will be bb.
The part of the iris that develops from the bb blastomere will be blue!
Studies of mitotic crossing over are important. Some organisms (such as
fungi and bacteria) which do not have regular sexual reproductive cycles
actually may get genetic variation via mitotic crossing over.
Also, somatic crossing over may be one way in which deleterious alleles
(such as faulty tumor suppressors) may be expressed in the body, even if
the organism's overall genotype is heterozygous.
HOW DOES CROSSING OVER TAKE PLACE?
The exact mechanism is not known, but several models have been proposed.
All models agree that one step in the process involves the formation of
HETERODUPLEX DNA. This is DNA composed of one strand from one
homolog, and the other strand from the other homolog.
Genetic recombination can be regarded as a process of breakage and repair
between two DNA molecules. The result of crossing over is two gametes
containing strands of DNA that are like the original, parental strands and
two gametes that contain recombinant DNA that has undergone crossing over.
STATISTICS IN GENETICS
One of many testable questions one can ask in Genetics relates to all we've
just been studying: Are these genes linked? Are they inherited together
more often than would be predicted by chance alone?
To answer this type of question and others in genetics, we use common laws
of Probability and Statistics.
CALCULATING PROBABILITIES IN GENETICS
ASKING QUESTIONS IN SCIENCE
A famous naturalist once said, "Without a hypothesis, a geologist might as
well go into a gravel pit and count the stones."
What does that mean? It means that without some idea of a question you're
asking, data are meaningless and useless. And before you set out to
collect data, you must devise a QUESTION and provide PREDICTIONS about
what the answers might be.
A HYPOTHESIS is a "tentative proposition which is subject to verification
through subsequent investigation.... In many cases hypotheses are hunches
that the researcher has about the existence of relationships between
variables." (Verma and Beard, 1981)
To provide the most precise set of conditions, a hypothesis should be
stated in terms of NO DIFFERENCE. That is, as a NULL HYPOTHESIS.
For example, if you wish to test whether the progeny of a particular pair
of gerbils exhibiting agouti and black fur color will exhibit phenotypes
which reflect Mendel's Laws, but you suspect that there is a lethal
involved with say, the black mutant, you will STILL state your NULL
HYPOTHESIS AS:
"The gerbil pups produced by Al (heterozygous agouti) and Betty (black) should
exhibit phenotypic ratios that DO NOT DIFFER from the 1:1 agouti: black
predicted by Mendel's Law of Segregation."
Your hypothesis of interest, of course, is the ALTERNATE HYPOTHESIS, which
states exactly the opposite of the null:
"The gerbil pups produced by Al (heterozygous agouti) and Betty (black) should
exhibit phenotypic ratios that WILL DIFFER from the 1:1 agouti: black
predicted by Mendel's Law of Segregation."
Note that the above alternate
hypothesis is TWO TAILED, meaning that it does not imply a direction for
the expected data. A ONE-TAILED alternate hypothesis might be stated
something like,
"The gerbil pups produced by Al (heterozygous agouti) and Betty (black) should
have more agouti and fewer black pups than what is
predicted by Mendel's Law of Segregation."
OR
"The gerbil pups produced by Al (heterozygous agouti) and Betty (black) should
have more black and fewer agouti pups than what is
predicted by Mendel's Law of Segregation."
Why state in terms of the null? Because there are far too many ways for
the alternate hypothesis to appear to be true, and only one way for the
null hypothesis to be true. If your data show that there is a
difference between the observed and the expected, then the null hypothesis
can be UNEQUIVOCALLY REJECTED. (Given adequate sample size and good
experimental design!)
PROBABILITY CALCULATIONS (such as the Sum Rule, Product Rule and Binomial
Theorem to be covered below) allow us to define the range of possible
results, often in the form of a bell-shaped curve representing the likelihood of
each result occurring. In other words, these calculations allow us to
determine EXPECTED RESULTS.
STATISTICAL TESTS (such as Chi Square, Student's t-test and ANOVA, one of
which we'll cover today) define confidence limits, which tell us
whether or not an observed result is significantly different from what is
expected.
But what constitutes "different from expected?" For that, we must apply
statistical tests which have pre-determined (traditional!) confidence
limits. There are many different types of statistical tests, and the
investigator must carefully select that one that is appropriate for the
data collected!
To be able to apply a statistical test with confidence, the investigator
must...
1. Design a good experiment
a. appropriate controls
b. sufficiently large sample size
2. Understand general statistical values such as
a. mean
b. standard deviation
c. variance
3. Understand what is meant by Probability
Probability (P) of an event occurring can be expressed as:
a/n
...in which a = the # of occurrences of the event in question
...and n = the total number of times possible for
the event to occur.
Scenario: You are doing a mark-n-recapture of shore
crabs in Southern California (Pachygrapsus crassipes).
Wild type crabs have a blue-green carapace; occasionally you'll
find an albino.
Question: What is the P that the next crab you capture
will be an albino?
Over the centuries, you have collected bazillions
of crabs. From your data, you can now say that
n = the total number of crab captures you made
a = total number of albino crabs you have caught over the
centuries.
If in your ramblings you have caught 10,000 crabs,
and of these, 4 were albino, then the P that you will capture
an albino crab is
a/n, or 4/10,000 (0.0004)
(NOTE: The Punnett square is another way to calculate
the probability of offspring phenotypes expected.)
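The crab estimate is just the ratio a/n. A minimal sketch, using the counts from the scenario above (the function name is illustrative):

```python
from fractions import Fraction


def probability(a, n):
    """P = a/n, where a = occurrences of the event and n = total opportunities."""
    return Fraction(a, n)


# 4 albino crabs out of 10,000 captures, as in the scenario.
p_albino = probability(4, 10_000)

print(f"P(albino) = {p_albino} = {float(p_albino)}")
```

Using Fraction keeps the result exact (1/2500) until it is converted to a decimal for display.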
COMBINED PROBABILITIES
When you're dealing with more than one "event", you must use special
care to determine exactly how the two events are related, before you choose
the calculation to determine their relative probabilities!
FOR EXAMPLE:
event #1: has c possible outcomes
event #2: has d possible outcomes
The # of possible outcomes of the 2 events together
is equal to c x d (cd).
Let's say event #1 is: "agouti or black fur" in a gerbil and
event #2 is "solid or piebald pattern fur" in the same gerbils.
Each event has two possible outcomes, so the total number of possible
outcomes = 2 x 2, or four.
If you do your Punnett Square, you can
convince yourself that this is true: There are four possible gerbil
colors that could come out of these "combined events":
- agouti solid
- black solid
- agouti piebald
- black piebald
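The same list of combined outcomes can be generated mechanically. A small sketch, assuming only the two traits and two outcomes per trait named above:

```python
from itertools import product

# Event #1 and event #2, each with two possible outcomes.
fur_color = ["agouti", "black"]
fur_pattern = ["solid", "piebald"]

# Every pairing of one color with one pattern: c x d outcomes in total.
combined = [f"{color} {pattern}" for color, pattern in product(fur_color, fur_pattern)]

print(len(combined), "possible outcomes:")
for outcome in combined:
    print(" -", outcome)
```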
This allows us to understand three important Rules in Statistical testing
in Genetics...
- The Sum Rule
- The Product Rule
- The Binomial Theorem
Let's take them one at a time...
THE SUM RULE
This is used to calculate the probability of two events occurring if event
#1 PRECLUDES event #2. It's an "either/or" situation, such as the roll of
a die. Once one face comes up on a roll, no other face can come up on the
same roll.
(Or, to use a genetic example, a child must be either male or
female.)
Question: What is the probability, upon rolling a die, that the roll will
yield either a 1 OR a 6?
P = (a/n)_1 + (a/n)_6
Either event has a probability of 1/6.
P = (1/6)_1 + (1/6)_6 = 2/6, or 1/3.
In prose: One in three rolls will yield either a
"1" or a "6".
You can do the same thing for any genetic events
that PRECLUDE (i.e., prevent from happening) each other.
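As a quick check of the die example, here is a sketch of the Sum Rule. The helper name sum_rule is hypothetical, and it assumes the events passed in really are mutually exclusive.

```python
from fractions import Fraction


def sum_rule(*probabilities):
    """Probability that any one of several mutually exclusive events occurs."""
    return sum(probabilities, Fraction(0))


p_face = Fraction(1, 6)  # any single face of a fair die

# "Either a 1 OR a 6" on one roll.
p_one_or_six = sum_rule(p_face, p_face)

print(f"P(1 or 6) = {p_one_or_six}")
```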
THE PRODUCT RULE
This is used to calculate the probability of two
events occurring if event #1 and event #2 are INDEPENDENT. It's
an "and" situation, such as rolling a die twice,
back to back.
Each roll's result is independent of previous rolls
and subsequent rolls. (Or, to use a genetic example, the sex of one
sibling does not affect the sex of the next: two siblings
can be male/male, male/female or female/female.)
Question: What is the
probability, upon rolling a die twice, that the first roll will yield a 1 and
the second roll will yield a 6?
P = (a/n)_1 x (a/n)_6
In our example, either event has a probability of
1/6:
P = (1/6)_1 x (1/6)_6 = 1/36.
In prose: On average, one in every 36 two-roll sequences will have
a "1" on the first roll and a "6" on the second roll.
You can do the same thing for any genetic events
that are independent.
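The Product Rule can be sketched the same way, and the answer cross-checked by listing every two-roll sequence. The helper name product_rule is hypothetical, and it assumes the events are independent.

```python
from fractions import Fraction
from itertools import product
from math import prod


def product_rule(*probabilities):
    """Probability that several independent events all occur."""
    return prod(probabilities, start=Fraction(1))


p_face = Fraction(1, 6)  # any single face of a fair die

# "A 1 on the first roll AND a 6 on the second roll."
p_one_then_six = product_rule(p_face, p_face)

# Cross-check by brute force: enumerate all two-roll sequences.
sequences = list(product(range(1, 7), repeat=2))
matches = [seq for seq in sequences if seq == (1, 6)]

print(f"P(1 then 6) = {p_one_then_six}")
print(f"{len(matches)} of {len(sequences)} sequences match")
```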
THE BINOMIAL THEOREM
In this case, two alternate events have independent
probabilities.
Let's say we have two alternate events, X and Y.
The probability of X is p.
The probability of Y is q.
n = the number of trials in which either X or Y can
occur.
s = the number of times event X occurs in your trials
t = the number of times event Y occurs in your trials
(Note that, by definition, p + q = 1.0 and s + t
= n)
If you are going to perform a series of trials in
which either X or Y can occur, the probability that X will occur
s times and Y will occur t times can be calculated by using an
expansion of the binomial equation:
P = [n!/(s!t!)] p^s q^t
(recall: ! is the symbol for "factorial":
10! = 1x2x3x4x5x6x7x8x9x10)
Let's work through this with a colorful example
using gerbils. You're breeding gerbils that have two alleles at the Gene B
locus. The dominant allele (B) codes for agouti fur, and the recessive
allele (b) codes for black fur.
Let the games begin....
Let's say you want to know what the probability is that in a heterozygote cross
of two gerbils, the first four pups will be three agouti and one black.
P = [n!/(s!t!)] p^s q^t
n = # births (4)
s = # agouti pups (3), each with probability p = 3/4 = 0.75
t = # black pups (1), each with probability q = 1/4 = 0.25
Therefore,
P = [4!/(3!1!)](0.75)^3(0.25)^1 = 0.42
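The gerbil calculation can be reproduced with the standard library. This is a sketch: the function name is illustrative, and s, t, p and q follow the definitions given earlier in this section.

```python
from math import comb


def binomial_probability(s, t, p):
    """P that X occurs s times and Y occurs t times in n = s + t trials.

    p is the probability of X on a single trial; q = 1 - p is that of Y.
    comb(n, s) equals n! / (s! * t!).
    """
    n = s + t
    q = 1 - p
    return comb(n, s) * p**s * q**t


# Three agouti (p = 0.75) and one black (q = 0.25) among four pups.
p_3_agouti_1_black = binomial_probability(s=3, t=1, p=0.75)
print(f"P(3 agouti, 1 black) = {p_3_agouti_1_black:.2f}")

# The probabilities of all possible litters of four add up to 1.
total = sum(binomial_probability(s, 4 - s, 0.75) for s in range(5))
print(f"sum over all outcomes = {total}")
```

Note that this is the probability of three agouti and one black in any birth order, which is why the n!/(s!t!) term is needed.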
If your observed results fall outside of 95% of the curve, statistical
convention states that your results are significantly different
from the expected results.
This suggests that some factor other than random chance may have caused the variation
from the expected.
- Type I error: rejecting a null hypothesis that is actually true.
- Type II error: failing to reject a null hypothesis that is actually false.
The type of sample distribution to which you compare your data depends on the
type of data you collect:
1. attribute data - "either/or" data; e.g. presence or absence of a particular
trait.
2. discrete numerical data - correspond to biological observations which are
counted as integers.
(e.g., # of beetles/m^2 in a forest habitat; # of smokers in a
population of college students)
3. continuous numerical data - correspond to biological observations which are
measured along a continuum.
(e.g. - brain size; snout-vent length; stature; blood volume etc.)
Mean, mode and median describe the middle values of a sample distribution.
Range, standard deviation and variance describe the dispersion of data points
around those middle values.
These values are known as PARAMETERS when they are known for an entire
population.
When only a sample of the population is measured, the resulting estimate
of the parameter is called a STATISTIC.
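The parameter/statistic distinction shows up directly in how variance is computed. The sketch below uses Python's statistics module on a handful of measurements that are invented purely for illustration (they are not data from this course).

```python
import statistics

# HYPOTHETICAL snout-vent lengths in mm, made up for this example.
lengths = [52.0, 55.5, 49.0, 58.5, 55.0]

mean = statistics.mean(lengths)
median = statistics.median(lengths)
data_range = max(lengths) - min(lengths)

# If these five animals are only a SAMPLE, estimate the variance
# with the n - 1 denominator (a statistic).
sample_variance = statistics.variance(lengths)
sample_sd = statistics.stdev(lengths)

# If these five animals are the ENTIRE population, divide by n
# instead (a parameter).
population_variance = statistics.pvariance(lengths)

print(f"mean = {mean}, median = {median}, range = {data_range}")
print(f"sample variance = {sample_variance}, sample SD = {sample_sd:.2f}")
print(f"population variance = {population_variance}")
```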
A parametric test is used to test the significance of continuous numerical
data relative to the expected.
A non-parametric test is used to test qualitative, discrete (attribute)
data relative to the expected.
Exercise: Do a Chi square test on the gerbil breeding experiment in which you got 81
agouti and 19 black pups from a heterozygote cross (Aa x Aa).
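One way to work the exercise is sketched below. It assumes the standard Pearson formula, Chi square = sum of (observed - expected)^2 / expected over all categories, and the 3:1 agouti: black ratio expected from an Aa x Aa cross; the variable names are illustrative.

```python
def chi_square(observed, expected):
    """Pearson's Chi square: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [81, 19]  # agouti, black
total = sum(observed)

# Expected counts under the null hypothesis of a 3:1 ratio.
expected = [total * 3 / 4, total * 1 / 4]

chi2 = chi_square(observed, expected)

print(f"expected counts: {expected}")
print(f"Chi square = {chi2:.2f}")
```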
Once you have calculated the statistic (in our case,
the Chi square), you must determine the degrees of freedom in
your system. This is a count of the number of independent categories.
In our example, the number of categories (n) is 2. But because agouti and black
are not independent (a pup must be one or the other), df is not
equal to n.
For the Chi square test, df = n-1.
In our example, df = 2 - 1 = 1
Using the df = 1 row of the table of critical values
in your text (note that other tables have higher resolution),
you can determine that 1.92 (our Chi square) lies
between 0.455 (P = 0.5) and 2.706 (P = 0.1).
This is usually written as follows:
0.5 > P > 0.1
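If no table is at hand, the P value for df = 1 can be computed directly. The sketch below relies on the fact that a Chi square variable with one degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(x/2)). It applies to df = 1 only, and the function name is illustrative.

```python
from math import erfc, sqrt


def chi_square_p_value_df1(chi2):
    """Upper-tail P for a Chi square statistic with exactly 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


p = chi_square_p_value_df1(1.92)
print(f"P = {p:.3f}")

# Sanity check against the df = 1 critical values quoted from the table.
for critical_value in (0.455, 2.706):
    print(critical_value, round(chi_square_p_value_df1(critical_value), 3))
```

The computed P falls inside the bracket read from the table.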
To be considered significantly different from the
expected, our P value must lie outside of 95% of the curve, meaning
that it must be 0.05 or lower. The value we calculated above
is nowhere near the significance level necessary to reject the
null hypothesis. We FAIL TO REJECT THE NULL HYPOTHESIS.
In extremely rigorous experiments, a significance
level of 0.01 is sometimes used.
In clinical trials (e.g. development of drugs, in
the early stages) a more lax significance level of 0.1 is acceptable
at the beginning of testing (you don't want to throw out a promising
new drug, after so much expense has gone into developing it!).
Now that you have your statistic, you must calculate the degrees of freedom.
(a measure of the number of independent categories in your system)
For the Chi square test, degrees of freedom is calculated as df = n-1
Once armed with
1) your Chi square statistic
2) your degrees of freedom
...you may now migrate to the Table of Critical Chi
Square values and look up
your significance level!
If the probability level linked with your Chi square statistic is 0.05 or less,
the variation you observed is SIGNIFICANTLY DIFFERENT from the expected.
If your P is 0.05 or less, reject Ho and fail to reject Ha.
If your P is greater than 0.05, fail to reject Ho.
In some extremely rigorous tests, the significance level is set at 0.01.
But in some clinical trials, the significance level is set at a more lenient
0.10.
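The decision rule can be written as a tiny function with the significance level as a parameter. This is only a sketch: the name decide and the example P values are hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Apply the conventional decision rule at significance level alpha."""
    if p_value <= alpha:
        return "reject Ho"
    return "fail to reject Ho"


# The same hypothetical P value judged at the three levels mentioned above.
for alpha in (0.01, 0.05, 0.10):
    print(f"P = 0.03 at alpha = {alpha}: {decide(0.03, alpha)}")
```

The same P value can lead to different conclusions at different significance levels, which is why the level should be chosen before the data are examined.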