

Translation: Making Protein

The process of changing the language of the DNA and RNA code into the new language of protein is known as translation. Not all genes are translated, but this process is essential to the identity, morphogenesis, and operation of any living organism.

Understanding translation is also essential to understanding the genetics of medical problems, such as the evolution of antibiotic-resistant bacteria, errors in small RNA gene regulation (which can cause serious phenotypic disorders), enzymopathies, etc.

So here we go.


Division of Labor in the Genome

  • The proteome of an organism is the full suite of proteins it manufactures, encoded by the genes devoted to protein manufacture (i.e., not including the genes that code solely for RNAs).
  • Various proportions of the proteome are believed to be involved in
    • metabolism
    • cell division
    • immune system defense
    • gene expression and regulation
    • physical structure
    • DNA replication, transcription and translation
      ... and many are still of unknown function.
    Today, we'll examine how prokaryotes and eukaryotes process the information on the mRNA transcript and--with the participation of functional RNAs and enzymes--manufacture protein. Translation is the process by which the information encoded on an mRNA is used to manufacture polypeptides: polymers of amino acids.


    The Structure and Function of Proteins

    Proteins are largely responsible for the shape, color, metabolism, physiology, and even behavior of every living organism. The nature of each individual's proteins is intimately involved in the individuality of that organism.

    Recall that protein, like nucleic acid, is a polymer--a chain of repeating subunits. In this case, the monomers comprising the polymer are amino acids (we'll abbreviate them as aa).

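    Every amino acid has the same general structure: a central carbon atom bonded to an amino group, a carboxyl group, a hydrogen atom, and a variable side chain ("R"), with the identity of "R" determining the physical and chemical properties of the particular amino acid.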

    Amino acids are linked via peptide bonds to form the primary structure of the polypeptide (= protein).

    The primary structure, in turn, determines the secondary, tertiary, and to a great extent, the quaternary structure of the final protein.

    • globular proteins - compact, highly coiled. (e.g., enzymes, antibodies)
    • fibrous proteins - elongate (tend to be structural, as in hair, muscle, etc.)

  • Enzymes are both players and products in the process of translation.
  • Enzyme active sites are formed by tertiary and quaternary structure.
  • In the enzymes catalyzing translation, the R group of any given amino acid component is positioned in the active site so that it can facilitate its specific chemical reaction.
  • Although the processes leading to higher-order configuration are not fully understood, we do know that certain primary structure sequences tend to be associated with particular enzymatic functions.
  • For example, specific aa sequences are known to be related to
    • their localization in a plasma membrane
    • their ability to bind to DNA in a characteristic fashion
  • A conserved/consensus protein sequence (that usually results in a characteristic folding pattern) is known as a domain.


    Errors in Protein Folding
    In some rare instances, incorrect folding of specific proteins can have catastrophic consequences in the organism, as evidenced by prion diseases.

    Prions are small, proteinaceous agents that cause the misfolding of the gene product of the prnp gene. In all heritable forms of prion disease, the prnp gene is mutated. Strangely, however, prion diseases can also be transmitted via the ingestion of prion particles that subsequently cause misfolding of the prnp gene product. The normal form of PrP protein (usually designated as PrPC) is found in the plasma membranes of many different tissues. Its function is not known, but various studies have indicated that it may be involved in functions as diverse as antioxidation, establishment of long-term memory, regulation of circadian rhythms, and even in guiding certain stem cell development pathways.

    Misfolded PrP protein (designated PrPSc and known as a prion) causes amyloid plaques to form in the central nervous system, and this rapidly leads to neurodegeneration and death.

    The prion mechanism is not yet fully understood, but it is believed that the prion form of the PrP protein directs misfolding of the normal form into new prions, which then cause various neurodegenerative problems.

    Prion diseases may be heritable, or infectious.

    Stanley Prusiner, who in 1982 reported the (initially controversial) idea that a proteinaceous particle could direct the formation of additional copies of itself, won the Nobel Prize (Physiology/Medicine) in 1997 for his work on prions.


    Genes and Protein

    The idea that one gene encodes the information for making one polypeptide was first put forth by Beadle and Tatum in the 1940s. This hypothesis has since been refined, but still holds true for both prokaryotic and eukaryotic proteomes.

    Charles Yanofsky (1963) demonstrated that a gene's nucleotide sequence was colinear with its protein product: mutations in the gene sequences caused a change in protein amino acid sequence.

    The work of these scientists paved the way for Francis Crick, who reported in 1966 that the sequence of nucleotides in a strand of mRNA corresponded to specific amino acids, work we now know as the Genetic Code.


    The Genetic Code


    The Genetic Code is a triplet code: each three bases on the mRNA (and, by extension, the three nucleotides on the DNA from which the mRNA was transcribed) represent one amino acid. Each three base sequence is known as a codon.

    The genetic code is

  • non-overlapping (each base is read only once)
  • non-punctuated (there's no "dead space" between informational codons)
  • given a three-letter code, there are 64 possible different codons: more than enough to provide for encoding the 20+ known amino acids in living systems.
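The codon count in the last point can be checked by brute force. This is a minimal sketch; the names `BASES` and `possible_words` are illustrative and not part of the notes.

```python
from itertools import product

BASES = "UCAG"  # the four RNA bases


def possible_words(word_length):
    """Return every ordered string of bases of the given length."""
    return ["".join(letters) for letters in product(BASES, repeat=word_length)]


doublets = possible_words(2)   # 4 ** 2 = 16 words: too few for 20 amino acids
codons = possible_words(3)     # 4 ** 3 = 64 words: more than enough

print(len(doublets), len(codons))  # prints "16 64"
```

A two-letter code would fall short of the roughly 20 amino acids, which is one way to see why a triplet is the smallest word size that works.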

    The Wobble Hypothesis - 1966, Francis Crick

    • The genetic code is degenerate: one amino acid may be encoded by several different codons.

    • unmixed codon families - first two bases always code for the same amino acid (there's practically no need to read the third base of the codon)

      e.g. - leu, val, ser, pro, thr, ala etc.

    • mixed codon families - first two bases may be included in the code for more than one different amino acid or for an amino acid and a start or stop codon.

    • Crick termed this redundancy "wobble": the code was not rigid, and there was room for error.

    Similar codons code for amino acids with similar physical properties. For example:

    • a "U" in the center position always encodes a hydrophobic aa. A mutation at the two outer positions will not change that.

    • negatively charged aa's (e.g. aspartate, glutamate) always begin with GA. A mutation in the 3rd position will not change that.

    This is evidence that a triplet code not only allows for more diversity, but also provides a margin of error in terms of deleterious mutations.
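The claims above about codon families and about similar codons can be checked against the standard genetic code. The sketch below builds the standard table from a compact 64-letter string (one-letter amino acid codes, `*` for stop); the variable names are illustrative, and organelle or other variant codes are not considered.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degeneracy: group codons by the amino acid they encode.
codons_for = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    codons_for[aa].append(codon)

# Unmixed families: the first two bases alone fix the amino acid.
unmixed = sorted(
    prefix
    for prefix in ("".join(p) for p in product(BASES, repeat=2))
    if len({CODON_TABLE[prefix + third] for third in BASES}) == 1
)

# Amino acids encoded by codons with U in the center, and by GA_ codons.
middle_u = {aa for codon, aa in CODON_TABLE.items() if codon[1] == "U"}
starts_ga = {aa for codon, aa in CODON_TABLE.items() if codon.startswith("GA")}

print(len(codons_for["L"]))  # leucine is encoded by 6 codons
print(unmixed)               # 8 unmixed families
print(sorted(middle_u))      # F, I, L, M, V: all hydrophobic
print(sorted(starts_ga))     # D, E: aspartate and glutamate
```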


    Functional RNA and Translation

  • Translation is effected (a verb, not a noun) by
    • numerous enzymes
    • mRNA
    • rRNA (in the ribosomes)
    • tRNA
    Both tRNA and rRNA are transcribed as mRNA is transcribed, but each of these undergoes further processing into more complex forms than mRNA. They are never translated into protein, but play vital roles in the process of translation.

    Transfer RNA (tRNA): The Courier
    What is tRNA, and how is it different from mRNA?

  • Its regular structure facilitates attachment to the ribosome and easy transfer of amino acids to the growing polypeptide chain.

  • The complex folding structure is partially facilitated by modified nucleotide bases, whose affinities for each other differ from those of the typical bases.
    • modified purines: inosine (I) and methylinosine (MI), both modified forms of adenosine; methylguanosine (MG)

    • modified pyrimidines: ribothymidine (T); pseudouridine (Ψ, the Greek letter psi); dihydrouridine (D)

  • Different tRNA's have these modified bases in different locations, usually incorporated into the loops (T, D or anticodon)
  • About 64 different types of tRNA are known
  • There are about 20 known amino acids in living systems

  • This means that there is redundancy, as Crick predicted: some tRNAs carry the same amino acid.
  • All tRNAs are highly congruent in molecular structure except at the
    • aminoacyl attachment point
    • anticodon loop
    These two points are what provide the variation in tRNA necessary for them to carry and deliver the different amino acids to the ribosome for translation.

    But before translation can begin, each amino acid to be joined to the polypeptide chain must be activated.


    Amino Acid Activation

    Amino acid activation is the process by which amino acids are enzymatically loaded onto their proper tRNA couriers.

  • The activating enzymes, specific for each aa, are known as aminoacyl tRNA synthetases.

  • Each aminoacyl tRNA synthetase (there are about 20 known--one for each amino acid) is named for the aa it loads. For example:

    • The one that loads methionine is known as methionyl tRNA synthetase
    • The one that loads phenylalanine is known as phenylalanyl tRNA synthetase, etc.

    Activation is a two-step process

      1. aa + ATP ---> aminoacyl AMP + PPi (the pyrophosphate is then hydrolyzed to 2Pi)

      2. the aminoacyl group is transferred from aminoacyl AMP to the 3' end of the tRNA, releasing AMP.

    Synthetase proofreading reduces the incidence of tRNA-aa mismatch to about 1 in 60,000-80,000.
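To get a feel for what a mismatch rate of 1 in 60,000-80,000 means for a whole protein, here is a rough estimate. It assumes (a simplification not made in the notes) that each residue is mischarged independently and that no other error source matters; the 300-residue length is an arbitrary example.

```python
def fraction_error_free(n_residues, mismatch_rate):
    """Probability that none of n_residues was loaded onto the wrong tRNA,
    assuming independent mismatches at a fixed per-residue rate."""
    return (1.0 - mismatch_rate) ** n_residues


# Hypothetical 300-residue protein at both ends of the quoted range.
for rate in (1 / 60_000, 1 / 80_000):
    print(f"{fraction_error_free(300, rate):.4f}")
```

Under these assumptions, well over 99% of such chains would carry no mischarging error at all.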

  • Recall how mRNA and amino acids are related to one another via the codons.

  • Remember that codons are found on mRNA, and read from the 5' --> 3' direction.

  • The region of tRNA that is complementary to the mRNA codon is the anticodon, and it binds to mRNA in an antiparallel fashion, with its 5' base hydrogen bonded to the mRNA's 3' base.

  • The first two bases (from 5' to 3') of the codon are the constrained bases, and are vital for appropriate amino acid insertion.
  • However, the last base in the codon (at the 3' end) is not necessarily constrained.
  • The base pairing at the 3' end (last base of the codon) of the mRNA codon is said to be "loose"--meaning that there need not be a perfect match for an amino acid to be successfully added to the chain.
  • This "looseness" of the tRNA was first described by Francis Crick, who termed it the "wobble" ability of tRNA.
  • "Wobble" isn't completely without rules; only certain bases can pair with others:

    • 5' anticodon base G can pair with U or C at the 3' codon site
    • 5' anticodon base C can pair with G at the 3' codon site
    • 5' anticodon base A can pair with U at the 3' codon site
    • 5' anticodon base U can pair with A or G at the 3' codon site
    • 5' anticodon base I can pair with U, C or A at the 3' codon site
  • Thus, there may be several different tRNAs capable of pairing with a given codon.
  • Is "wobble" perhaps a remnant of punctuation between the codons of an ancient couplet code? Is the triplet code derived? We may never know.
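The wobble rules listed above can be written as a small lookup table. In this sketch the anticodon is written 5' to 3', so its first base is the wobble base and its last base pairs with the first base of the codon (antiparallel binding). The function name and the example anticodons are illustrative choices.

```python
# Wobble base (5' end of the anticodon) -> bases it may pair with
# at the 3' end of the codon, as listed in the rules above.
WOBBLE = {"G": "UC", "C": "G", "A": "U", "U": "AG", "I": "UCA"}

# Strict pairing for the two constrained positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read_by(anticodon):
    """Return the codons (5'->3') that an anticodon (5'->3') can pair with."""
    wobble_base, middle, last = anticodon
    first_codon_base = WATSON_CRICK[last]      # anticodon 3' base <-> codon 5' base
    second_codon_base = WATSON_CRICK[middle]
    return [
        first_codon_base + second_codon_base + third
        for third in WOBBLE[wobble_base]
    ]


print(codons_read_by("GAA"))  # ['UUU', 'UUC'], both phenylalanine codons
print(codons_read_by("CAU"))  # ['AUG'], the start codon
print(codons_read_by("IGC"))  # ['GCU', 'GCC', 'GCA'], all alanine codons
```

An inosine at the wobble position lets a single tRNA read three codons, which is one reason fewer tRNAs are needed than there are sense codons.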

    Interesting Side Notes

  • Some tRNAs have identical anticodons, but differ in their sequence in other locations (They are encoded by different tRNA genes).
  • These tRNAs carry the same amino acids, but are not identical to each other, and are called isoaccepting tRNAs.
  • There are 64 possible codons:
    • one is an initiator (methionine or N-formyl methionine)
    • three terminators (no aa encoded)
  • Knowing this, one might expect E. coli to have 61 different tRNA's
  • In fact, only 50 different tRNAs are known.
  • During translational proofreading, it's the tRNA--not the aa-- that is proofread by the proofreading mechanisms of the cell.
  • It is also the tRNA--not the aa--that is recognized by the ribosome during translation. This makes it all the more critical that the tRNA attaches to the correct amino acid for a normal, functional protein to be produced.


    Ribosomes and rRNA

    Ribosomes are complexes of RNA and protein that are the "machinery" responsible for polypeptide synthesis.

    The ribosome is one of many biological machines in the cell. (A biological machine is a multi-subunit complex (e.g., Dicer and RISC in the RNAi system mediated by siRNA) responsible for a particular cellular function.)

    In all living organisms--prokaryotic or eukaryotic--the ribosome consists of a large and small subunit that are dissociated from each other unless actively translating protein.

    The ribosome contains three major active sites

    • aminoacyl (A) site - point of attachment of aa-loaded tRNA
    • peptidyl (P) site - point of attachment of tRNA holding polypeptide
    • exit (E) site - point where "unloaded" tRNA leaves the ribosome
    In addition, there are the
    • decoding center
        site in the 30S subunit that ensures proper codon/anticodon attachment. (tRNAs whose anticodons match the codon are called cognate tRNAs.)
    • peptidyl transferase center
        Located in the 50S subunit, this active site is where peptide bonds are catalyzed.

      Let's visit the nucleolar organizer DNA.

    • This is the location of 5.8S, 28S and 18S RNA genes
    • The "S" stands for "Svedberg", a unit of relative sedimentation rate during ultracentrifugation.

    • After being synthesized in the cytoplasm, ribosomal proteins migrate to the nucleolar organizer region of the DNA, and are assembled into ribosomes along with the rRNAs encoded there.

    • The darkly-staining region of the nucleus known as the nucleolus (there may be several) is not an organelle, but a region of intense ribosome production.

    • Ribosomes are complex assemblages of proteins and rRNA, and they differ slightly across species (most significantly between prokaryotes and eukaryotes).

      Now that we've met all the players, we can continue to...

    The Phases of Translation

    As in DNA replication and RNA transcription, this process has been divided into three phases: initiation, elongation, termination.

    Initiation
    This is the critical first step. If tRNA attaches incorrectly, a frameshift error can occur, resulting in manufacture of a non-functional protein.

    Analogy: The sensical sentence

    THE FAT CAT ATE THE RAT with the first T omitted, would change to gibberish:

    HEF ATC ATA TET HER AT
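The sentence analogy can be reproduced directly: strip the spaces, drop the first letter, and read in threes again. The helper name `read_in_triplets` is illustrative.

```python
def read_in_triplets(text):
    """Remove spaces, then regroup the letters three at a time."""
    letters = text.replace(" ", "")
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))


sentence = "THE FAT CAT ATE THE RAT"

print(read_in_triplets(sentence))      # THE FAT CAT ATE THE RAT
print(read_in_triplets(sentence[1:]))  # HEF ATC ATA TET HER AT
```

Every "word" downstream of the missing letter changes, which is why the reading frame must be set correctly at initiation.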

  • 5'-- AUG-- 3': The AUG start codon is complementary to the anticodon of a tRNA carrying either N-formyl methionine (prokaryotes) or methionine (eukaryotes).

  • Fear not: the initial N-formyl met is snipped off during protein processing in the cell, after translation is complete, so not every finished protein in the cell starts with N-formyl met or met.

    The Initiation complex consists of

      a. small ribosomal subunit
      b. mRNA transcript
      c. N-formyl methionine-loaded tRNA
      d. initiation factor proteins, IF1, IF2 and IF3

    Positioning mRNA: The Shine-Dalgarno Sequences

    • In prokaryotes, mRNA is aligned at the ribosome by complementarity between mRNA upstream from the initiation codon (AUG) and the 3' end of the 16S rRNA of the small ribosomal subunit.
    • The specific sequence of the mRNA attachment point is AGGAGGU, known as the Shine-Dalgarno Sequence

  • The Shine-Dalgarno Sequence is not found in eukaryotes, but there are analogous systems.
  • In eukaryotes, the 5' end of the modified transcript (the 7-methyl guanosine cap) has an affinity for the small ribosomal subunit.

  • It is possible that the eukaryotic ribosome "scans" mRNA until it finds the AUG codon, and may locate it because the initiator tRNA (anticodon UAC, read 3' to 5') is already attached to the small subunit.
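As a rough illustration of how the Shine-Dalgarno sequence picks out the right AUG in a prokaryotic transcript, the sketch below looks for an exact copy of AGGAGGU and then takes the first AUG downstream of it. This is a simplification: exact matching and the absence of any spacing rule are assumptions of the sketch, and the example transcript is invented.

```python
SHINE_DALGARNO = "AGGAGGU"


def first_start_after_sd(mrna):
    """Return the index of the first AUG downstream of an exact
    Shine-Dalgarno match, or None if either element is missing."""
    sd_index = mrna.find(SHINE_DALGARNO)
    if sd_index == -1:
        return None
    start = mrna.find("AUG", sd_index + len(SHINE_DALGARNO))
    return None if start == -1 else start


# Invented transcript with a decoy AUG upstream of the Shine-Dalgarno site.
mrna = "UUAUGCCAGGAGGUAACAUCAUGGCUUAA"

print(mrna.find("AUG"))            # 2: the decoy, which has no upstream SD site
print(first_start_after_sd(mrna))  # 20: the AUG positioned by the SD sequence
```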

    Initiation involves not only ribosomes, tRNA and mRNA, but also proteins known as initiation factors (IFs), which help position the various components of the initiation complex in their proper orientations. In E. coli, three IFs (IF1, IF2 and IF3) are known.

    At Initiation in E. coli

    • IF3 binds to the small (30S) subunit of the ribosome; only when the two are so bound can the 30S subunit attach to mRNA.
    • IF2 brings the initial tRNA to the ribosome; unless IF2 is bound to tRNA, the ribosome can't bind to it.
    • The complex formed consists of IF2, the activated tRNA, and GTP.
    • The hydrolysis of GTP --> GDP + Pi does several things:
      • releases energy to drive the reaction
      • changes conformation of the initiation complex so that the large (50S) ribosomal subunit can now bind to it.
      • IF1 and IF2 are released from the complex.
    • (In eukaryotes, 11 different IF's take part in this process!)

    • The tRNA attachment sites (P, A and E) are now formed at the junction of the large and small ribosomal subunits.
    • At the close of initiation, the N-formyl methionine-loaded tRNA is bound to the P site of the ribosome, and is ready for...

    Elongation
    The Goal: bind the next activated tRNA to the A site.

    The enzymatic players in this drama:

    • Elongation factor proteins: EF-Ts, EF-Tu and EF-G.
    • GTP, the energy source
    • Ribosome
    • Activated tRNA

    The sequence of events:

    1. activated tRNA is H-bonded to the appropriate codon on the mRNA attached to the ribosome. (The very first met-loaded tRNA is loaded into the P site of the ribosome at initiation; the A site is momentarily empty.)

    2. Meanwhile, elongation factor EF-Ts enzymatically links EF-Tu to a molecule of GTP.

    3. The job of the EF-Tu/GTP complex is to position the next activated tRNA into the "A" site.

    4. As this occurs, the GTP part of the EF-Tu/GTP complex is hydrolyzed to yield GDP and inorganic phosphate (Pi). Again, this results in two events:

    • release of energy to drive the reaction
    • change in the shape of the EF-Tu/GDP complex so that it loses affinity for the active site of the ribosome, and is released.

    The entire process takes long enough for proofreading of the codon-anticodon to take place. If the match is incorrect, the entire complex (EF-Tu/GTP/activated tRNA) is released, and the ribosome starts over with a fresh, incoming tRNA complex.

    If the tRNA is correct, the process continues...

    5. The peptidyl transferase site, located in the center of the large subunit of the ribosome, links the amino acid of the A site tRNA to the growing polypeptide chain of the tRNA in the P site.

    6. Nearly simultaneously, translocase (composed of EF-G and GTP) attaches to the ribosome and moves it one codon downstream.

    7. Result: the tRNA previously in the A site is now

    • in the P site
    • linked to the growing peptide chain at its 3' end

    8. The tRNA previously in the P site (now naked) is tossed out the exit (E) site of the ribosome.

    9. Driving this action, the GTP on the EF-G/GTP complex is hydrolyzed, to yield EF-G/GDP and Pi. This has low affinity for the ribosome, and so is released to be disassembled and re-used later.
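The numbered steps above can be traced with a toy model of the ribosome's sites. This is only bookkeeping, not a mechanistic simulation: tRNAs are labeled by the codon they read, amino acids are placeholder strings, and the function name is an illustrative choice.

```python
def elongation_cycles(codons):
    """Trace the P site, the exit site and GTP use over successive codons."""
    # At the close of initiation the initiator tRNA sits in the P site.
    p_site = {"trna": "initiator", "chain": ["fMet"]}
    exited = []          # tRNAs that have left through the E site
    gtp_hydrolyzed = 0

    for codon in codons:
        # Steps 1-4: EF-Tu/GTP delivers the next activated tRNA to the A site.
        a_site = {"trna": codon, "chain": [f"aa({codon})"]}
        gtp_hydrolyzed += 1

        # Step 5: peptidyl transferase joins the chain to the A-site amino acid.
        a_site["chain"] = p_site["chain"] + a_site["chain"]
        p_site["chain"] = []

        # Steps 6-9: EF-G/GTP translocates; the naked tRNA leaves via the E site.
        exited.append(p_site["trna"])
        p_site = a_site
        gtp_hydrolyzed += 1

    return p_site["chain"], exited, gtp_hydrolyzed


chain, exited, gtp = elongation_cycles(["UUU", "GCU"])
print(chain)   # ['fMet', 'aa(UUU)', 'aa(GCU)']
print(exited)  # ['initiator', 'UUU']
print(gtp)     # 4, i.e. two GTP per codon
```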

    Elongation continues in this fashion until the ribosome reaches a nonsense (stop) codon, which triggers...

    Termination
    Among the codons on the mRNA are three "stop" codons...

  • UAG ("amber"--the translation from German of "Bernstein", the last name of one of the grad students on the Caltech team that did this work)

  • UAA ("ochre"--staying with the color theme)

  • UGA ("opal" -- how about a gemstone, this time?)

    These stop codons are recognized by release factor proteins:

    • RF1 recognizes UAA and UAG

    • RF2 recognizes UAA and UGA

    Either RF1 or RF2 binds to the stop codon, preventing attachment of tRNA.

    Result:

    • the "naked" tRNA booted out the exit site has handed off its polypeptide chain into an empty space (there's no tRNA there to accept it)

    • GTP-->GDP + Pi reaction again causes a conformational change that causes

        - mRNA to release from the ribosome
        - RF proteins to release from the mRNA
        - large and small subunits of the ribosome to dissociate
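Putting initiation, elongation and termination together, the whole read-out of a message can be sketched in a few lines: find the first AUG, read codon by codon, and stop at UAA, UAG or UGA. The sketch uses the standard genetic code and one-letter amino acid symbols; taking the first AUG as the start is a simplification, and the example mRNA is invented.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG up to (not including) the first
    in-frame stop codon; return one-letter amino acid symbols."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":       # stop codon: a release factor binds instead
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("GGAUGUUUGCUUAAGG"))  # MFA (Met-Phe-Ala)
```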


    Biotechnology note: A putative gene is known as an open reading frame (ORF): it is suspected to be transcribed and translated because of the presence of a start codon at the 5' end, and one to several stop codons at the 3' end.

    Finding ORF's in a sequenced chromosome is the first step in discovering the genes themselves. The next step--determining the gene's actual function in an organism--is far more complex and daunting.
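A bare-bones ORF scan looks like this. It reads one DNA strand (written as the coding strand, so the start codon is ATG) in all three frames and reports each stretch from an ATG to the next in-frame stop. The function name and the `min_codons` cutoff are illustrative; real gene-finding tools also scan the complementary strand and apply much stricter length and sequence criteria.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs for ATG...stop stretches in the three
    forward reading frames; end is exclusive and includes the stop codon."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it and keep scanning
    return orfs


dna = "CCATGAAATTTTAGCC"
print(find_orfs(dna))   # [(2, 14)]
print(dna[2:14])        # ATGAAATTTTAG
```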


    A Brief Overview: Differences between Prokaryotic and Eukaryotic Transcription and Translation

    PROKARYOTES | EUKARYOTES
    one type of RNA polymerase | three different RNA polymerases
    translation can occur while transcription is still going on | translation occurs in the cytoplasm, after transcription in the nucleus
    no known post-transcriptional modification of the RNA transcript | sometimes extensive cotranscriptional modification of RNA transcript. Precursor mRNA (a.k.a. "heterogenous nuclear RNA", or hnRNA) is found only in the nucleus. There is only about 25% similarity between hnRNA and the finished mRNA of a eukaryote.
    Pribnow Box (5'-TATAAT-3') is the promoter | promoter consensus sequences (the most common of which are the TATA ("Goldberg-Hogness Box"), the CAAT box and the GC box) are different from the Pribnow Box sequence in prokaryotes
    sigma factors act as recognition factors | Upstream enhancer sequences appear to aid in promotion of gene expression.
    rho-dependent termination system | Nothing similar to the rho-dependent system of E. coli is known in eukaryotes.


    Transcription and Translation: A few loose ends

  • How many genes are there per transcript?
    • in prokaryotes, there may be several, resulting in polycistronic mRNA known to be involved in operons.

    • in eukaryotes, each mRNA corresponds to only one gene

  • What happens to the nascent polypeptide chain?
    • prokaryotes further modify it with enzymes in situ (in the protoplasm)
    • eukaryotes transmit the new chain to the Golgi (dictyosome in plants) for further modification, assembly into more complex structures and packaging for export to other areas of the organism.
    • OR the polypeptide may immediately become part of a membrane

  • How fast does translation proceed?
    • prokaryotes - 15 peptide bonds per second

    • eukaryotes - 2 peptide bonds per second

  • How much energy is required to construct a protein?
    • 4 high energy phosphate bonds are hydrolyzed per peptide bond
      • ATP --> AMP + 2Pi during aa-tRNA activation (2 bonds), plus
      • 2 x GTP --> GDP + Pi during translation at the ribosome (2 bonds)
      • This means 29.2 kcal per mole of peptide bonds. Expensive!

  • When do transcription and translation take place, relative to one another?

    • In prokaryotes, the two processes can be simultaneous, with many ribosomes translating a single mRNA as it is transcribed from the DNA (the complex of multiple, working ribosomes is called a polysome)

    • In eukaryotes, the two processes are separated spatially and temporally.
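The rate and energy figures quoted in the list above (15 or 2 peptide bonds per second, and 29.2 kcal per mole of peptide bonds) can be combined into a back-of-the-envelope estimate for a whole protein. The 301-residue length is an arbitrary example, and the function name is illustrative.

```python
KCAL_PER_MOLE_PER_PEPTIDE_BOND = 29.2                       # figure quoted above
PEPTIDE_BONDS_PER_SECOND = {"prokaryote": 15, "eukaryote": 2}


def synthesis_estimate(n_residues, organism):
    """Return (seconds, kcal per mole of protein) to link n_residues."""
    peptide_bonds = n_residues - 1            # n residues are joined by n - 1 bonds
    seconds = peptide_bonds / PEPTIDE_BONDS_PER_SECOND[organism]
    energy = peptide_bonds * KCAL_PER_MOLE_PER_PEPTIDE_BOND
    return seconds, energy


for organism in ("prokaryote", "eukaryote"):
    seconds, energy = synthesis_estimate(301, organism)
    print(f"{organism}: {seconds:.0f} s, about {energy:.0f} kcal per mole of protein")
```

For a 301-residue chain (300 peptide bonds) this works out to about 20 seconds in a prokaryote and 150 seconds in a eukaryote, at roughly 8,760 kcal per mole of protein either way.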


    Post-Translational Processing of Polypeptides

    Polypeptide chains are not immediately functional upon exiting the ribosomes. In most cases, they must be processed so that they fold into their correct configuration and become fully functional proteins. And as we know, the 3-D conformation of a protein is critical to its proper function.

    Protein correctly folded into its functional configuration is said to be in its native configuration, whereas proteins incorrectly folded or unfolded are said to be nonnative.

    Nascent proteins require help to properly fold in the highly charged, inhospitable aqueous environment of the cell, and this help is provided by chaperone proteins. One family of chaperones forms large, multi-unit complexes called protein folding machines. These provide an environment that is electrically neutral, where ionic charges common in the aqueous environment of the cell don't interfere with proper protein folding.

    Amino Acid Side Chain Modification
    Some proteins contain side chains of amino acids to which additional functional groups are added after translation.

    • phosphorylation
      • kinase enzymes phosphorylate side chains
      • phosphatases remove phosphate groups
      • each of these reactions changes protein conformation and reactivity
      • organisms produce hundreds, or even thousands of different kinases, each of which phosphorylates different areas of a protein.
      • this is an important mechanism allowing proteins to change affinity and function and become dynamic machines, interacting with each other to perform a multitude of cellular functions.

    • ubiquitination: the executioner's song
      • multiple copies of a 76-aa protein, ubiquitin (its name suggests how common this reaction is!), attached to lysine residues of a protein target it for destruction
      • ubiquitins are found only in eukaryotes, where they are highly conserved across species.
      • they are used primarily to regulate the lifespan of short-lived proteins such as those involved in cell-cycle regulation
      • they also may be attached to proteins that are damaged, mutated, or otherwise need to be destroyed to prevent problems in the cell.

    Signal Sequences
    How does a newly made protein "know" where to go? Does it belong in the nucleus? cytoplasm? an organelle? a plasma membrane?

    Nascent proteins contain a signal sequence at their amino-terminal end, and this sequence directs the protein to its correct location in the cell.

    Plasma membrane proteins have a 15-25-aa sequence with affinity for particular membrane channels in the endoplasmic reticulum.

    In the e.r., peptidase cleaves off the signal sequence, and the protein is directed from the e.r. to its proper membrane location.

    The Signal Hypothesis

    • the first 3 - 12 amino acids of some new polypeptides destined for plasma membrane function may form a signal sequence, which initiates a complex, multi-step insertion into the membrane of an organelle or the cell itself.

    • the ribosome attaches to the membrane, guided by a signal recognition particle and a docking protein. The ribosome essentially extrudes the nascent polypeptide directly into the membrane (squirt)

    Nuclear localization sequences (NLSs) in newly made DNA- and RNA-polymerases are recognized by receptors in the cytoplasm that transport the new proteins through the nuclear pores and into their new home.


    Dominance and Recessiveness at the Protein Level

    Note that in the case of a fully recessive lethal or deleterious allele, a heterozygous carrier may not show obvious symptoms of the disorder, but that blood levels of the affected enzymes can show heterozygosity. (Many such inborn errors can be detected via blood test.)

    When a single dominant allele provides sufficient enzyme to prevent symptoms of the disorder, the disease can be said to exhibit haplo-sufficiency.

  • In some disorders a single allele does not provide enough product for normal function.
  • In these cases, the heterozygotes suffer from haplo-insufficiency, and the mutant allele is thus dominant.
  • This should not be confused with a mutation that causes gain-of-function dominance, in which the mutant allele is not inactive, but altered in function so as to produce a new phenotypic effect. In a case like this, the new mutant allele which has a different function than the original protein can sometimes be dominant over the wild type allele.

    Haplo-insufficiency and haplo-sufficiency suggest that for many enzymes, there is a threshold concentration of product that will allow the cell to function viably--or not.
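The threshold idea can be made concrete with a toy model in which each allele contributes some amount of enzyme and the cell functions normally only if the total reaches a threshold. All numbers here are hypothetical units chosen for illustration, not measured values.

```python
def phenotype(allele_outputs, threshold):
    """Compare total enzyme output from both alleles with a threshold."""
    return "normal" if sum(allele_outputs) >= threshold else "affected"


FUNCTIONAL, NULL = 50, 0            # hypothetical enzyme output per allele
heterozygote = (FUNCTIONAL, NULL)

# Low threshold: one functional allele is enough (haplo-sufficiency).
print(phenotype(heterozygote, threshold=40))   # normal

# High threshold: one functional allele is not enough (haplo-insufficiency).
print(phenotype(heterozygote, threshold=80))   # affected
```

The same heterozygous genotype gives opposite outcomes depending only on where the threshold lies, which is the point of the paragraph above.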