Transcription: Re-writing the DNA code into RNA

By the end of this section, you should know more about

In the next two lectures, we will discover how genes enable the cell to make proteins for various functions. Here is a big-picture overview of gene expression in eukaryotes that will set the stage for us to add a little more detail.

More Genes Does Not Equal Greater Complexity
How can our species, Homo sapiens, have about the same number of genes as the humble Wall Cress (Arabidopsis thaliana), and many fewer genes than many other plants? Aren't we the most complex of organisms?

Animal diversity in structure and function is due more to differences in how DNA is packaged, read, and processed than to differences in the quantity of DNA or genes. More derived eukaryotes can use the same genes to manufacture different proteins by "editing" those genes at the level of RNA.

This allows the average mammal genome consisting of about 20,000 genes to produce more than 100,000 different proteins from those genes that encode proteins.
How can this be?

It's all about the RNA. Transcription is the process of manufacturing RNA.

RNA: Structure and Properties
DNA is the permanent "blueprint" for constructing and operating an organism.

DNA does not govern these things directly, nor by itself. For a gene to exert its function, it must first be transcribed ("rewritten") into another nucleic acid language, that of RNA.


Types of RNA

  • RNA can be either

  • Some functional RNA molecules are unique to eukaryotes, and are involved in the processing of other RNA molecules to effect control of gene expression. These are the

  • Long noncoding RNAs (lncRNAs or just ncRNAs) are not translated into protein, and their function is largely unknown. They are transcribed from most regions of the genome. Some are known to participate in dosage compensation, but most are a mystery.

    Transcription and translation are happening constantly in any cell. Hence, rRNA, tRNA, mRNA and snRNA are constantly being transcribed (i.e., their transcription is constitutive).

    The other functional RNAs (miRNA, siRNA, piRNA, lncRNA) are manufactured only when they are needed.

    Messenger RNA: Encoding Polypeptides
    The only type of RNA that carries the code for building proteins is mRNA. In the initial phase of gene expression, DNA is transcribed into mRNA, which is read by the ribosomes and translated into protein.

    Proteomes
    The full range of proteins a particular organism is able to manufacture is known as the complete proteome.
    A cellular proteome is all the proteins found in a specific type of cell (and/or tissue) under a specific set of environmental conditions.

    Gene expression changes not only on a daily, hour-by-hour basis, but also over the lifespan of an organism.
    Even your daily circadian rhythm is ultimately under genetic control, as the hormones and other molecules that direct your body to perform daily functions are at least partly encoded by one or more genes that respond to daily cycles.

    Transcribing mRNA in Prokaryotes
    No matter which type of RNA is being manufactured, the machinery of transcription is essentially the same.
    Transcription is the process by which the DNA code is rewritten into the informational code of RNA.

    The three phases of this process are known as initiation, elongation, and termination.

    Initiation
    When a gene is being transcribed, the nitrogenous base code must be made available for reading by polymerizing enzymes. But where to begin? Much of what we first learned about transcription came from our old pal, E. coli.

    At the start of every gene, there is a short base sequence (about 60 base pairs (bp) long) known as the promoter. This is where the polymerizing enzyme will attach.

    Promoters: Conserved/Consensus Sequences of DNA
    Within species, and even across species, many homologous genes have small segments of base pairs exactly or almost exactly in common. These are called conserved sequences.

    When conserved sequences are found in several different locations in the genome, or in homologous regions of genomes of different species, they are known as consensus sequences. These may have resulted from functional or evolutionary (i.e., common ancestry) relationships among the sequences.

    Retention of conserved sequences suggests that these sequences have not tolerated a great deal of mutation: their products are sensitive to changes, and may not function in mutant form.

    In E. coli and other prokaryotes, one such consensus sequence is found in the promoter of all genes: The Pribnow Box.
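To make the idea of a consensus sequence concrete, here is a minimal Python sketch that slides along a stretch of DNA and reports every window that matches a consensus, allowing a small number of mismatches. Treat it as an illustration only: the six-base Pribnow box consensus TATAAT is the commonly cited one and is an assumption here (it is not spelled out above), and the promoter fragment and the name `find_consensus` are invented for the example.

```python
def find_consensus(seq, consensus="TATAAT", max_mismatches=1):
    """Return (position, window, mismatches) for every window of `seq`
    that differs from `consensus` at no more than `max_mismatches` bases.

    The default consensus is the commonly cited Pribnow box (assumption).
    """
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Count the positions where the window and the consensus disagree.
        mismatches = sum(1 for a, b in zip(window, consensus) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Hypothetical promoter fragment, invented for illustration (5' to 3').
promoter = "GGCTTGACATTTACGCTATAATGCGCACCA"

exact_hits = find_consensus(promoter, max_mismatches=0)
near_hits = find_consensus(promoter, max_mismatches=1)
print(exact_hits)
print(near_hits)
```

Allowing a mismatch or two mirrors the point that promoters of different genes are "exactly or almost exactly" alike; real promoter recognition depends on more than one short motif, so a scan like this would only ever be a first pass.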

    RNA Polymerase
    The promoter is the starting location of RNA synthesis. In prokaryotes, the enzymatic complex known as RNA polymerase (RNAP or RNApol) is the main player.

    RNA polymerase must be able to

    Prokaryotic RNA polymerase consists of a Note:
  • RNA polymerase, armed with its sigma factor, attaches to the promoter and denatures the double helix.

  • Promoters of different genes may vary slightly in sequence. Different sigma factors that recognize specific sequences are active under the physiological conditions that make it necessary to transcribe a specific gene.

  • Like DNA replication, RNA transcription takes place in the 5' to 3' direction. Incoming nucleotides are oriented with the 5' (phosphate) end pointing towards the 3' (-OH) end of the growing chain.

  • Unlike DNA replication, RNA transcription involves little proofreading. (Though there is some evidence that RNApol may pause at a mismatch and replace the mistake.)
    (Why do you suppose proofreading is less critical in manufacturing RNA than in replicating DNA?)

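    The DNA template strand, against which the incoming RNA nucleotides are base-paired, is known as the non-coding strand.

    The DNA strand that is not used as a template is known as the coding strand. Its sequence matches that of the RNA transcript, except that the RNA has U wherever the DNA has T.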

  • Viruses may always use the same DNA strand as a template.

  • Bacteria and eukaryotes may use either strand of DNA as the template, but only one strand serves as the template in any given gene or gene sequence.
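The relationship between the coding strand, the template strand and the transcript can be sketched in Python. This is a toy model under stated assumptions: the 12-base coding strand is invented, the function names are hypothetical, and the template is written 3' to 5' so that it lines up base-for-base with the transcript.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding_5to3):
    """Return the template (non-coding) strand, written 3' to 5',
    that pairs with a coding strand written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in coding_5to3.upper())


def transcribe(template_3to5):
    """Build the RNA transcript (5' to 3') by pairing an RNA nucleotide
    with each base of a template strand written 3' to 5'."""
    return "".join(RNA_COMPLEMENT[base] for base in template_3to5.upper())


coding = "ATGGCCTTAGAT"                   # hypothetical coding strand, 5' to 3'
template = template_from_coding(coding)   # 3' to 5'
mrna = transcribe(template)               # 5' to 3'

print("coding  ", coding)
print("template", template)
print("mRNA    ", mrna)
```

The transcript comes out identical to the coding strand with U in place of T, which is why the strand that is not read is the one called "coding".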


  • RNA polymerase travels along the DNA template strand (reading it 3' to 5'), laying down RNA nucleotides in a 5' to 3' direction.
  • Release of the two terminal phosphates (as pyrophosphate) from each incoming nucleoside triphosphate yields the energy that drives the reaction.


  • In prokaryotes, two termination mechanisms are known:

    Intrinsic/Direct Termination

    Rho-dependent Termination

    The Finished mRNA

  • mRNA rolling straight off the DNA template is known as the primary transcript.

    But this is only one of the many ways that eukaryotic transcription is more derived than prokaryotic transcription. Let us explore.

    Eukaryotic Transcription: What's Different?
    A typical prokaryote has a few thousand genes, and one type of RNA polymerase to transcribe them all.

    Eukaryotes have many more genes than prokaryotes, and also have a great deal of non-informational DNA interspersed between the genes. This means that the factors searching for promoters have a lot more searching to do before they connect. To facilitate this, eukaryotes have not one, but three different types of RNA polymerase:

  • In eukaryotes, transcription takes place in the nucleus, and transcripts move out of the nucleus via the (selectively permeable) nuclear pores. Translation takes place in the cytosol (liquid portion of the cytoplasm), on free ribosomes and on the ribosomes lining the rough endoplasmic reticulum.

  • Organelle DNA is transcribed and translated within the organelle, yet another reminder of the Endosymbiont Model of their origins.

    Quick Guide to Eukaryotic Transcription Acronyms

  • In eukaryotes, varied proteins known as general transcription factors (GTFs) bind around the promoter before RNA polymerase can attach. These are somewhat analogous to the prokaryotic sigma factors.

  • The GTFs attract RNA polymerase to the proper location, and help it attach correctly to the DNA template.

  • There are many GTFs, each with its own name, such as TFIIA, TFIIB, etc.
    (TF for "transcription factor" and II for "RNA polymerase II").

  • Together, RNA polymerase II and the GTFs make up the preinitiation complex (PIC).

    The Eukaryotic Promoter: TATA Box

  • As in prokaryotes, eukaryotic promoters are located upstream of the gene; the core sequence sits about 25 bp upstream (in the 5' direction) of the transcription start site.

  • The promoter includes a core sequence:


  • This sequence is called the TATA box (or Goldberg-Hogness Box), and is analogous to the Pribnow Box of prokaryotes.

  • The TATA box is found in archaeans as well as eukaryotes, providing another bit of evidence of their shared ancestry.


    The Initiation Complex

  • The TATA Box is the binding site of TATA-binding protein (TBP), a component of TFIID (one of the GTFs).

  • TFIID is the first GTF to bind to the promoter. It attracts (1) other GTFs and (2) RNA polymerase II to the site.

  • The GTFs and RNA polymerase II also bind to the site, creating the BIG BIG initiation complex.

  • Once properly seated, RNA polymerase II detaches from the GTFs via the activity of the carboxy-terminal domain (CTD), a tail-like extension of its largest subunit.

  • The CTD is phosphorylated (with ATP).
  • This change weakens RNA polymerase II's affinity for the general transcription factor (GTF) proteins.

  • RNA polymerase II gently dissociates from the GTFs and goes on its merry, elongating way.

  • If multiple copies of the gene are being transcribed, some GTFs remain at the promoter and attract additional RNA polymerases to the site, and transcription proceeds with several RNA polymerases in a conga line, each producing a new transcript.

    Elongation in Eukaryotes
    Elongation is essentially similar to that seen in prokaryotes, with the new RNA being laid down inside the transcription bubble (bordered by two Y-junctions) formed by the denaturation of the DNA.

    Two major differences:


  • Processing of the eukaryotic RNA transcript begins even as it's being manufactured. What was once believed to be post-transcriptional modification is now understood to be cotranscriptional.


    Processing the 5' and 3' ends of the transcript

    Introns and Exons

  • The coding region of the primary transcript is made up of introns, which are cut out and discarded, and exons, which are spliced together to form an RNA sequence that is completely colinear with the protein it encodes. (Phillip Sharp et al. published this work in 1977. In 1993, Sharp shared the Nobel Prize for it.)

  • Recall that the number and size of introns varies not only across genes, but also across species.

  • Why bother with introns?

    Intron Excision, Exon Splicing
    How are the introns excised and the exons spliced together?

    Several models for intron excision/exon splicing have been proposed.

    Self-splicing Introns

    Self-splicing mitochondrial and chloroplast introns (group II introns)

  • A spliceosome is a protein-RNA complex that mediates excision of introns and splicing of exons.

  • The spliceosome apparatus consists of small nuclear RNAs (snRNAs) complexed with protein to form small nuclear ribonucleoproteins (snRNPs), affectionately known as "snurps".

  • Five different snurps take part in this process, and they comprise the spliceosome:

    U1, U2, U4, U5 and U6

  • The RNA portion of each of these ranges from 100 to 215 nucleotides in length.
  • The RNA portion of the snurp may be complementary to the intron, the exon or one of the other snurps, depending upon its function.

    Sequence of events in spliceosome process:

    1. U1 binds at the 5' end of the intron to be removed.

    2. U2 binds at a central location of the intron, known as the branch point.

    3. Meanwhile, U4, U5 and U6 bind together to form a complex.

    4. U4/5/6 complex approaches the intron; as it draws near, U4 releases from the complex; the U6 portion of the complex is now unstable and reactive.

    4A. U5/6 complex replaces U1 at the 5' end of the intron.

    5. U6 has a high affinity for U2; it binds to U2, creating a "loop" out of the intron. The two ends of the adjacent exons are now close together, though still separated by the intron.

    6. U5 catalyzes the linkage of the two exon ends, which have been brought into close proximity by the binding of U6 and U2.

    7. U5 and RNA may both participate in intron removal; U5 does the actual catalysis of the final phosphodiester bond between the exons. So cool.

    The spliceosome is one type of control mechanism in intron/exon splicing, allowing greater accuracy of splicing.

    Conserved sequences in the genes being spliced are critical to the operation of the spliceosome. Three such sequences (one at each end of the intron, and one in the middle) are nearly identical across species, probably for functional reasons.

    Since spliceosomes may physically vary, different introns may be removed from the same transcript, depending on the "needs" of the cell at the given time.

    Result: a huge variety of products potentially coded by the same gene. (alternative splicing; exon shuffling)
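A rough feel for how one gene can yield several products comes from a small Python sketch. It is a toy model, not a description of how the cell chooses isoforms: the transcript is invented (exons in upper case, introns in lower case), the function names are hypothetical, and the sketch assumes the first and last exons are always kept while each internal exon may be kept or skipped. Real alternative splicing is regulated, so only some of these combinations would occur for a given gene.

```python
from itertools import combinations


def build_transcript(segments):
    """Join (kind, sequence) segments into one primary transcript and
    record the (start, end) coordinates of every exon."""
    transcript, exons, pos = "", [], 0
    for kind, seq in segments:
        if kind == "exon":
            exons.append((pos, pos + len(seq)))
        transcript += seq
        pos += len(seq)
    return transcript, exons


def splice(transcript, exons):
    """Discard everything outside the listed exons and join what is left."""
    return "".join(transcript[start:end] for start, end in exons)


def alternative_isoforms(transcript, exons):
    """Enumerate mature RNAs that keep the first and last exon plus any
    subset of the internal exons, in their original order."""
    first, *internal, last = exons
    isoforms = set()
    for r in range(len(internal) + 1):
        for kept in combinations(internal, r):
            isoforms.add(splice(transcript, [first, *kept, last]))
    return sorted(isoforms)


# Hypothetical primary transcript: four exons separated by three introns.
segments = [
    ("exon", "AUGGCC"), ("intron", "guaaguuuucag"),
    ("exon", "GAUUAC"), ("intron", "guaugcccuuag"),
    ("exon", "CCAGGU"), ("intron", "guaaaacuacag"),
    ("exon", "UUCUAA"),
]
transcript, exons = build_transcript(segments)

full_length = splice(transcript, exons)
isoforms = alternative_isoforms(transcript, exons)
print(full_length)
for rna in isoforms:
    print(rna)
```

With two optional internal exons the toy gene gives four distinct mature RNAs; each additional optional exon doubles the count, which is one way a genome of about 20,000 genes can encode far more than 20,000 proteins.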

    Evolutionary Origin of Introns and Exons

    Oddities of Eukaryotic RNA: Editing

  • In some cases, DNA sequence does not predict the protein product's amino acid sequence, even when introns/exons are considered.

  • It is now known that, at least in eukaryotes, RNA is sometimes "edited" at the nitrogenous base level, with chemical reactions occurring to change one nucleotide to another.
  • RNA editing can occur via:

    The significance of this phenomenon can most notably be seen in Trypanosoma, the flagellate parasite responsible for African sleeping sickness. But could guide RNA (gRNA) be the Trypanosoma's Achilles' heel? Eukaryotic hosts of the parasite lack gRNA.