Nearly every cell in your body carries a copy of your genome, which is about 3.2 billion base pairs of DNA per set of chromosomes. The sequence of nucleotides in this DNA determines the sequence of amino acids in your proteins. Proteins act as nano-scale molecular machines that carry out countless different tasks in your body. For example, the XIAP protein, which is the central protein of the 2012 Science Olympiad Protein Modeling Event, helps regulate apoptosis (programmed cell death) and your immune response to infections.
DNA Contains the Information Needed to Make Proteins
While double-stranded DNA has become one of the most iconic structures in modern biology, it is only part of the story. DNA matters to the cell mainly because it contains the information the cell needs to make proteins.
Proteins are linear polymers of amino acids that spontaneously fold up into compact 3D shapes following basic principles of chemistry and physics. Humans make tens of thousands of different kinds of proteins. Each protein has a unique sequence of amino acids, and that sequence is encoded in a sequence of nucleotides in your genome. Each folded protein is an amazing molecular machine that performs a specific job. For example, the beta-globin protein safely binds to and transports oxygen throughout your body; aquaporin transports water through phospholipid bilayers; and influenza hemagglutinin allows the flu virus to infect our respiratory cells.
The Flow of Genetic Information: DNA to RNA to Protein
The sequence of amino acids in a protein is encoded by a sequence of nucleotides in the DNA that makes up the gene for that protein. (Read this over, and over again, and think about it until it makes sense to you.) But if you are a eukaryote (which you are, if you are human), your genes are a little more complicated than those of a prokaryote. Your genes are split. That means that the nucleotides that encode the amino acids in any given protein are interrupted by other nucleotides that do not code for any part of the protein. The nucleotides that encode amino acids, and are therefore expressed as protein, are called exons. The intervening nucleotides that do not code for amino acids in your protein are called introns.
When a gene is "expressed" (made into protein), it is first copied into messenger RNA (mRNA) by the process known as transcription and then translated into protein by a ribosome. But the fact that eukaryotic genes have introns in them means that we have to add another step to this flow of genetic information. The introns of the initial mRNA transcript must be spliced out of the RNA to generate the mature mRNA that consists of one continuous sequence of protein-encoding nucleotides (exons).
The following three steps are involved in the expression of a human (eukaryotic) gene.
1. Transcription
In order to make the proteins that are needed to perform the unique functions required in each of your cells, you must first express a specific subset of your genes as messenger RNAs. RNA polymerase is the enzyme that synthesizes mRNA. Messenger RNA is complementary to the template strand of DNA, following the Watson-Crick base pairing rules, with uracil (U) taking the place of thymine (T) in RNA (A in the DNA pairs with U in the RNA, T pairs with A, and G pairs with C). This process of RNA synthesis, known as transcription, is shown below.
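The base pairing rule used in transcription can be sketched in a few lines of Python. This is only an illustration: the function name `transcribe` and the nine-nucleotide template are made up for the example, and the template strand is assumed to be written 3' to 5' so that the mRNA reads 5' to 3'.

```python
# Pairing between a DNA template base and the RNA base added opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Return the mRNA (5'->3') complementary to a DNA template strand (3'->5')."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

# Toy template strand, not a real gene.
toy_template = "TACCGAACA"
print(transcribe(toy_template))  # AUGGCUUGU
```

Note that the mRNA ends up with the same sequence as the non-template (coding) strand of the DNA, except that U appears wherever the DNA has T.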
2. mRNA Splicing
RNA polymerase cannot tell the difference between exon and intron sequences. Therefore, the initial messenger RNA that is copied from the template strand of DNA contains both exon and intron sequences. Eukaryotic cells contain "spliceosomes" that can recognize the junctions between exons and introns and splice out the introns from the precursor mRNA to generate the mature mRNA that is ready to be translated into protein. This RNA splicing reaction is animated below.
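As a rough sketch of what splicing accomplishes, the Python below joins the exons of a toy precursor mRNA and discards the intron between them. The sequences and the `splice` helper are hypothetical, and the exon boundaries are simply handed to the function; a real spliceosome has to find those junctions from signals in the RNA itself.

```python
def splice(pre_mrna, exons):
    """Join exon segments of a precursor mRNA.

    `exons` is a list of (start, end) positions, 0-based with the end excluded,
    listed in the order they appear in the transcript.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)

# Toy sequences, not taken from a real gene.
exon1, intron, exon2 = "AUGGCU", "GUAAGUCCAG", "UGUUAA"
pre_mrna = exon1 + intron + exon2

# Exon coordinates within the precursor transcript.
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(pre_mrna))]

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)  # AUGGCUUGUUAA
```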
3. Translation
Once the eukaryotic mRNA has been spliced to remove the introns, the mRNA is bound by a ribosome that translates this sequence of nucleotides into a sequence of amino acids, i.e., a protein. This nucleotide sequence is translated "three nucleotides at a time" based on the code that is illustrated in The Standard Genetic Code. Note that the code is degenerate, meaning that most amino acids are encoded by more than one triplet codon. Also notice that the Codon Chart shown below is color-coded according to the chemical properties of the amino acid that is encoded. It is these chemical properties of the individual amino acids that will direct the spontaneous folding of the protein into a compact 3D shape following its synthesis by the ribosome.
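Reading an mRNA "three nucleotides at a time" can be sketched as follows. The codon table is the standard genetic code, built here from a compact 64-letter string of one-letter amino acid codes (`*` marks a stop codon). The `translate` function and the toy mRNA are illustrative assumptions: the sketch starts reading at the first nucleotide it is given and stops at the first stop codon, which is simpler than what a ribosome actually does.

```python
from collections import Counter
from itertools import product

# Standard genetic code: codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate an mRNA one triplet codon at a time, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGGCUUGUUAA"))  # MAC  (Met-Ala-Cys, then stop)

# Degeneracy: count how many codons encode each amino acid.
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
print(codons_per_amino_acid["L"])  # 6 codons for leucine
print(codons_per_amino_acid["M"])  # 1 codon for methionine
```

The counts make the degeneracy concrete: leucine has six codons, while methionine and tryptophan each have only one.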
The decoding of the mRNA by a ribosome, and the resulting synthesis of a protein, is animated below. As the mRNA moves through the ribosome one triplet codon at a time, small tRNA molecules bring the appropriate amino acid to the ribosome for addition to the growing protein chain. Note the complementary base pairs (GC and AU) that form between the codons of the mRNA and the anticodons of the tRNAs.
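The codon-anticodon pairing can be sketched in Python as well. Because the two RNA strands pair in opposite directions, the anticodon (written 5' to 3') is the reverse complement of the codon. The `anticodon` helper is hypothetical, and it assumes strict GC and AU pairing at all three positions, which ignores the looser "wobble" pairing that real tRNAs allow at the third codon position.

```python
# Strict Watson-Crick pairing between two RNA strands.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon):
    """Return the anticodon (5'->3') that pairs with an mRNA codon (5'->3')."""
    return "".join(RNA_PAIR[base] for base in reversed(codon))

print(anticodon("AUG"))  # CAU
print(anticodon("UGU"))  # ACA
```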
Expression of the XIAP Gene
The XIAP gene, which figures prominently in the 2012 Science Olympiad Protein Modeling Event, undergoes transcription, mRNA splicing, and translation to produce the final XIAP protein, which is composed of 497 amino acids.
Examine the Human XIAP mRNA → Protein Map provided below to see the four domains of known protein structure that make up this protein and the G → A mutation that occurs at nucleotide 641 of Nic's gene. As you explore Nic's story further, you will understand how this change in a single nucleotide of his XIAP gene is believed to be the molecular basis of his disease.
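To see how changing a single nucleotide can change a protein, here is a minimal Python sketch that applies a G → A substitution to a toy mRNA and looks up the affected codon before and after. The sequence, the position, and the helper names are all made up for illustration; this is not the real XIAP sequence, and it does not reproduce the numbering in the XIAP map. The sketch also assumes the reading frame starts at nucleotide 1 of the toy sequence.

```python
from itertools import product

# Standard genetic code: codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def substitute(mrna, position, new_base):
    """Replace the nucleotide at a 1-based position with new_base."""
    i = position - 1
    return mrna[:i] + new_base + mrna[i + 1:]

def codon_at(mrna, position):
    """Return the codon containing a 1-based position (frame starts at nucleotide 1)."""
    start = (position - 1) // 3 * 3
    return mrna[start:start + 3]

toy_mrna = "AUGGCUUGUUAA"   # hypothetical sequence, not XIAP
position = 8                # the G in the middle of the third codon

mutant_mrna = substitute(toy_mrna, position, "A")
before = codon_at(toy_mrna, position)
after = codon_at(mutant_mrna, position)

print(before, CODON_TABLE[before], "->", after, CODON_TABLE[after])
# UGU C -> UAU Y
```

In this toy case the single G → A change turns a cysteine codon (UGU) into a tyrosine codon (UAU), so the protein would carry a different amino acid at that one position.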
Click here to explore the Human XIAP mRNA → Protein Map in detail as a PDF.