2015 Learning Module: Genome Editing Proteins

Be extra sure to explore the information in all three tabs below

Towards a Genomic Cure | DNA, Genomes and Proteins | This Year's Proteins


Zinc Finger Nuclease
A Zinc Finger Nuclease bound to DNA

The theme of this year’s Protein Modeling event is "Editing the Human Genome".

Recent advances in the molecular biosciences have made it possible for us to modify the human genome.

Why would we want to do this?

Well, in the 60 years that have passed since Watson and Crick first described the structure of DNA, we have come to understand that the sequence of nucleotides in DNA encodes the sequence of amino acids that make up our proteins (more on this in Section 3). We also understand that many human diseases are caused by mutations (changes in nucleotide sequence) in genes that result in changes in the amino acid sequence of specific proteins. Changing the amino acid sequence may alter the normal structure and function of the protein, leading to disease. So, doesn’t it make sense that we would now begin to perform some "molecular surgery" on these defective genes as a new approach to treating or preventing human disease?

The resources provided on this web site will provide you with the background knowledge needed to understand this genome editing process and to create a physical model of a variety of proteins related to this process, including:


  • A Current Example of Genome Editing
    • What Happens When CCR5 is Mutated?
    • Genome Editing Leading to a "Functional Cure" for HIV

  • Designing Genome Editing Nucleases
    • Zinc Finger Nuclease Proteins and Editing a Genome
    • The Structure of Two Zinc Finger Nucleases Bound to DNA

  • Additional Resources for Learning About HIV/AIDS
    • Structural Biology of HIV
    • Life Cycle of HIV
    • Current Approaches to Treat HIV Infection
    • Upcoming Approaches to Treat HIV Infection
    • Timeline of HIV Research

A Current Example of Genome Editing

Tim Brown, the Berlin Patient

Who is Tim Brown? And what does he have to do with this story?

Before the genome editing approach was tested, Tim Brown was believed to be the only person to have been functionally cured of HIV. After controlling his HIV for many years with antiretroviral therapy, Tim was "lucky" in that he developed acute myeloid leukemia (AML), which required a bone marrow stem cell transplant. The donor for his transplant was homozygous for a naturally occurring 32 base pair deletion in the CCR5 gene. The CCR5 protein is a co-receptor for HIV infection. The T-cells that arose from the transplanted bone marrow stem cells were resistant to HIV infection because they lacked a functional CCR5 protein. As a result, the transplant not only cured Tim’s AML, it also cured his HIV. Four years after the transplant, Tim remains free of both cancer and HIV.

You can read more about Tim, The Berlin Patient at: http://defeathiv.org/berlin/.

What Happens When CCR5 is Mutated?

CCR5 protein

CCR5 or Chemokine Receptor 5 is a membrane receptor protein found on human immune cells. Its primary function is to bind specific chemical signals, called chemokines, and recruit other immune cells. The structure of the molecule is shown in the figure to the right.

The structure of the Chemokine Receptor CCR5 shown here is displayed within the context of the T-helper cell membrane. The PDB entry 4mbs.pdb is that of an engineered molecule fused to rubredoxin (not shown here for clarity) and in complex with an HIV entry inhibitor drug bound to the extracellular face of the molecule.

The CCR5 protein is an HIV co-receptor. It cooperates with the host cell CD4 protein to allow the initial docking of HIV onto T-cells and the subsequent infection. The CD4-bound HIV envelope spike protein uses CCR5 as a co-receptor to enter and infect host cells. In some instances HIV uses a similar chemokine receptor, CXCR4, as the co-receptor for entry into host cells.

Illustration by Taina Litwak, CMI

Curiously, approximately 15-20% of the northern European population is heterozygous for a naturally occurring 32 base pair deletion in their CCR5 gene – making them less susceptible to HIV infection. Approximately 1% of this population is homozygous for this mutation – and resistant to HIV infection.

The 32 deleted base pairs (covering about eleven codons) are located midway through the gene. Because 32 is not a multiple of three, removing them shifts the translation reading frame. The protein product translated from the gene containing this deletion is therefore truncated, as a result of the out-of-frame STOP codon encountered 31 codons after the deletion site.
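The effect of a 32 base pair deletion on the reading frame can be checked with simple arithmetic. The following is a minimal sketch in Python; the function names and the toy sequence are hypothetical illustrations and are not taken from the real CCR5 gene.

```python
def shifts_reading_frame(deleted_bases: int) -> bool:
    """Return True when a deletion of this length changes the reading frame."""
    # Codons are three bases long, so only deletions whose length is a
    # multiple of three leave the downstream codons intact.
    return deleted_bases % 3 != 0


def codons(sequence: str) -> list[str]:
    """Split a sequence into complete three-base codons, starting at position 0."""
    return [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]


# Toy sequence (hypothetical, NOT the real CCR5 gene).
toy_gene = "ATGGCCAAGTTTGGACCCTAA"
# Remove two bases after the first codon; like 32, two is not a multiple of three.
toy_mutant = toy_gene[:3] + toy_gene[5:]

print(shifts_reading_frame(32))  # 32 = 10 x 3 + 2, so the frame shifts
print(codons(toy_gene))          # original codons
print(codons(toy_mutant))        # every codon after the deletion is different
```

In the toy example every codon downstream of the deletion changes, which is the same reason the CCR5 delta 32 protein has an altered sequence after the deletion site.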

Explore the CCR5 Gene Map shown below to see the 32 base pair deletion. Note that this deletion is about halfway through the full length of the gene, so you will have to scroll on the image below to find it. Click Here to download the entire CCR5 Gene as a PDF document.

The protein produced by the CCR5 delta 32 mutant gene is non-functional and does not support HIV infection.

Genome Editing Leading to a "Functional Cure" for HIV

Based on the functional cure of the Berlin patient, it appears that introducing the CCR5 delta 32 mutation may make host cells resistant to HIV. Since HIV infection is persistent, making the host cells resistant may provide a functional cure for HIV infected individuals. Using an engineered nuclease, such as a zinc finger nuclease, to specifically target the CCR5 gene in HIV patients and inactivate the CCR5 protein should make the patient’s endogenous T-cells resistant to further infection.

Sangamo Biosciences (a biotech company specializing in the development of therapeutic zinc finger nucleases) has developed a zinc finger nuclease that is targeted to disrupt the CCR5 gene. This approach is currently being tested in a Phase 2 clinical trial with HIV/AIDS patients by Sangamo Biosciences in collaboration with groups from the University of Pennsylvania School of Medicine and the Albert Einstein College of Medicine.

Watch your Twitter-feed for more updates on this story from the Twitterverse.

Designing Genome Editing Nucleases

Zinc Finger Nuclease Proteins and Editing a Genome

Zinc Finger Proteins

Zinc finger proteins are sequence-specific DNA-binding proteins. Each finger is composed of a short alpha helix and a 2-stranded beta sheet. Two histidines from the helix and two cysteines from the beta sheet together bind a zinc atom to stabilize this protein motif. Each finger recognizes and binds to three consecutive base pairs in double-stranded DNA. By linking 6 zinc fingers together, it is possible to target a unique 18 basepair sequence of DNA. But most natural zinc finger DNA-binding proteins use only 3 consecutive fingers to bind DNA. Can you guess why?

For an excellent review of the history of zinc finger proteins, and the application of engineered zinc finger nucleases (ZFNs) to genome editing, see:

Klug, A., The Discovery of Zinc Fingers and Their Applications in Gene Regulation and Genome Manipulation. Annu. Rev. Biochem. 79, 213-231 (2010).

The two main domains of a Zinc finger nuclease protein

Aron Geurts, a researcher at the Medical College of Wisconsin, uses zinc finger nuclease proteins in his research. In the videos below, he describes the different domains of a zinc finger nuclease protein.

The Zinc Finger Domain

The Nuclease Domain

How long is a statistically unique site in the human genome?

If you want to direct a nuclease to a unique DNA sequence on the human genome, the nucleotide sequence must be long enough to be statistically unique. You can easily calculate how long this sequence must be by asking yourself:

  • How many possible 1 base sequences are there?      Answer: 4
  • How many possible 2 base sequences are there?      Answer: 4 x 4 = 16
  • How many possible 3 base sequences are there?      Answer: 4 x 4 x 4 = 64
  • How many possible 4 base sequences are there?      Answer: 4 x 4 x 4 x 4 = 256
  • How many possible 5 base sequences are there?      Answer: 4 x 4 x 4 x 4 x 4 = 1024

Continue this line of reasoning until the number of possible sequences exceeds the number of nucleotides in the human genome – 3.2 billion. The answer is 16 nucleotides.

  • How many possible 16 base sequences are there?      Answer: 4^16 = 4,294,967,296

Therefore, regardless of which protein you use to edit the human genome, it must be able to bind specifically to a unique 16 basepair sequence of DNA.
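The line of reasoning above can be automated. The following is a minimal sketch in Python, assuming the 3.2 billion nucleotide genome size quoted in the text; the function name `shortest_unique_length` is a hypothetical helper chosen for this illustration.

```python
GENOME_SIZE = 3_200_000_000  # nucleotides in the human genome, as quoted above


def shortest_unique_length(genome_size: int, alphabet_size: int = 4) -> int:
    """Smallest sequence length n for which alphabet_size**n exceeds genome_size."""
    n = 1
    # Keep lengthening the sequence until there are more possible
    # sequences than there are positions in the genome.
    while alphabet_size ** n <= genome_size:
        n += 1
    return n


n = shortest_unique_length(GENOME_SIZE)
print(n, 4 ** n)  # 16 4294967296
```

This is a purely statistical estimate: it treats the genome as a random string of bases, which real genomes are not.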

Is 16 basepairs really a long enough sequence of DNA?

Aron Geurts, a researcher at the Medical College of Wisconsin, uses zinc finger nuclease proteins in his research. In the video below, he describes what length of DNA sequence a zinc finger nuclease is generally designed to find.

The Structure of Two Zinc Finger Nucleases Bound to DNA

To target a unique site in the human genome, researchers have created two different 4-finger proteins, each one targeting a different 12 basepair sequence, with the two sequences separated by 5 basepairs. Each 4-fingered protein carries one FokI nuclease domain, which is one half of the active nuclease. The FokI nuclease domain only functions as a homodimer. So, when both 4-fingered proteins bind to their targets, the functionally active nuclease homodimer forms and makes a double-stranded cut in the DNA separating the two different DNA binding sites. This design allows these zinc finger nucleases to specifically bind to 24 bases in the genome. Once the cut is made, an error-prone DNA repair system tries to repair the damage and, in doing so, disrupts the gene by introducing mutations at that site.

The illustration below shows:

  • Two different zinc finger proteins,
  • Binding to two different 12 basepair DNA sites,
  • Separated by a 5 bp spacer,
  • Bringing two monomeric FokI nuclease domains together such that they form a functional homodimer that cuts the DNA.

This figure was taken from:

Miller, J.C. et al., An improved zinc-finger nuclease architecture for highly specific genome editing. Nature Biotechnology 25, 778-785 (2007).
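The arithmetic behind the paired design can be written out explicitly. The following is a rough sketch in Python using only the numbers given in this section (4 fingers per protein, 3 basepairs per finger, two proteins, and a 3.2 billion basepair genome); the function name `zfn_pair_target_length` is hypothetical.

```python
GENOME_SIZE = 3_200_000_000  # basepairs in the human genome, as quoted earlier


def zfn_pair_target_length(fingers_per_protein: int = 4, bp_per_finger: int = 3) -> int:
    """Total number of basepairs recognized by a pair of zinc finger nucleases."""
    # Two proteins bind side by side, each contributing fingers x 3 basepairs.
    return 2 * fingers_per_protein * bp_per_finger


recognized = zfn_pair_target_length()
possible_sequences = 4 ** recognized

print(recognized)                        # 24
print(possible_sequences)                # 281474976710656
print(possible_sequences > GENOME_SIZE)  # True
```

With 24 recognized bases, the number of possible target sequences is far larger than the number of positions in the genome, comfortably above the 16 base minimum estimated earlier.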

Additional Information for Understanding HIV/AIDS

The anatomy of an HIV virus

Structural Biology of HIV

The Human Immunodeficiency Virus (HIV) is an RNA virus that can infect specific immune cells in our body, called T helper cells. The RNA genome of HIV is encased in a capsid, which is in turn covered by an envelope derived from the host cell membrane. The structures and functions of most of HIV’s proteins are now known. Explore the anatomy of HIV and learn about the different structural proteins, enzymes and accessory proteins using this RCSB PDB animation or poster linked to below. We are still learning about the accessory and regulatory proteins of HIV that exploit the host cell’s machinery for its own advantage.

Take a look at the Structural Biology of HIV poster from the RCSB Protein Databank web site at www.rcsb.org/pdb/education_discussion. This poster introduces the overall structure of HIV, which is important for understanding how CCR5 interacts with the virus.

Life Cycle of HIV

The HIV life cycle can be summarized in the following steps:

  • Attachment: The HIV spike or envelope protein, gp120, attaches to the host cell protein CD4 on specific types of T-cells.
  • Fusion and entry: Binding of gp120 to CD4 rearranges their structures, allowing the complex to bind another host cell receptor, the chemokine receptor CCR5. In some cases an alternate receptor called CXCR4 may replace CCR5 in this interaction. This in turn allows the stalk of the HIV spike (the protein gp41) to penetrate the host cell membrane and fuse the viral envelope with the host cell membrane.
  • Reverse transcription: Upon entry, HIV sheds its capsid and the 2 single strands of viral RNA are converted to a double stranded DNA by a special viral enzyme called Reverse transcriptase.
  • Integration: The double stranded DNA, or proviral DNA, enters the host cell nucleus and is integrated in the cell’s genome by another special viral enzyme called Integrase.
  • Transcription and translation: The proviral DNA is transcribed and translated like any other host cell gene using host cell machinery (RNA polymerase, ribosomes, etc.).
  • Assembly and budding: The various viral proteins and RNA come together to assemble the virus. At this stage some of the viral proteins are still linked to each other as part of the polyprotein synthesized by the virus. Various HIV proteins and RNA are packaged into an immature viral particle that buds off from the host cell encased in its membrane.
  • Maturation of viral particle: Through the action of the viral protease, the various HIV proteins are cut and separated, free to perform their specific functions. This rearrangement or maturation helps HIV become a mature infectious particle ready to infect another cell.

All the steps of the viral life cycle are presented in the HHMI BioInteractive animation, narrated by HHMI investigator Bruce Walker, MD.

Watch this video on the HIV life cycle from BioInteractive.

Current Approaches to Treat HIV Infection

Research in the last three decades has yielded a number of different strategies to block the HIV lifecycle. Today, more than 25 antiretroviral drugs are available to manage HIV infection, significantly reducing morbidity and mortality. With current treatments, HIV infection has become a chronic disease – manageable, but with lifelong medications.

The approaches currently used to treat HIV infections include:

  1. Viral Enzyme inhibitors: block the actions of some critical enzymes in the HIV lifecycle.
    • Reverse transcriptase inhibitors (RTI): block initial conversion of viral RNA to proviral DNA that is integrated in the host cell genome
      • By mimicking the enzyme substrate and directly binding to the active site (nucleoside RTIs)
      • By binding to a site near the enzyme active site and blocking its function (non-nucleoside RTIs)
    • Integrase inhibitors: block integration of proviral DNA into the host cell genome, preventing permanent infection of the host cells
    • Protease inhibitors: block cleavage of viral polyprotein, preventing maturation of HIV to infectious particles
  2. Entry inhibitors: block interaction of the CD4-gp120 complex with the chemokine co-receptor, preventing entry of HIV into the host cell
  3. Fusion inhibitors: block the structural changes in the stalk of the HIV spike (gp41) that are needed for the viral envelope and host cell membranes to fuse

Usually an HIV infected individual is treated with combinations of the above medications. For more information about the current HIV/AIDS treatments approved by the FDA see http://www.fda.gov/ucm118915.

Upcoming Approaches to Treat HIV Infection

The rapid mutation rates in HIV and various other factors may lead to resistance to these drugs at any time. Thus there is still a need to develop a more long-lasting cure. This is where gene therapy comes in and provides two scenarios:

Making the host cells resistant to HIV:

Currently researchers are using zinc finger nucleases to target the CCR5 gene in stem cells that give rise to blood cells and introduce a deletion or disruption in the gene. As a result these cells are unable to make a functional CCR5 protein and become resistant to HIV infection. A treatment protocol using this approach is currently in a Phase II clinical trial conducted by a group from the University of Pennsylvania School of Medicine, the Albert Einstein College of Medicine and Sangamo Biosciences (a biotech company specializing in the development of therapeutic zinc finger nucleases).

For your own interest you can review a report about the Phase I trial at http://clinicaltrials.gov.

Seek out and destroy all the integrated proviral DNA:

A recent research report has suggested the possibility of using a gene therapy approach to specifically identify and edit out the integrated proviral HIV-1 DNA. While there is a long way to go before this can even be tested as a treatment option, it offers the hope that gene therapy can be used for dealing with tough diseases like HIV/AIDS.

Watch this video from Temple University on the successful elimination of HIV viruses from cultured human cells.

A Timeline of HIV Research

Some of the major events in the HIV timeline are listed here, and the evolution of strategies for treating HIV is highlighted.

Year | Major Event | Treatment Strategies
1981 | CDC reports a rare form of pneumonia |
1982 | CDC introduces the term Acquired Immune Deficiency Syndrome, or AIDS |
1983-84 | HIV is established as the cause of AIDS |
1985 | FDA approves the first HIV antibody test and blood banks begin screening for HIV |
1987 | FDA approves Retrovir as the first drug to treat HIV | First reverse transcriptase inhibitor
1995 | FDA approves saquinavir to treat HIV | First protease inhibitor
1996 | Combination therapy for HIV treatment proposed | Combination therapy
1996 | FDA approves Viramune (nevirapine) | First non-nucleoside RTI
2002 | OraQuick Rapid HIV test is approved, allowing HIV antibody testing in as little as 20 minutes using blood from a finger prick |
2003 | FDA approves Fuzeon (enfuvirtide) | First HIV fusion inhibitor
2006 | First one-pill-a-day HIV med available | HIV treatment with 1 pill a day
2007 | FDA approves Isentress (raltegravir); FDA approves CCR5 blocker Selzentry (maraviroc) | First integrase inhibitor and first entry inhibitor
2010 | "The Berlin Patient", a man living with HIV, is classified as cured of his HIV | Transplant (in 2007) involving HIV-resistant (CCR5 delta-32) stem cells for the treatment of leukemia
2013 | Phase I clinical trial of CCR5-specific zinc finger protein nuclease (SB-728-T)* | Clinical trial for gene therapy to make HIV resistant cells*
2014 | Report shows viral suppression may bring HIV transmission risk close to zero. Research report showing use of gene therapy to disrupt the latent HIV-1 provirus* | Decreasing transmission risk by reducing viral load. Research report showing use of gene therapy to disrupt the latent HIV-1 provirus*

* These treatment options are still under development or in Clinical Trial phases.

An expanded view of timeline is available from http://www.poz.com/timeline.shtml.

Websites related to HIV/AIDS

In order to truly appreciate and successfully model this year's Protein Modeling Event structures, a thorough understanding of DNA, genomes and proteins is needed. This section will explore these amazing macromolecules in more detail using suggested physical model kits, online resources and additional websites.


DNA carries your genetic information in that it determines the sequence of amino acids in your proteins. These protein sequences, in turn, determine the shape and structure of your proteins. Proteins then work like nano-scale molecular machines, carrying out countless different tasks in your body.

The human genome is divided into 23 chromosomes, found in the nucleus of every cell. To keep these long linear polymers of DNA from getting all tangled up, the DNA of each chromosome is packaged into repeating structural units called nucleosomes. The DNA exists as a double-stranded structure with two twisting backbones running in opposite directions and four different bases: adenine (A), thymine (T), guanine (G), and cytosine (C).

The visual representation of DNA above is completely interactive and can be rotated in 3-dimensions by clicking and dragging with your mouse.

The four types of bases that make up DNA can form base-pairs between the two strands of double-stranded DNA. Adenine only pairs with thymine and guanine only pairs with cytosine, making the two strands of DNA complementary. The sequence of bases is often represented with abbreviated letters as shown below. It is this order of DNA bases that contains the key information needed for creating proteins and passing along genetic information.
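The pairing rule means that one strand fully determines the other. The following is a minimal sketch in Python; the names `PAIRING`, `complement` and `reverse_complement` are hypothetical helpers written for this illustration.

```python
# Base-pairing rule: A pairs with T, and G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the paired base for every position, in the same left-to-right order."""
    return "".join(PAIRING[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5' to 3' direction."""
    # The two strands run in opposite directions, so the partner strand
    # is read backwards relative to the first one.
    return complement(strand)[::-1]


print(complement("GGATG"))          # CCTAC
print(reverse_complement("GGATG"))  # CATCC
```

Because the two backbones run in opposite directions, the reverse complement is the form in which the partner strand is normally written.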

For more information on DNA structure, explore the RCSB PDB's Molecule of the Month feature on DNA, an article by the molecular illustrator David Goodsell.

Genomes and Creating Proteins

What is a Genome?

The complete genetic information (DNA) that functions as the blueprint for creating an organism is called a genome. Each cell of the organism contains this genome. As the cells divide the genomic information is carried forward through generations.

The human genome is composed of 3.2 billion basepairs. In a heroic effort starting in the 1990s, several international research groups participated in determining the complete DNA sequence of the human genome. While the sequencing was completed in 2003, scientists are still working on reading and understanding the meaning and complete implications of the genomic sequence. Today it is possible to sequence an individual’s genome and figure out if they are genetically predisposed to specific diseases or how they will respond to certain treatments for a specific disease.

The Flow of Genetic Information - From Genes (DNA) to Protein

While double–stranded DNA has become one of the most iconic structures in modern biology, it is only part of the story. DNA is only important in that it contains the information cells need to make proteins.

Proteins are made of smaller building blocks called amino acids that are linked together to form long chains. These protein chains spontaneously fold up into compact 3-dimensional shapes following basic principles of chemistry and physics. Each type of protein has a unique sequence of amino acids, which determines its unique 3-dimensional shape and function.

The sequence of amino acids in a protein is encoded by the sequence of bases in the gene (DNA). (Read this over, and over again, and think about it until it makes sense to you.) When a gene is "expressed", or made into a protein, it is first copied into messenger RNA (mRNA) by the process known as transcription. RNA polymerase is an enzyme that synthesizes mRNA by reading your DNA. Messenger RNA is complementary to the template strand of DNA, following the base pairing rules (A in DNA pairs with U in mRNA, T pairs with A, and G pairs with C). The short animation below shows transcription.
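Transcription can be sketched as a base-by-base lookup. The following is a simplified illustration in Python, not a model of what RNA polymerase actually does; the names `DNA_TO_MRNA` and `transcribe` and the example template are hypothetical, and the template strand is assumed to be written in the direction the polymerase reads it, so that the mRNA comes out in its normal reading order.

```python
# Pairing between a DNA template base and the mRNA base built opposite it.
# RNA uses uracil (U) where DNA would use thymine (T).
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Build the mRNA that is complementary to a DNA template strand."""
    return "".join(DNA_TO_MRNA[base] for base in template_strand)


# Hypothetical template strand, chosen only for illustration.
print(transcribe("TACAAACCGATT"))  # AUGUUUGGCUAA
```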

Once the mRNA has been created from the DNA, it is bound by a large macromolecular complex called a ribosome that reads this sequence of mRNA bases and builds a protein with a specific sequence of amino acids based on the mRNA sequence. The mRNA is read three bases at a time based on the code that is illustrated in The Standard Genetic Code shown below. Note that the code is degenerate – meaning that most amino acids are encoded by more than one three-base sequence (called a codon).

Special molecules called tRNA read the sequence of mRNA three bases at a time and add the correct amino acid to the growing protein chain based on the mRNA sequence. The short animation below shows this important process called translation.
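Reading an mRNA three bases at a time can be sketched in code. The following is a minimal illustration in Python that builds the standard genetic code from its conventional compact listing (bases in the order U, C, A, G; `*` marks a STOP codon; amino acids in one-letter code). The names `CODON_TABLE` and `translate` are hypothetical, and the sketch assumes the message starts at the first base, whereas a real ribosome begins at a start codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in the order UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA, codon by codon, until a STOP codon or the end is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # STOP codon: release the finished chain
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGUUUGGCUAA"))  # MFG
# Degeneracy: six different codons all encode leucine (L).
print(sorted(c for c, aa in CODON_TABLE.items() if aa == "L"))
```

The second print illustrates the degeneracy mentioned above: several codons map to the same amino acid.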

Protein Structure

Protein Structure Jmol Tutorials

The Protein Structure Jmol Tutorials walk through the four levels of protein structure using interactive Jmol molecular visualizations, including real protein examples with interactive controls.

Protein Databank (www.rcsb.org) Resources

Learn about protein structure and function with this overview printout and video developed by the RCSB Protein Databank.

Principles of Protein Design

Most protein sequences fold rapidly and reliably into their stable and functional shapes, mostly without any assistance. Over the years, we have learned some rules about protein folding, but are still not able to predict protein structures accurately using computational structure prediction algorithms alone. From studying the more than 100,000 experimentally determined structures in the Protein Data Bank (PDB), we can see that globular proteins have a hydrophobic core and most polar or charged amino acid side chains are located on the surface of these proteins. We can also see that there are a finite number of different protein domains. Small changes in these domains (through evolution in nature or by engineering) can produce changes in their specific functions. Also, combining the protein domains in various ways can produce a large variety of protein functions.

As we learn more about the various functions that these protein domains can perform, scientists have engineered new proteins by combining and adapting the functions of these proteins to produce novel functions. Read more about designing proteins and learn about a few designer proteins in the RCSB Molecule of the Month feature on Designer Proteins.

This Year's Proteins

The publication of a structural model for DNA in 1953 transformed biology forever. Not only did the model help us understand the basis of inheritance, it also laid the foundation for understanding the central dogma of biology. Since then, the DNA structure has inspired many other transformations in biology and medicine, for example DNA/gene sequencing, recombinant DNA technology, and more recently, genomic editing.

Knowledge from genomic sequences is now being included in designing personalized treatments – for example for cancers and rare genetic disorders, in deciding the best medication for treating an individual and in conducting non-invasive pre-natal genomic analysis. In some recently designed strategies in genomic medicine, attempts are being made to edit the genome at specific locations so that disease-causing or disease-correlated regions can be specifically changed for a desired outcome.

The Two Domains of a Genome Editing Protein

Genome editing requires bio-molecular tools (proteins and/or nucleic acids) that can specifically recognize a target sequence and cut the genome in a precise and predictable way. The gene sequence that is cut out may be replaced with another gene sequence using homology-directed repair (HDR), or the cut ends may be joined back together by non-homologous end joining (NHEJ), which intentionally introduces errors that inactivate the gene.

The video below provides a quick overview of these genome editing proteins and how they bind to and cut DNA at a specific location.

Summary - Genome Editing Proteins Generally Have Two Main Domains

  • Nuclease Domain: The portion of the protein responsible for cutting the DNA sequence once it is bound at the correct location.
  • DNA Binding Domain: The portion of the protein responsible for finding a specific DNA sequence or gene and binding to it at the correct location.

Model Level | Protein Name | PDB ID
Pre-build Model - all levels | Nuclease Domain | 2fok.pdb
Onsite Model - Invitational | Designed Zinc Finger-like Proteins | 1psv.pdb
Onsite Model - Regional | Zinc Finger Protein | 1mey.pdb
Onsite Model - State | TAL Effectors | 3v6t.pdb
Onsite Model - National | CRISPR Cas 9 | 4un3.pdb

This Year's Competition Proteins

Pre-Build Model - Nuclease Domain of FokI

Coordinates for the Model

The 2015 Pre-Build Model should represent amino acids 421-560 of chain A of the restriction endonuclease protein FokI based on the PDB file 2fok.pdb.

You can access the Pre-Build online design environment at http://cbm.msoe.edu/scienceOlympiad/designEnvironment/prebuild.html. Also study what types of additional features could be highlighted in the pre-build model in Section 3.

Background Information

To have a useful genome editing protein, a nuclease domain must be able to cut the DNA on both strands in some predictable way. This ability is already found in several known biological molecules in various organisms. For example, many bacteria have evolved to make nucleases that can defend their genomes from viral invasion. These enzymes, called restriction nucleases, recognize and cut foreign DNA. The host cell genome is specifically modified with protective chemical groups and is therefore unaffected by these nucleases.

The challenge is to make the nuclease very efficient and direct it specifically to a target gene only. While there are many types of nucleases in nature, only some are suitable for this specific DNA cleavage. A class of nucleases called Restriction nucleases or restriction enzymes can bind to specific sequences and cut the DNA either within the recognition sequence, close to it or at a remote location.

The FokI Nuclease

One specific nuclease called FokI has been commonly used in designing genome editing nucleases. This enzyme is derived from Flavobacterium okeanokoites (also called Planomicrobium okeanokoites). It recognizes a specific double-stranded DNA site (5’-GGATG-3’ on one strand and 5’-CATCC-3’ on the complementary strand) and cleaves both DNA strands outside that site: 9 bases after the G marked in the GGATG strand and 13 bases after the C marked in the CATCC strand (see image below).

The recognition and cleavage sites of FokI nuclease. Image from www.ncbi.nlm.nih.gov.
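The offset cuts can be located with a short search. The following is a rough sketch in Python, assuming the commonly quoted GGATG(9/13) convention in which both offsets are counted from the end of the recognition site along the GGATG strand; the function name `foki_cut_sites` and the example sequence are hypothetical, and the sketch only looks for the site in one orientation.

```python
SITE = "GGATG"
TOP_OFFSET = 9      # bases between the site and the cut on the GGATG strand
BOTTOM_OFFSET = 13  # bases between the site and the cut on the CATCC strand


def foki_cut_sites(top_strand: str) -> list[tuple[int, int, int]]:
    """Return (site_start, top_cut, bottom_cut) for each GGATG in the strand.

    All values are 0-based positions on the top strand; a cut at position p
    falls between base p-1 and base p.
    """
    results = []
    start = top_strand.find(SITE)
    while start != -1:
        site_end = start + len(SITE)
        top_cut = site_end + TOP_OFFSET
        bottom_cut = site_end + BOTTOM_OFFSET
        # Only report sites with enough downstream sequence for both cuts.
        if bottom_cut <= len(top_strand):
            results.append((start, top_cut, bottom_cut))
        start = top_strand.find(SITE, start + 1)
    return results


# Hypothetical example sequence.
example = "AAGGATG" + "ACGTACGTACGTACGTACGT"
print(foki_cut_sites(example))  # [(2, 16, 20)]
```

Under this assumption the two cuts are staggered by 13 - 9 = 4 bases, so the cleaved DNA ends are not blunt.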

Typically the DNA cleavage domain (the nuclease) of FokI is removed from its natural DNA binding domain and linked to a different DNA binding domain to make novel enzymes. The DNA cleavage domain functions as a dimer (as shown in the figure below), hence two copies of the nuclease must bind to the target site to ensure double strand cuts.

Dimer formation in the FokI nuclease catalytic domain. This structure is taken from a structure of FokI nuclease in the absence of DNA. The DNA binding domains are hidden for clarity.

Additional Resources

Invitational Onsite Model - Designed Zinc Finger-like Protein

Coordinates for the Model

The 2015 Invitational Onsite model will explore the structure and stability of zinc finger proteins, but will focus on a designed protein that was built to mimic the zinc finger domain structure, based on the PDB file 1psv.pdb. It is expected that at the time of the competition the participant will have some basic understanding of zinc finger proteins.

Background Information

The zinc finger protein has a tetra-coordinated zinc at the core of the structure to stabilize its structure. Some scientists experimented with the idea of replacing the zinc coordination with other interactions. This exercise led to the design of a peptide that could adopt the same shape and structure as the DNA binding zinc finger domain but had a completely different rationale for its stability. Explore the protein structure - can you figure out what replaces the zinc interactions?

Additional Resources

Regional Onsite Models - Zinc Finger Protein

Coordinates for the Model

The 2015 Regional onsite models will represent a Zinc Finger protein. It will be based on the PDB file 1mey.pdb.

Zinc finger showing the secondary structure of the protein chain as beta-beta-alpha. The Zinc, shown in the center, is coordinated by sidechains of two Cysteine and two Histidine amino acids.

Background Information

Zinc fingers were first identified in a frog transcription factor (transcription factor IIIA). Interestingly, this protein structure was found to bind both 5S RNA and its cognate DNA. Over the years zinc fingers have been identified in many other proteins and are one of the most common protein domains that bind to specific DNA/RNA sequences.

Each zinc finger domain has ~30 amino acids with two beta strands and a single alpha helix. In addition to its hydrophobic core, it is stabilized by a Zinc ion coordinated by side chains of four Cysteines, four Histidines or a combination of these. The structure of a single zinc finger protein domain is shown in the figure to the right. Most zinc finger containing proteins have a series of these domains linked to each other. These domains bind to the major groove of the DNA. Specific amino acid side chains reach out from these domains to "read" the DNA sequence by interacting with specific DNA bases.

Additional Resources

State Onsite Model - Transcription Activator Like Effector (TALE) Proteins

Coordinates for the Model

The 2015 State onsite model will represent Transcription Activator Like Effector (TALE) proteins and will be based on the PDB file 3v6t.pdb.

Background Information

The structure of a single TAL effector repeat showing two helices and Repeat Variable Di-residues (RVDs) that specifically interact with the DNA bases.

Transcription activator-like (TAL) effectors were first identified in species of Xanthomonas bacteria that cause diseases in plants such as rice and cotton. These bacteria inject several factors (including the TAL effectors) in the plant cells activating plant genes that allow the bacteria to thrive. In effect, these proteins act as weapons against the plant cells.

The TAL effectors have a repeat motif composed of ~35 amino acids arranged into two helices connected by a loop. The structure of a single repeat is shown in the figure to the right. At the tip of the loop (positions 12 and 13 of the repeat) there are two specific amino acid residues, called Repeat Variable Diresidues (RVDs), that vary in each repeat. These residues are responsible for each repeat being able to recognize specific base pairs in the DNA sequence. Understanding the structure and function of these repeats has helped scientists design TAL effectors that can recognize specific sequences in the genome as part of genome editing nucleases called TAL effector nucleases (TALENs).

These nucleases are also sequence specific DNA binding proteins. The interaction between RVDs in the TAL effectors and the target DNA base pairs define the specificity of DNA binding. If each TALE repeat can bind a single base pair – how many repeat units should be present in the TALEN?
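The one-repeat-per-base recognition can be sketched as a lookup table. The following is a simplified illustration in Python. The specific RVD-to-base pairings used here (NI for A, HD for C, NG for T, NN for G) are the commonly reported ones and are an assumption brought in from outside this page; real RVDs are less strict (NN, for example, is also reported to bind A). The names `RVD_TO_BASE`, `target_for_rvds` and `rvds_for_target` are hypothetical.

```python
# Commonly reported RVD code (an assumption, not stated on this page).
# Each RVD is the pair of amino acids at positions 12 and 13 of one repeat.
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}
BASE_TO_RVD = {base: rvd for rvd, base in RVD_TO_BASE.items()}


def target_for_rvds(rvds: list[str]) -> str:
    """DNA sequence recognized by a series of repeats, one base per repeat."""
    return "".join(RVD_TO_BASE[rvd] for rvd in rvds)


def rvds_for_target(dna: str) -> list[str]:
    """Series of RVDs, one repeat per base, that would recognize a DNA sequence."""
    return [BASE_TO_RVD[base] for base in dna]


print(target_for_rvds(["NI", "HD", "NG", "NN"]))  # ACTG
print(rvds_for_target("GGATG"))                   # ['NN', 'NN', 'NI', 'NG', 'NN']
```

Because each repeat reads a single base, the number of repeats in the array equals the length of the DNA sequence it recognizes.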

Additional Resources

National Onsite Model - Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)

Coordinates for the Model

The 2015 National onsite model will represent Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) proteins and will be based on the PDB file 4un3.pdb.

Background Information

The name Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) refers to repeated sequences in bacterial DNA that are separated by regularly spaced, variable "spacer" sequences. The spacer sequences are derived from invading viral DNA and can protect the bacteria from future attack by the same virus.

The guideRNA (red) and target DNA (blue) interaction that forms the basis for specific recognition in the CRISPR-Cas system is shown. This structure is recognized by a specific nuclease protein in order to bind and cut DNA.

The specific recognition of sequences in the CRISPR-Cas system does not rely on interactions between a protein and a specific DNA sequence but on a special RNA molecule, called the guideRNA or gRNA. In nature, the Cas genes code for several different types of proteins. In the CRISPR-Cas9 system discussed here, the CRISPR sequences are transcribed and processed to form crRNA. Another RNA, called the trans-activating crRNA (tracrRNA), is also made. The crRNA:tracrRNA pair assembles a complex that localizes the Cas9 nuclease to the target DNA site. The two specific nuclease activities in Cas9 can then cut the target DNA. For genome editing, the crRNA:tracrRNA complex is engineered into a single guideRNA that can bind the target DNA and direct the Cas9 nuclease activity. The structure of a guideRNA:target DNA complex is shown in the figure to the right.

The targeting mechanism in the CRISPR system is very different from the Zinc Finger and TAL effector based ones, since the recognition is through a guide RNA and not DNA-binding protein domains. The figure below shows how the guide RNA mimics the native system. In addition to designing the guide RNA and delivering it to the cell nucleus, the Cas9 also has to be modified with a Nuclear Localization Signal (NLS). Can you guess why?
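RNA-guided recognition can be sketched as a sequence search. The following is a minimal illustration in Python, not a guide design tool. It relies on two facts that are not stated on this page and are treated here as assumptions: that the guide matches a stretch of about 20 bases, and that Cas9 from Streptococcus pyogenes additionally requires a short "NGG" motif (the PAM) right after the matched stretch. The function name `find_cas9_targets` and all sequences are hypothetical, and only one DNA strand is searched.

```python
import re


def find_cas9_targets(dna: str, guide_rna: str) -> list[int]:
    """Return 0-based start positions where the guide matches and an NGG PAM follows."""
    # The guide RNA has the same sequence as the matched DNA strand,
    # with U in place of T.
    protospacer = guide_rna.upper().replace("U", "T")
    # A lookahead is used so that overlapping matches are also reported.
    pattern = re.compile("(?=" + re.escape(protospacer) + "[ACGT]GG)")
    return [match.start() for match in pattern.finditer(dna.upper())]


# Hypothetical 20-base guide and a toy DNA sequence containing two copies of
# the matching stretch: the first is followed by TGG (a valid NGG motif),
# the second by TAC (not valid).
guide = "GACGUUACCGGAUCAAGCUA"
site = "GACGTTACCGGATCAAGCTA"
toy_dna = "TTT" + site + "TGG" + "CCC" + site + "TAC"

print(find_cas9_targets(toy_dna, guide))  # [3]
```

Changing the target only requires changing the guide sequence, which is the practical difference from zinc finger and TAL effector proteins, where the protein itself has to be redesigned.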

Additional Resources

Science Olympiad