Part of the Jmol Training Guide
from the MSOE Center for BioMolecular Modeling
In order to view a protein or molecule using Jmol, or any molecular visualization program, you need to have a 3-dimensional structure file. These files contain the (X, Y, Z) coordinates for the atoms that make up a structure, along with information about each atom.
These files can vary dramatically in both size and internal format, depending on how large the structure is and how the structure file was created. The most common molecular structure file formats that you will be using with Jmol are Protein Data Bank (.pdb) files and MDL Molfile (.mol) files.
Files in the Protein Data Bank (.pdb) format are curated and annotated by the RCSB Protein Data Bank. The RCSB PDB is an international database that archives information about the 3D shapes of proteins, nucleic acids, and complex assemblies, helping students and researchers understand all aspects of biology. The RCSB Protein Data Bank has also created tools and resources for research and education in molecular biology, structural biology, computational biology, and beyond.
The RCSB Protein Data Bank is the primary source for large protein structure files and will be discussed in more detail below.
The MDL Molfile (.mol) file format was originally developed by MDL (Molecular Design Limited) and was later assigned a chemical MIME type as part of the Chemical MIME Project by Henry Rzepa. It is similar to the .pdb format in that it contains the 3-dimensional locations of atoms in a molecular structure. However, unlike .pdb files, .mol files are most often used for smaller structures such as ligands, drugs, and sugars.
Sources of .mol files include ChemSpider, DrugBank, and the NIH Cactus Server. Many chemical drawing programs, such as ChemDraw and ChemDoodle, can export .mol files so that the structures you create can be viewed in 3-dimensional visualization programs.
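To make the layout of a .mol file more concrete, here is a minimal Python sketch that reads the atom and bond blocks of a V2000 Molfile. The water molecule, its coordinates, and the function name `parse_molfile` are illustrative assumptions, not taken from this guide, and the sketch ignores optional features of the format.

```python
# Hypothetical hand-written V2000 Molfile for water; coordinates are illustrative only.
MOLFILE = "\n".join([
    "water",                                    # line 1: molecule name
    "  example",                                # line 2: program / timestamp
    "",                                         # line 3: comment
    "  3  2  0  0  0  0  0  0  0  0999 V2000",  # counts line: 3 atoms, 2 bonds
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",                             # bond: atom 1 to atom 2, single
    "  1  3  1  0",                             # bond: atom 1 to atom 3, single
    "M  END",
])


def parse_molfile(text):
    """Return (atoms, bonds) from the atom and bond blocks of a V2000 Molfile."""
    lines = text.splitlines()
    counts = lines[3]                 # the fourth line holds the atom and bond counts
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # first atom index, second atom index, bond order (three columns each)
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


atoms, bonds = parse_molfile(MOLFILE)
for symbol, x, y, z in atoms:
    print(symbol, x, y, z)
```

Each atom line starts with the (X, Y, Z) coordinates followed by the element symbol, and each bond line names two atoms by their position in the atom block.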
Once a structure has been determined, each atom in the structure is assigned an (X, Y, Z) coordinate to mark its location in 3-dimensional space. Additional information complements these basic coordinates, including the type of atom at each location and the chain and residue the atom is part of. Some structure files contain further information such as resolution data, temperature factors, electrostatic potential data, and more.
The image below shows a short bit of code from inside of a structure file.
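As a rough illustration of how this information is stored, the following Python sketch slices one ATOM line of a .pdb file into its fixed-width columns. The example record is hypothetical (it is not copied from a real entry), and the function name `parse_atom_record` is an assumption made for this sketch.

```python
def parse_atom_record(line):
    """Slice one fixed-width ATOM/HETATM line of a .pdb file into named fields."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "residue_number": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "temperature_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


# Hypothetical record, built with a format string so each field lands in its column.
EXAMPLE = (
    "{:<6}{:>5} {:<4} {:>3} {}{:>4}    "
    "{:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}          {:>2}"
).format("ATOM", 1, " CA", "MET", "A", 1, 27.340, 24.430, 2.614, 1.00, 9.67, "C")

print(EXAMPLE)
print(parse_atom_record(EXAMPLE))
```

Because the format is column-based rather than space-separated, the sketch reads each field by position instead of splitting the line on whitespace.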
For more information on structure files and how they are determined, visit these RCSB Protein Data Bank resources:
The RCSB Protein Data Bank (http://www.pdb.org) is the largest worldwide repository for processing and distributing 3D structure data, stored as .pdb files, for large molecules such as proteins and nucleic acids.
There are now well over 100,000 structure files available on the www.pdb.org website!
Each structure hosted on the Protein Data Bank has a unique four-character alphanumeric identifier, referred to as the structure's PDB ID.
Often more than one .pdb file will exist for a specific type of protein. For example, there are hundreds of .pdb file entries for the relatively common protein hemoglobin. It is often a good idea to use the specific information about a structure listed below to help determine whether you have found the best possible file.
When you click on a specific PDB ID, you will initially see the Structure Summary page for the structure. This page includes a variety of useful information about the structure.
The View in 3D Window will let you preview the structure using a web-embedded online Jmol. To view this preview, simply click the "View in 3D: JSmol" button that is located directly below the molecule image on each Structure Summary Page.
Just above the .pdb file Title should be a series of tabs, the fourth of which is the Sequence tab. This section of the .pdb file page provides specific sequence information as well as secondary structure information about the molecule. You can identify the alpha helices or beta sheets as well as the amino/carboxyl termini, which are the first and last amino acids of the protein.
One of the key features of the Protein Data Bank is the ability to search the database for files. You can search for a unique structure if you know its PDB ID, or search by keyword or author. To submit a search query, enter these terms in the search box located near the top center of every www.pdb.org page. After you have entered the search terms in the field, press Enter or click on the "Go" button to the right of the search field.
There are two ways to obtain a .pdb file:
1. Download the File from the RCSB Protein Data Bank website.
Note that it is a good idea to create a new folder for each molecule you work on to organize all of your .pdb files, images, and other related work.
2. Dynamically Load the File from the RCSB Protein Data Bank Server.
As long as you have an Internet connection, Jmol allows you to dynamically connect to the RCSB Protein Data Bank and load a structure without downloading it permanently to your computer. You will, however, need to know the four-character alphanumeric PDB ID for the structure you are looking for.
To load the structure file 1qys.pdb, type the following command into the Jmol console and press Enter: load =1qys
Note that you do not need to add the file extension (.pdb) when entering this command; just the four-character alphanumeric PDB ID is needed. You do, however, need to include the equal sign "=" with no spaces between it and the PDB ID. This equal sign tells Jmol that you want to access the RCSB Protein Data Bank servers to find the structure, rather than finding a file locally on your computer.
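If you generate Jmol scripts from another program, the rules above can be checked before the command is ever sent to Jmol. The Python sketch below is one possible way to do this; the function name `jmol_load_pdb_command` is hypothetical, and the check only enforces the "four alphanumeric characters" rule described here.

```python
import re

# Four letters or digits, as described above for PDB IDs.
PDB_ID_PATTERN = re.compile(r"[0-9A-Za-z]{4}")


def jmol_load_pdb_command(pdb_id):
    """Build the Jmol command that loads a structure from the RCSB server."""
    cleaned = pdb_id.strip()
    if cleaned.lower().endswith(".pdb"):
        cleaned = cleaned[:-4]          # the file extension is not needed
    if not PDB_ID_PATTERN.fullmatch(cleaned):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return "load =" + cleaned           # no space between "=" and the ID


print(jmol_load_pdb_command("1qys"))
print(jmol_load_pdb_command("1qys.pdb"))
```

Both calls produce the same command, since the extension is stripped before the ID is validated.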
The RCSB Protein Data Bank has several regularly updated features as well as some interesting interviews and newsletters that may be useful for any Jmol designer.
The NIH (National Institutes of Health) Cactus (CADD Group Chemoinformatics Tools and User Services) Database is a public website offering several powerful chemoinformatics tools that provide structures and data to help you explore molecular structures. Most of the tools on the NIH Cactus Database focus on small molecules and use the .mol file format.
Like .pdb files, small molecule structures from the NIH Cactus Server can be loaded into Jmol dynamically without downloading them permanently to your computer. As long as you have an Internet connection, you can load a specific small molecule directly from Jmol.
To load the small molecule aspirin, type the following command into the Jmol console and press Enter: load $aspirin
Note that you need to include the dollar sign "$" with no spaces between it and the name of the small molecule. This dollar sign tells Jmol that you want to access the NIH Cactus servers to find the structure, rather than finding a file locally on your computer.
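The prefix convention for the load command ("=" for the RCSB servers, "$" for the NIH Cactus servers, no prefix for a local file) can be summarized in a few lines of Python. This is only a sketch of the convention as described in this guide, not Jmol's own implementation, and the function name `describe_load_target` is an assumption.

```python
def describe_load_target(argument):
    """Report where a Jmol load argument points, based on its first character."""
    if argument.startswith("="):
        source, name = "RCSB Protein Data Bank server", argument[1:]
    elif argument.startswith("$"):
        source, name = "NIH Cactus server", argument[1:]
    else:
        source, name = "local file", argument
    # No spaces are allowed between the prefix and the name.
    if not name or name != name.strip():
        raise ValueError(f"unexpected spaces or empty name in {argument!r}")
    return source, name


for argument in ("=1qys", "$aspirin", "1qys.pdb"):
    print(argument, "->", describe_load_target(argument))
```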
While almost every molecular structure you can think of will be identifiable by name when loading a structure dynamically from the NIH Cactus database, you may occasionally come across a structure that the database does not know. For these situations, we suggest you try to find a SMILES (Simplified Molecular Input Line Entry System) sequence.
A SMILES sequence is a line notation for a molecule that records the connectivity between the specific atoms in a structure but does not include 2D or 3D coordinates. Atoms are represented by their element symbols (C, N, O, P, Cl, Br, etc.). The equals sign "=" represents a double bond and the pound sign "#" represents a triple bond. Branching is indicated by parentheses "()" and ring closures are indicated by matching pairs of digits. A few examples are shown below.
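To see these rules at work, the Python sketch below counts the atoms, double bonds, triple bonds, branches, and rings in a few well-known SMILES sequences. It is a simplified reader written for this illustration (the name `summarize_smiles` is hypothetical) and it handles only the basic symbols described above, not the full SMILES notation.

```python
import re

# One alternative per feature described above; two-letter elements are tried first.
TOKEN = re.compile(
    r"(?P<atom>Cl|Br|[BCNOPSFI]|[bcnops]|\[[^\]]+\])"  # element symbol or [bracket atom]
    r"|(?P<double>=)"                                  # double bond
    r"|(?P<triple>#)"                                  # triple bond
    r"|(?P<branch>\()"                                 # start of a branch
    r"|(?P<ring>\d)"                                   # ring-closure digit
)


def summarize_smiles(smiles):
    """Count the basic features of a simple SMILES sequence."""
    counts = {"atom": 0, "double": 0, "triple": 0, "branch": 0, "ring": 0}
    for match in TOKEN.finditer(smiles):
        counts[match.lastgroup] += 1
    counts["ring"] //= 2   # each ring is opened and closed by a pair of digits
    return counts


EXAMPLES = {
    "ethanol": "CCO",
    "acetic acid": "CC(=O)O",
    "hydrogen cyanide": "C#N",
    "cyclohexane": "C1CCCCC1",
}

for name, smiles in EXAMPLES.items():
    print(name, smiles, summarize_smiles(smiles))
```

Hydrogen atoms are usually left implicit in SMILES, so the atom counts here cover only the atoms that are written out.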
Jmol can use a SMILES sequence and connect to the NIH Cactus database to turn it into a 3-dimensional structure.
To load the SMILES sequence for glucose:
Note that, as when loading a small molecule by name, you need to include the dollar sign "$" with no spaces between it and the SMILES sequence. This dollar sign tells Jmol that you want to access the NIH Cactus servers to convert the structure from a SMILES sequence to a 3-dimensional structure.
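The same request can be sketched outside of Jmol. The Python example below builds both the Jmol command and a web address for the NIH Cactus server from a SMILES sequence. Two things here are assumptions rather than facts from this guide: the address pattern in `CACTUS_TEMPLATE`, and the glucose SMILES sequence, which is written without stereochemistry and is used purely for illustration. The function names are hypothetical as well.

```python
from urllib.parse import quote

# Assumed address pattern for the NIH Cactus structure service.
CACTUS_TEMPLATE = (
    "https://cactus.nci.nih.gov/chemical/structure/{identifier}/file?format=sdf&get3d=true"
)

# Assumption: a glucose ring written without stereochemistry, for illustration only.
GLUCOSE_SMILES = "OCC1OC(O)C(O)C(O)C1O"


def jmol_load_smiles_command(smiles):
    """Build the Jmol command: a dollar sign followed directly by the SMILES sequence."""
    return "load $" + smiles.strip()


def cactus_structure_url(identifier):
    """Percent-encode the identifier so characters such as '#' and '=' survive in the address."""
    return CACTUS_TEMPLATE.format(identifier=quote(identifier, safe=""))


print(jmol_load_smiles_command(GLUCOSE_SMILES))
print(cactus_structure_url(GLUCOSE_SMILES))
print(cactus_structure_url("C#N"))
```

The encoding step matters because "#" and "=" have special meanings in web addresses, while Jmol accepts the SMILES sequence as typed.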
SMILES sequences can be found in a variety of online drug and small molecule databases, including the following websites.
© Copyright 1995- - MSOE Center for BioMolecular Modeling