Before starting work on this lab, please look at the section titled “What to turn in” at the end. You will be required to turn in a few screenshots so that I can ensure that you have actually done this lab. Knowing ahead of time what to turn in will save you from having to do it again because you lost the web page with your results!
This lab will follow a series of steps that you might be expected to do if you were researching variants in a gene, except you will do it all in silico (the biological term for “by computer”). Specifically, we will obtain DNA sequence data from a database and use it to predict the exact location of a gene. We will then locate sites within the gene in which there are known variants in our population. We’ll then go through the process of designing PCR primers appropriate for amplifying a region of this gene, conducting the PCR, doing a restriction digest to distinguish between 2 alleles, and visualizing the DNA on a gel. The steps we’ll take are outlined in Fig 2.1.
Figure 2.1: Flow chart for PTC lab
Transmitting a signal from outside to inside our cells is often accomplished by a class of proteins called G-protein-coupled receptors (GPCRs). These molecules are transmembrane proteins that bind to particular ligands. When a ligand is bound by a GPCR on the outside of the cell, the receptor sets off a cascade of events within the cell. These events may involve inducing the cell to manufacture particular proteins or starting a nerve impulse (action potential) to the brain.
In the case of taste, humans have dozens of genes that code for GPCRs that specifically bind to bitter-tasting ligands in our food. One of these genes is called TAS2R38. Within this gene, there are 3 single nucleotide polymorphisms (SNPs) that are commonly found in the population and affect the ability to taste phenylthiocarbamide (PTC). Most people have one of two alleles, often referred to as PAV or AVI after the amino acids coded for at these sites.
| Position (bp) | Position (AA) | Taster DNA | Taster AA | Non-Taster DNA | Non-Taster AA |
|---|---|---|---|---|---|
| 145 | 49 | C | Pro | G | Ala |
| 785 | 262 | C | Ala | T | Val |
| 886 | 296 | G | Val | A | Ile |
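The amino-acid positions in the table follow from the nucleotide positions because each codon spans three bases. A minimal Python sketch of that arithmetic, assuming positions are counted from 1 at the first base of the start codon (the function name is only illustrative):

```python
def codon_number(nucleotide_position):
    """Return the 1-based codon containing a 1-based coding-sequence position."""
    # Positions 1-3 belong to codon 1, positions 4-6 to codon 2, and so on.
    return (nucleotide_position - 1) // 3 + 1


# The three SNP positions listed in the table above.
for bp in (145, 785, 886):
    print(bp, "->", codon_number(bp))
```

The three results should match the "Position (AA)" column: 49, 262, and 296.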
In this lab, we will download the sequence of the TAS2R38 gene and develop a molecular test to determine whether individuals are able to taste PTC.
The National Institutes of Health (NIH) includes a division called the National Center for Biotechnology Information (NCBI). NCBI runs a massive database that, as of 2020, contains \(8.2 \times 10^{12}\) (8 trillion) DNA bases from all sorts of organisms, submitted by researchers around the world. NCBI shares data with similar organizations in other countries and makes data freely available.
Figure 4.1: GenBank file layout
Once some DNA has been sequenced, researchers are interested in learning what it does. One of the first steps to doing this is to figure out exactly where different features of the gene are. For example, where does the coding part of the gene start and stop? What regulatory regions are there to control the expression of the gene?
A gene is DNA that gets transcribed into mRNA, which then often gets translated into protein (see chapter 17 in Campbell). The process of translation takes 3 nucleotides at a time and uses them as a code to add a single amino acid onto the end of a polypeptide. At the front end of this sequence of triplets, there is a triplet that we call a ‘start’ codon (see Fig 5.1) because it tells the ribosome to start translating from that point on. At the other end, there is one of 3 possible stop codons, which tell the ribosome to stop translating. In the codon table (Fig 5.1), AUG, which codes for methionine (Met), is the start codon. Find the stop codons in the codon table (Fig 5.1).
Figure 5.1: Codon table. Note that this relates mRNA to amino acids.
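Reading the codon table by hand can be mirrored in a few lines of Python. This is a minimal sketch, assuming the standard genetic code; the example mRNA is a made-up toy sequence (not the TAS2R38 transcript), and `translate` is an illustrative helper name.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

STOP_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa == "*"]


def translate(mrna):
    """Translate from the first AUG until a stop codon (one-letter amino acids)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon in the same reading frame as the start
        protein.append(amino_acid)
    return "".join(protein)


print(STOP_CODONS)
print(translate("GGAUGCCAGCUGUCUAAGG"))  # toy sequence for illustration only
```

The toy sequence reads AUG, CCA, GCU, GUC, UAA, so the sketch returns the one-letter peptide `MPAV` (Met, Pro, Ala, Val) and then stops.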
In this section, we will try to find the start and stop codons and make sure they are in the same reading frame (a multiple of 3 bases apart). If there is a start codon, then a long run of nucleotides, and finally a stop codon, we call this an ‘open reading frame’, and it may signal that there is a gene in that region. Note that when translation starts, the start codon sets the reading frame for the mRNA. Therefore, in order for a stop codon to be read as a ‘stop’, it must be in the same reading frame as the start codon (Fig 5.2).
Figure 5.2: 6 reading frames. Start codons are green, stop codons are red. Note that frames 4,5, and 6 are on the reverse strand relative to reading frames 1, 2, and 3.
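The six-frame search can be sketched in Python as follows. This is a simplified illustration, not the algorithm NCBI uses: it works on DNA (so the start codon is ATG), numbers frames 1 to 3 on the given strand and 4 to 6 on the reverse complement, and reports coordinates relative to whichever strand is being read. The function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna):
    """Return (frame, start, end) for every ATG...stop run in all six frames.

    start and end are 0-based, end-exclusive offsets on the strand being read.
    """
    orfs = []
    for strand_index, seq in enumerate((dna, reverse_complement(dna))):
        for offset in range(3):
            frame = strand_index * 3 + offset + 1
            start = None
            # Step through the sequence one codon at a time in this frame.
            for i in range(offset, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # the first ATG opens the reading frame
                elif start is not None and codon in STOP_CODONS:
                    orfs.append((frame, start, i + 3))
                    start = None  # look for another ORF further along
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))  # toy sequence for illustration only
```

In the toy sequence, the only complete ORF (ATG AAA TTT TAG) lies in frame 3. The TGA that overlaps the ATG sits in a different frame, so it is not read as a stop for that ORF.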
Random DNA sequence is expected to produce a stop codon about every 20 codons (though this depends heavily on the GC content of the genome as well as other factors). Therefore, researchers look for long runs that follow a start codon and contain no stop codon, which signal that the sequence might be coding sequence. Since the code is read as triplets, the start and stop codons must be in the same reading frame. Many programs exist to help find such “Open Reading Frames” (ORFs), but we’ll use one at NCBI.
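The figure of roughly 20 codons can be checked with a short calculation, assuming all four bases are equally frequent and each codon is independent (a simplification, as the GC-content caveat above implies). Three of the 64 possible codons are stop codons, so

\[ P(\text{stop}) = \frac{3}{64} \approx 0.047, \qquad E[\text{codons until the first stop}] = \frac{64}{3} \approx 21. \]

Under the same assumptions, the chance that a random frame runs for \(n\) codons without meeting a stop is

\[ P(\text{no stop in } n \text{ codons}) = \left(\frac{61}{64}\right)^{n}, \]

which for \(n = 100\) is about 0.008. A stop-free run of a hundred or more codons is therefore unlikely to arise by chance.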
Figure 5.3: NCBI orffinder output.
Now that we have found the longest open reading frame for this gene, we can look at some of the variants that are known to exist in the human population.
Figure 6.1: Dialog for selecting variants
You have just enabled a data ‘track’ to show up on the viewer. This track contains known variants that are common in our human population. In particular, notice the common variant called “rs1726866”. If you hover your mouse over it and follow the links, you’ll find more information about this variant.
The polymerase chain reaction (PCR) makes many, many copies of a short (~100bp - >1000bp) region of DNA.
In order to study a region of DNA, it is often useful to make many copies of that region. Kary Mullis discovered that he could use short DNA sequences (“primers”; just like primers in DNA replication, except made of DNA rather than RNA) to anneal to a section of DNA and then use a polymerase from bacteria to synthesize new DNA.
Figure 7.1: Polymerase Chain Reaction (PCR)
In order to carry out PCR, you would need to add into a small tube some genomic DNA, nucleotides, primers specifically designed for the section of DNA, and a thermostable polymerase. Each PCR cycle heats the mixture to roughly 95 °C to separate the DNA strands. Most polymerases would denature at such high temperatures, but some have been isolated from bacteria living in hot springs. The most famous of these bacteria is Thermus aquaticus, from which the enzyme Taq polymerase gets its name. There have been many refinements to the process of PCR in the past years, but the same basic principles apply.
Please view a complete animation of PCR here. In the animation, go through all the available cycles of PCR so that you understand that most molecules will be of exactly the same length, from primer to primer.
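The point of the animation, that primer-to-primer products come to dominate, can also be seen by counting strands cycle by cycle. This is an idealized sketch that assumes one double-stranded template and perfect doubling in every cycle (real reactions are less efficient); the three strand categories and the function name are illustrative.

```python
def pcr_strand_counts(cycles):
    """Return (original, long, short) single-strand counts after some cycles.

    original: the 2 strands of the starting template
    long:     copies of an original strand (start at a primer, run past the target)
    short:    strands that run exactly from primer to primer
    """
    original, long_strands, short_strands = 2, 0, 0
    for _ in range(cycles):
        new_long = original                        # copying an original gives a long strand
        new_short = long_strands + short_strands   # copying long or short gives a short strand
        long_strands += new_long
        short_strands += new_short
    return original, long_strands, short_strands


for n in (1, 2, 3, 10, 30):
    original, long_strands, short_strands = pcr_strand_counts(n)
    total = original + long_strands + short_strands
    print(n, total, f"{short_strands / total:.4f}")
```

The long strands grow only linearly (2 per cycle) while the total doubles, so after a few dozen cycles nearly every strand is the exact primer-to-primer product.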
Next, we will design primers to amplify a section of the TAS2R38 gene.
Again, we can use a tool provided by NCBI called Primer-BLAST.
Figure 7.2: Parameters for Primer-BLAST
When your results are returned to you, you should see something like Fig 7.3. There are several different primer pairs, but we will focus on the pair labeled “Primer 3”. Note that your numbering may be different. Hover your mouse over the arrows labeled on the track as “Primer 3”. This primer pair will amplify a region from 776 to 983. Note also that the variant we have been interested in falls within this range (located at position 869).
Figure 7.3: Output from Primer-BLAST
Scroll down the results page to where this primer pair is located. If we were really going to do this PCR, we could order these two primers from a biological supply company and they would synthesize them with the exact sequence.
Primer pair 3
| Direction | Sequence (5’->3’) | Strand | Length | Start | Stop | … |
|---|---|---|---|---|---|---|
| Forward | CCAGAAACTCTCGTGACCCC | Plus | 20 | 776 | 795 | … |
| Reverse | CCTGAGATCAGGATGGCTGC | Minus | 20 | 983 | 964 | … |
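The coordinates in this table are enough to work out the size of the PCR product and to confirm that the variant lies inside it. A small Python sketch, using only the values from the table and the variant position (869) given above; the variable names are illustrative:

```python
# Values copied from the "Primer pair 3" table (gene coordinates, 1-based).
FORWARD = {"seq": "CCAGAAACTCTCGTGACCCC", "start": 776, "stop": 795}
REVERSE = {"seq": "CCTGAGATCAGGATGGCTGC", "start": 983, "stop": 964}
VARIANT_POSITION = 869  # position of the variant on the full gene

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


# The product runs from the first base of the forward primer to the
# first base of the reverse primer (which binds the minus strand).
product_length = REVERSE["start"] - FORWARD["start"] + 1
variant_in_product = FORWARD["start"] <= VARIANT_POSITION <= REVERSE["start"]
variant_offset = VARIANT_POSITION - FORWARD["start"] + 1  # 1-based, within product

print("Product length (bp):", product_length)
print("Variant inside product:", variant_in_product)
print("Variant offset within product:", variant_offset)
# How the reverse primer's binding site reads on the plus strand:
print("Reverse primer site on plus strand:", reverse_complement(REVERSE["seq"]))
```

With these coordinates the product is 208 bp long and the variant is the 94th base of the product.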
After we have conducted our PCR, we should have “PCR products” – a very concentrated solution (albeit small – perhaps only 15 microliters) of that one section of DNA. We will now use restriction enzymes to cut the DNA.
The site of the variant we are interested in (site 785 on the mRNA; site 869 on the full gene) contains a sequence that can be cut with a restriction endonuclease (aka “restriction enzyme”). Restriction enzymes are present in bacteria and help the bacteria defend against virus attacks by cutting viral DNA at particular recognition sites that the bacterial genome does not have. One such enzyme is called Fnu4HI, and it makes a staggered cut across the DNA at this recognition sequence (Fig 8.1):
Figure 8.1: Fnu4HI restriction site. N means that it will recognize any base at that position.
At the variant site we’ve been interested in, the taster allele is:
5’...TGTGCTGCCTT...3’
3’...ACACGACGGAA...5’
Can you find the Fnu4HI restriction site? The non-taster allele is:
5’...TGTGTTGCCTT...3’
3’...ACACAACGGAA...5’
Does the same restriction site appear in this sequence? You can hopefully find it in the taster allele. Let’s see if we can find it using NCBI.
Figure 8.2: Locations of Fnu4HI restriction sites within PCR product
Figure 8.3: Locations of Fnu4HI restriction sites within PCR product. You can download this as a pdf or powerpoint so it’s easy to fill out.
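The same search can be done with a regular expression. This minimal Python sketch assumes the recognition sequence is GCNGC, where N is any base, as described for Fig 8.1. It scans only the top strand of the two short allele snippets shown above, which is sufficient here because the reverse complement of GCNGC is again GCNGC.

```python
import re

# GCNGC with N = any base; the lookahead lets overlapping sites be found too.
RECOGNITION_SITE = re.compile(r"(?=GC[ACGT]GC)")


def find_sites(dna):
    """Return the 0-based start index of every recognition site in dna."""
    return [match.start() for match in RECOGNITION_SITE.finditer(dna)]


taster = "TGTGCTGCCTT"      # top strand of the taster allele snippet
non_taster = "TGTGTTGCCTT"  # top strand of the non-taster allele snippet

print("taster:", find_sites(taster))
print("non-taster:", find_sites(non_taster))
```

The taster snippet contains one site (GCTGC, starting at the fourth base), while the non-taster snippet contains none, because the C to T change turns GCTGC into GTTGC.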
As you can see, within the range of the PCR product that we selected, the taster allele has two restriction sites and can therefore be cut by the restriction enzyme at those two locations and the other (nontaster) allele has only a single site and therefore will be cut just once. In other words, the mutation we are interested in happens to affect the restriction site, nullifying it for one of the alleles.
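To see how one extra cut changes the fragment pattern, the digest can be sketched in Python. Several values here are assumptions for illustration only: the product length of 208 bp follows from the primer coordinates (776 to 983); the variant-site cut is placed immediately after the variant base (offset 94), assuming a cut after the second base of the recognition site; and the position of the site shared by both alleles is HYPOTHETICAL, so read the real position from Fig 8.2. Fragment sizes are counted on the top strand and ignore the short overhang.

```python
PRODUCT_LENGTH = 208  # 983 - 776 + 1, from the primer table
VARIANT_CUT = 94      # assumed: cut falls right after the variant base (869 - 776 + 1)
SHARED_CUT = 40       # HYPOTHETICAL placeholder for the site present in both alleles


def digest(length, cut_offsets):
    """Return fragment lengths after cutting a linear molecule at the given offsets."""
    edges = [0] + sorted(cut_offsets) + [length]
    return [right - left for left, right in zip(edges, edges[1:])]


taster = digest(PRODUCT_LENGTH, [SHARED_CUT, VARIANT_CUT])  # two sites: three fragments
non_taster = digest(PRODUCT_LENGTH, [SHARED_CUT])           # one site: two fragments
# A heterozygote carries both alleles, so its lane shows every distinct band.
heterozygote = sorted(set(taster) | set(non_taster), reverse=True)

print("taster fragments:", taster)
print("non-taster fragments:", non_taster)
print("heterozygote bands:", heterozygote)
```

Whatever the true position of the shared site, the taster allele yields three fragments and the non-taster allele two, and the fragment lengths for each allele always add up to the full product length.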
Restriction digests are easy to accomplish. We simply have to combine a very small amount of the appropriate restriction enzyme with our PCR product in a buffer and keep it at a specific temperature for 15-60 minutes.
Gel electrophoresis separates DNA by length (size) and allows us to view different sizes of fragments.
Once we have our PCR products and we’ve digested them with a restriction enzyme, we need to view our results. We do this using gel electrophoresis. A gel is a matrix of a sugar (usually agarose) through which we pull DNA using an electric current. Because of the phosphate groups that are in the DNA backbone, DNA is negatively charged and will migrate toward a positive terminal.
To make the gel, we heat up agarose and a buffer and then let it cool in a form. The form shapes the gel into a slab similar to a slab of Jello. We form the gel with partial holes (wells) at one end, into which the DNA can be loaded. After pipetting a small amount of DNA into the well of the gel, we apply the current.
Gel electrophoresis separates DNA by size: the larger pieces have a difficult time moving through the gel whereas the smaller pieces move more easily.
Figure 9.1 shows DNA being loaded into a gel. It is a tedious process that requires practice with the pipette!
Figure 9.1: Loading DNA with a blue dye into an agarose gel
After the gel is run, the DNA is pulled toward one side. The smaller fragments will have moved further, and the larger ones will remain near the wells where they started. We can use a variety of techniques to visualize the DNA. In Fig 9.2, a chemical that glows under UV light was used. Notice that the smaller fragments have moved further (to the lower right in this picture).
Figure 9.2: Visualizing bands on an agarose gel. The bands light up under UV light due to the addition of a dye (ethidium bromide). You can see the wells in the upper left. The DNA has migrated toward the bottom right of the photo.
The results of our gel might look something like Fig 9.3. We can thus distinguish between the alleles of a taster and a non-taster. Both of these gel lanes show what homozygotes would look like. A heterozygote carries both alleles, so its lane would show the bands from both.
Figure 9.3: Results comparing a homozygous taster with a homozygous non-taster. A heterozygote would have all the bands.
To get credit for completing this lab, please go through all the material and make sure you understand it. Ask questions about it if you are unsure.
In order to make sure you have completed the lab, please take and turn in the following screenshots: