1 Before you get started…

Before starting work on this lab, please look at the section titled “What to turn in” at the end. You will be required to turn in a few screenshots so that I can ensure that you have actually done this lab. Knowing ahead of time what to turn in will save you from having to do it again because you lost the web page with your results!

2 Overview

This lab will follow a series of steps that you might be expected to do if you were researching variants in a gene, except you will do it all in silico (the biological term for “by computer”). Specifically, we will obtain DNA sequence data from a database and use it to predict the exact location of a gene. We will then locate sites within the gene in which there are known variants in our population. We’ll then go through the process of designing PCR primers appropriate for amplifying a region of this gene, conducting the PCR, doing a restriction digest to distinguish between 2 alleles, and visualizing the DNA on a gel. The steps we’ll take are outlined in Fig 2.1.

Flow chart for PTC lab

Figure 2.1: Flow chart for PTC lab

3 Background

Transmitting a signal from outside to inside our cells is often accomplished by a class of proteins called G-protein-coupled receptors (GPCRs). These molecules are transmembrane proteins that bind to particular ligands. When a ligand is bound by a GPCR on the outside of the cell, they set off a cascade of events within the cell. These events may involve inducing the cell to manufacture particular proteins or starting a nerve impulse (action potential) to the brain.

In the case of taste, humans have dozens of genes that code for GPCRs that specifically bind to bitter- tasting ligands in our food. One of these genes is called TAS2R38. Within this gene, there are 3 single nucleotide polymorphisms (SNPs) that are commonly found in the population and affect the ability to taste phenylthiocarbamide (PTC). Most people have one of two alleles, often referred to as PAV or AVI due to the amino acids coded for at these sites.

Position (bp) Position (AA) Taster DNA Taster AA Non-Taster DNA Non-Taster AA
145 49 C Pro G Ala
785 262 C Ala T Val
886 296 G Val A Ile

in this lab, we will download the sequences to the TAS2R38 gene and develop a molecular test to determine whether individuals are able to taste PTC.

4 Download DNA sequences from NCBI website

The National Institute of Health (NIH) contains a division called the National Center for Biotechnology Information (NCBI). NCBI runs a massive database that, as of 2020, contains \(8.2 \times 10^{12}\) (8 trillion) DNA bases from all sorts of different organisms that have been submitted from researchers around the world. NCBI shares data with similar organizations in other countries and makes data freely available.

  1. Go to https://www.ncbi.nlm.nih.gov. It’s probably easiest if you right-click and ’open link in new tab” so that you can save the page you work with but still come back here.
  2. In the top search box, type in “TAS2R38” and select the “Gene” database. Click Search.
  3. The first search result (it looks like a ‘card’ above the rest of the search results) is from humans and is the one we want. Click on its title to go to the data for that gene.
  4. Go back in your browser so you can see the ‘card’ again. This time, click “RefSeq Transcripts” (within the ‘card’) to see information about the sequence itself.
  5. At the top of the result, you can see that the file you are being shown is a ‘GenBank’ file, there is a reference number (it is called an “accession number” and in this case begins with “NM_”). Also note that the gene is 1143 base pairs long (it’s very short compared to most eukaryotic genes).’
  6. Scroll down to the bottom of the page. There you will see the actual sequence of the gene.
GenBank file layout

Figure 4.1: GenBank file layout

5 Finding Open Reading Frames

Once some DNA has been sequenced, researchers are interested in learning what it does. One of the first steps to doing this is to figure out exactly where different features of the gene are. For example, where does the coding part of the gene start and stop? What regulatory regions are there to control the expression of the gene?

A gene is DNA that gets transcribed into mRNA which then often gets translated into protein (See chapter 17 in Campbell). The process of translation takes 3 nucleotides at a time and uses them as a code to add a single amino acid onto the end of a polypeptide. At the front end of this sequence of triplets, there is a triplet that we call a ‘start’ codon (see Fig 5.1) because it tells the ribosome to start translating from that point on. At the other end, there are 3 possible stop codons which tell the ribosome to stop translating. In the table below, AUG, which codes for methionine (Met), is the start codon. Find the stop codons in the codon table (Fig 5.1)

Codon table. Note that this relates mRNA to amino acids.

Figure 5.1: Codon table. Note that this relates mRNA to amino acids.

In this section, we will try to find the start and stop codons and make sure they are in the same reading frame (a multiple of 3 bases apart). If there is a start codon, a good amount of nucleotides, then finally a stop codon, we call this an ‘open reading frame’ and it may signal that there is a gene in that region. Note that when translation starts, the start codon sets the reading frame for the mRNA. Therefore, in order for a stop codon to be read as a ‘stop’, it must be in the same reading frame as the start codon (Fig 5.2)

6 reading frames. Start codons are green, stop codons are red. Note that frames 4,5, and 6 are on the reverse strand relative to reading frames 1, 2, and 3.

Figure 5.2: 6 reading frames. Start codons are green, stop codons are red. Note that frames 4,5, and 6 are on the reverse strand relative to reading frames 1, 2, and 3.

Random DNA sequence is expected to produce a stop codon about every 20 codons (though this depends heavily on the GC content of the genome as well as other factors). Therefore, researchers look for long runs after a ‘start’ and without a stop codon to signal that the sequence might be coding sequence. Since the code exists as triplets, the start and stop codons must be in the same reading frame. Many programs exist to help find such “Open Reading Frames” (ORFs), but we’ll use one at NCBI.

  1. Go to https://www.ncbi.nlm.nih.gov/orffinder/. Again, I suggest you open this in a new tab or window.
  2. Enter the accession number (NM_…) that you wrote down when you first found the TAS2R38 gene into the search box. If you had DNA sequence, you could also enter that.
  3. Leave the other parameters as their default values, and click ‘Submit’
NCBI orffinder output.

Figure 5.3: NCBI orffinder output.

  1. Take a screenshot of the results. You will turn this in.
  2. In the output (Fig 5.3), you should see a pictoral representation of the sequence that you submitted. Note that:
    1. it begins at base 1 and ends at 1143.
    2. in all, there are 10 ORFs.
    3. you can zoom in until you can see each nucleotide and each codon.
    4. you can get the nucleotide sequence or the protein sequence (in single-letter codes). Select the longest ORF (it is probably labeled ORF1) and choose to view the nucleotide sequence.
      1. What is the start codon?
      2. What stop codon is used?
    5. some ORFs (eg. ORF1) are on the ‘plus’ strand and some go in the other direction on the ‘minus’ strand. You can see this both in the table and as arrows on the map.
    6. select to see the six-frame translation track by clicking the link at the bottom-right of the main map. You will see a picture similar to Fig 5.2. Zoom in to the picture using the ‘magnifying glass’ icon on the map.
  3. Close the six-frame translation track window.
  4. In many cases, the longest ORF will be the one that codes for a protein. On the map of the ORFs, right click the long ORF and select to “Set Sequence Origin At Feature”, then accept the new origin of 85. This will just renumber the scale so that both are given.

6 Locate genetic variants

Now that we have found the longest open reading frame for this gene, we can look at some of the variants that are known to exist in the human population.

  1. At the top right of the viewer pane, find a gear button labeled “Tracks” and click on it. Then select “Configure Tracks”
  2. In the dialog box that pops up, select “Variation” on the left and then select “GMAF >= 0.01” on the main box then click “Configure”. See Fig 6.1.
  3. Click the magnifying grlass on the top left of the map, and enter “rs1726866” into the dialog box. Click ‘OK’.
Dialog for selecting variants

Figure 6.1: Dialog for selecting variants

You have just enabled a data ‘track’ to show up on the viewer. This track contains known variants that are common in our human population. In particular, notice the common variant called “rs1726866”, if you hover your mouse over it and follow links, you’ll find more information about this variant.

  1. Right-click the variant rs1726866 and select “Zoom to sequence”.
  2. You will see that the ORF has a codon “GCT” at that site and the variant converts the Cytosine to a Thymine (indicated on the track by “C/T”).
  3. You will also see that the codon GCT codes for Alanine (shown as an “A” right below the “GCT” on the ORF1 track; confirm this in the codon table (Fig 5.1), though you should realize that the DNA strand shown is the “coding” strand of DNA. The complementary strand to the “coding” strand is the “template” strand, and it is the template strand that produces the mRNA that is used in the codon table. Since the mRNA and the “coding” strand are both complementary to the same “template” strand, their sequences are the same except for the fact that RNA uses uracil instead of thymine.)
  4. Use the codon table to determine what amino acid would be produced in place of the alanine, for people who have the variant rs1726866.
  5. Note that this variant is at position 785 on ORF1. However, the gene that we accessed (NM_…) is larger because it has both upstream and downstream sequence that is part of the mRNA but is not translated. The translated portion is called the CDS (coding domain sequence). The variant we are interested in is at location 869 on the full gene. Note these numbers – they’ll be useful later.

7 Polymerase Chain Reaction

The polymerase chain reaction (PCR) makes many, many copies of a short (~100bp - >1000bp) region of DNA.

In order to study a region of DNA, it is often useful to make many copies of that region. Kary Mullis discovered that he could use use short DNA sequences (“primers”; just like primers in DNA replication, except DNA rather than RNA) to anneal to a section of DNA and then use a polymerase from bacteria to synthesize new DNA.
Polymerase Chain Reaction (PCR)

Figure 7.1: Polymerase Chain Reaction (PCR)

  1. Denaturation As can be seen in Fig 7.1, PCR starts by heating genomic DNA to 95\(^\circ\)C. This melts apart the 2 complementary strands of DNA by breaking the hydrogen bonds between A:T and G:C.
  2. Annealing Next, the temperature is cooled and primers complementary to particular sequences are allowed to anneal. These primer sequences are pointed toward each other on the complementary strands (see Fig 7.1).
  3. Extension The temperature is warmed up again to allow a polymerase enzyme to extend the DNA by adding nucleotides onto the 3’ end of the primer.
  4. Repeat This cycle of heating and cooling is repeated about 30 times. Each time, the amount of DNA is doubled. Although this seems really complicated, it is fully automated and can be completed in a bit over an hour in a small machine.

In order to carry out PCR, you would need to add into a small tube some genomic DNA, nucleotides, primers specifically designed for the section of DNA, and a thermostable polymerase. Most polymerases would denature at such high temperatures, but some have been isolated from bacteria living in hot springs. The most famous of these is the bacteria Thermus aquaticus, from which the enzyme Taq polymerase gets its name. There have been many refinements to the process of PCR in the past years, but the same basic principles apply.

Please view a complete animation of PCR here. In the animation, go through all the available cycles of PCR so that you understand that most molecules will be of exactly the same length, from primer to primer.

Next. We will design primers to amplify a section of the TAS2R38 gene.

7.1 Designing PCR primers

Again, we can use a tool provided by NCBI called Primer-BLAST.

Parameters for Primer-BLAST

Figure 7.2: Parameters for Primer-BLAST

  1. Enter the accession number (“NM_…”) that you looked up at the beginning of this exercise into the main search box of Primer-BLAST. (See Fig 7.2)
  2. Accept all the default values and then press “Get Primers” at the bottom. Your job may take from a few seconds to a few minutes to complete. Be patient, and the page will periodically update.
  3. Primer-BLAST will look for sequences that have particular characteristics such as
    1. an appropriate melting temperature
    2. the primers aren’t complementary to one another so they won’t ‘stick’ to each other during PCR
    3. the priming sequence doesn’t exist elsewhere in the genome, or, if it does, there isn’t a nearby site for the second primer to bind to.
  4. Primer-BLAST may ask you about your PCR template being highly similar to some DNA in its database. Select the accession for TAS2R38 and click “Submit”
  5. The results may take several minutes. Be patient.

When your results are returned to you, you should see something like 7.3. There are several different primer pairs, but we will focus on the pair labeled “Primer 3”. Note that your numbering may be different. Hover your mouse over the arrows labeled on the track as “Primer 3”. This primer pair will amplify a region from 776 to 983. Note also the variant we have been interested in falls within this range (located at position 869).

Output from Primer-BLAST

Figure 7.3: Output from Primer-BLAST

Scroll down the results page to where this primer pair is located. If we were really going to do this PCR, we could order these two primers from a biological supply company and they would synthesize them with the exact sequence.

Primer pair 3

Direction Sequence (5’->3’) Strand Length Start Stop
Forward CCAGAAACTCTCGTGACCCC Plus 20 776 795
Reverse CCTGAGATCAGGATGGCTGC Minus 20 983 964

8 Restriction digest

After we have conducted our PCR, we should have “PCR products” – a very concentrated solution (albeit small – perhaps only 15 microliters) of that one section of DNA. We will now use restriction enzymes to cut the DNA.

The site of the variant we are interested in (site 785 on the mRNA; site 869 on the full gene) contains a sequence that can be cut with a restriction endonuclease (aka “restriction enzyme”). Restriction enzymes are present in bacteria and help the bacteria defend against virus attacks by cutting viral DNA at particular recognition sites that the bacterial genome does not have. One such enzyme is called Fnu4H1 and it makes a staggered cut across the DNA with this recognition sequence (8.1):

Fnu4H1 restriction site. N means that it will recognize any base at that position.

Figure 8.1: Fnu4H1 restriction site. N means that it will recognize any base at that position.

At the variant site we’ve been interested in, the taster allele is:

5’...TGTGCTGCCTT...3’
3’...ACACGACGGAA...5’

Can you find the Fnu4H1 restriction site? The non-taster allele is:

5’...TGTGTTGCCTT...3’
3’...ACACAACGGAA...5’

Does the same restriction site appear in this sequence? You can hopefully find it in the taster allele. Lets see if we can find it using NCBI.

  1. Go back to your Primer-BLAST results.
  2. Enter the sequence “GCNGC” in the search box at the top of the graphical view region (Fig 7.3)
  3. In the results, click the “Sequence” tab. You should see a list of sites that match.
  4. Notice that there are 2 sites that are within the range of our PCR product (776-964). In fact, one of them is at 868-872, exactly the spot where we have our TAS2R38 variant.
  5. Save a screenshot. You’ll turn this one in as well. It will be similar to 8.2.
Locations of Fnu4H1 restriction sites within PCR product

Figure 8.2: Locations of Fnu4H1 restriction sites within PCR product

  1. On a piece of paper, sketch out the DNA that would result from your PCR. You can use Fig. 8.3 as a guide (download as pdf or as powerpoint). Label the following items (letters below correspond to letters on Fig. 8.3), making sure to go through this list very carefully:
    1. Label the left end 776 and the right end 964, as these are the coordinates of the end of the PCR product.
    2. Write the first 5 bases of the forward primer.
    3. Write the reverse complement to the forward primer. It is actually to this strand that the primer will bind.
    4. Write the first 5 bases of the reverse primer. NOTE You will have to write this in reverse. (it will be identical, though in reverse, to the lower strand when read in 5’->3’ direction)
    5. Write the reverse complement to the reverse primer.
    6. Write that the TAS2R38 variant is at base 869. This variant converts “GCTGC” (or “GCAGC” on the other strand) into “GTTGC”. Write this sequence (GCTGC) in the correct spot and label it as the ‘Taster’ allele.
    7. Write the reverse complement to this variant site on the other strand.
    8. Write the non-taster variant of TAS2R38 (GTTGC) on the non-taster diagram.
    9. Write the reverse complement to this variant.
    10. Locate the approximate position of each restriction site that you found in Fig. 8.2 and write its sequence. The diagram in Fig. 8.3 doesn’t include a space for these, so you’ll have to identify where they go. Write the sequence on the diagram.
    11. Indicate (yes or no) whether Fnu4H1 will cut the taster allele at the variant site 869.
    12. Indicate (yes or no) whether Fnu4H1 will cut the non-taster allele at variant site 869.
    13. Write down the number of times Fnu4H1 cuts the DNA of the taster allele. You will need to refer back to the results where you searched for “GCNGC” in the primer page.
    14. Write down how many fragments you would have after a digestion of the taster allele with Fnu4H1. A single cut would result in 2 fragments, 2 cuts would result in 3 fragments, etc.
    15. Indicate the sizes of the fragments. For example, the entire PCR product is 964 - 776 + 1 = 189bp. (we add 1 because both end points are present. Think how long a strand would be if it started at base 1 and ended at base 2. It would be 2 bases (2 - 1 + 1 = 2)).
    16. Write down how many times Fnu4H1 cuts the DNA of the non-taster allele.
    17. Write down how many fragments you would have after a digestion of the non-taster allele with Fnu4H1. Again, you will have to use the results from searching for ‘GCNGC’ in the primer search results.
    18. Indicate the sizes of all the fragments of the non-taster allele.
  2. Finally, check all the sequences you wrote against the sequence you downloaded in step 4. Notice that the sequence that you download is for the top, 5’->3’ strand in Fig. 8.3.
  3. Take a picture of this restriction map to turn in (or annotate the digital files).
Locations of Fnu4H1 restriction sites within PCR product. You can download this as a [pdf](https://senanu.github.io/b40/assets/labs/PTC/PTCRestrictionMap.pdf) or [powerpoint](https://senanu.github.io/b40/assets/labs/PTC/PTCRestrictionMap.pptx) so it's easy to fill out.

Figure 8.3: Locations of Fnu4H1 restriction sites within PCR product. You can download this as a pdf or powerpoint so it’s easy to fill out.

As you can see, within the range of the PCR product that we selected, the taster allele has two restriction sites and can therefore be cut by the restriction enzyme at those two locations and the other (nontaster) allele has only a single site and therefore will be cut just once. In other words, the mutation we are interested in happens to affect the restriction site, nullifying it for one of the alleles.

Restriction digests are easy to accomplish. We simply have to combine a very small amount of the appropriate restriction enzyme with our PCR product in a buffer and keep it at a specific temperature for 15-60 minutes.

9 Gel electrophoresis of PCR products

Gel electrophoresis separates DNA by length (size) and allows us to view different sizes of fragments.

Once we have our PCR products and we’ve digested them with a restriction enzyme, we need to view our results. We do this using gel electrophoresis. A gel is a matrix of a sugar (usually agarose) through which we pull DNA using an electric current. Because of the phosphate groups that are in the DNA backbone, DNA is negatively charged and will migrate toward a positive terminal.

To make the gel, we heat up agarose and a buffer and then let it cool in a form. The form shapes the gel into a slab similiar to a slab of Jello. We form the gel with partial holes (wells) at one end, into which the DNA can be loaded. After pipetting a small amount of DNA into the well of the gel, we apply the current.

Gel electrophoresis separates DNA by size: the larger pieces have a difficult time moving through the gel whereas the smaller pieces move more easily.

Figure 9.1 shows DNA being loaded into a gel. It is a tedious process that requires practice with the pipette!

Loading DNA with a blue dye into an agarose gel

Figure 9.1: Loading DNA with a blue dye into an agarose gel

After the gel is run, the DNA is pulled toward one side. The smaller fragments will have moved further, and the larger ones will remain near the wells where they started. We can use a variety of techniques to visualize the DNA. In Fig 9.2, a chemical that glows under UV light was used. Notice that the smaller fragments have moved further (to the lower right in this picture).

Visualizing bands on an agarose gel. The bands light up under UV light due to the addition of a dye (ethidium bromide). You can see the wells in the upper left. The DNA has migrated toward the bottom right of the photo.

Figure 9.2: Visualizing bands on an agarose gel. The bands light up under UV light due to the addition of a dye (ethidium bromide). You can see the wells in the upper left. The DNA has migrated toward the bottom right of the photo.

The results of our gel might look something like Fig 9.3. We can thus distinguish between the alleles of a taster and a non-taster. Both of these gel lanes are what homozygotes would look like. A heterozygote would have both alleles.

  1. Re-sketch the gel showing a lane for a heterozygote.
Results comparing  a homozygous taster with a homozygous non-taster. A heterozygote would have all the bands.

Figure 9.3: Results comparing a homozygous taster with a homozygous non-taster. A heterozygote would have all the bands.

10 What to turn in

To get credit for completing this lab, please go through all the material and make sure you understand it. Ask questions about it if you are unsure.

In order to make sure you have completed the lab, please take and turn in the following screenshots:

  1. The results from NCBI’s orffinder. This will be similar to Fig. 5.3.
  2. The results from searching for your restriction site in the PCR results. This will be similar to Fig. 8.2.
  3. A map of restriction sites of your PCR product, showing the sizes of both alleles. This will also include the primers used. This will be similar to (or derived from) Fig. 8.3.
  4. A sketch of a gel similar to Fig. 9.3 that includes the bands you’d expect for a heterozygote.