---
title: "Qualitative analysis of Protein"
output: html_document
---
Before starting, go to the tools menu, choose ‘Global options’, then in the ‘R markdown’ menu, UNCHECK the box labeled “Show output inline for all R Markdown documents”. This will tell RStudio to put the plot into the viewer sub-window.
Get your data for the standard curve into R using a Google Sheets
spreadsheet or any other method that you prefer. You’ll need to
remember how to save your data from Google as a .csv
file. You can use the read.csv() function to get
it into R.
Your data sheet should have 1 row of headers with any data below those headers. To make your life easy, format your headers in a way that R can read them easily – do not use spaces or other weird characters (computer languages usually interpret spaces as a break between words). Dots (.) and underscores (_) are fine, but parentheses are not. Therefore, don’t try to incorporate units into your headings. Headings such as “Wavelength (nm)” will give you problems!
Make sure your data columns that should contain numeric values only contain numeric values throughout the entire column (except for the header row). In particular, don’t include the units alongside the data. Any ‘text characters’ will cause R to interpret the data as text, rather than numbers, and you can’t add text together in the same way you do with numbers.
Note that you ONLY need the data for the standard curve. Don’t include any of the milk data in your Google spreadsheet.
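As a minimal sketch of the import step, the example below writes a small made-up table to a temporary .csv file and reads it back with read.csv(). The column names (protein.conc, absorbance, with.units) and all values are invented for illustration; your own file and headers will differ.

```r
# Made-up standard-curve data, written to a temporary file for illustration only.
csv.path <- tempfile(fileext = ".csv")
writeLines(c("protein.conc,absorbance,with.units",
             "0.0,0.02,0.0 mg",
             "0.5,0.27,0.5 mg",
             "1.0,0.55,1.0 mg"),
           csv.path)

# read.csv() treats the first row as headers by default.
example.dat <- read.csv(csv.path)

# The two clean columns are numeric; the column that includes
# units alongside the numbers is not.
str(example.dat)
```

The third column shows why units should stay out of the data: R can no longer treat those values as numbers.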
Read your data into R and save it in a variable called dat.
Inspect it with the str(dat) function. Check that the variables that should
be numeric are labeled either as ‘int’ (integers) or ‘num’ (numeric).
Note what R calls your variables – you’ll need this information for
plotting the curve below.
Change eval=FALSE to eval=TRUE once
your code runs. This is necessary to knit properly.
knitr::opts_knit$set(global.device = TRUE)
# Do the steps above on the lines below this one.
dat <-
str(dat)
Assuming you have your data in a variable called dat
which has 2 series named “x.data.series” and “y.data.series” (choose
something better for your own data!), you can plot a graph with the
plot() command. You’ll need to modify it for your data, and
to adjust the labels to something useful. MAKE SURE TO MODIFY
“x.data.series” ETC. TO REFLECT THE NAMES AT THE HEAD OF YOUR DATA
TABLE!!!
This is a good chance to learn about providing arguments to
an R function. Type ?plot in the console (bottom left pane)
to view the help file. Notice it says that the usage is
plot(x, y, ...) and then it gives information about what
the x, y, and other arguments are. There are 2
ways to supply arguments in R. In the best way, you tell R explicitly
that x=dat$variable1 and y=dat$variable2 (see
the code chunk below). If you want to use a shortcut, you can just say
plot(dat$variable1, dat$variable2) and R will match the arguments by position: the
first value goes to the first argument listed in the help file (x), and the second
goes to the second argument (y). However, the first method is
generally better as it is very explicit.
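To make the two calling styles concrete, here is a small sketch using made-up vectors (example.x and example.y are hypothetical names, not part of your data). All three calls should draw the same points.

```r
# Made-up values for illustration only.
example.x <- c(0, 0.5, 1, 1.5, 2)
example.y <- c(0.02, 0.25, 0.51, 0.74, 1.01)

# Named arguments: explicit about which vector goes on which axis.
plot(x = example.x, y = example.y)

# Positional arguments: the first vector is matched to x, the second to y.
plot(example.x, example.y)

# Named arguments can be given in any order.
plot(y = example.y, x = example.x)
```

With positional arguments, swapping the two vectors silently swaps the axes, which is one reason the named form is safer.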
Type ?plot to see the help file for the plot()
function.
Modify the plot() command below to accept your data
series. Note that a data series is referred to with the $
operator, as in dat$x.data.series.
In the xlim and ylim
arguments, change the second number (initially set to 2 for xlim and 1 for ylim) to accommodate
the highest values from the spectrophotometer (ylim) or
from the graph you drew by hand (for the x-axis;
xlim).
Change eval=FALSE to eval=TRUE once
your code runs. This is necessary to knit properly.
plot(x=dat$x.data.series,
y=dat$y.data.series,
main="Main title",
xlab="X axis label",
ylab="Y axis label",
ylim=c(0,1),
xlim=c(0,2))
recordedPlot <- recordPlot() # Don't touch this line
dat refers to the data table that you
saved in the code chunk above, and x.data.series and
y.data.series refer to the column names. Change them as
appropriate for your data.
Calculate a linear regression of your data using a “linear model”
(lm()). The tilde (~) indicates to R a formula that
basically reads “is explained by”. Therefore, the following command
means “conduct a linear regression in which ‘y.data.series’ is
explained by ‘x.data.series’, and save the result in a variable
called ‘lin.regression’”. Be careful to note which way around this
formula is: y ~ x, not x ~ y.
Change eval=FALSE to eval=TRUE.
lin.regression <- lm(dat$y.data.series ~ dat$x.data.series)
Remember from math class that a straight line has a formula of \[Y = mX + b\] where ‘m’ is the slope and
‘b’ is the intercept (this is how I learned the equation – you might
have used different letters – it’s a regional thing!). You can get those
values by simply typing lin.regression on a line of its own
in the ‘Console’ (bottom left corner of the RStudio window). For
example:
##
## Call:
## lm(formula = professors_data$y ~ professors_data$x)
##
## Coefficients:
##       (Intercept)  professors_data$x
##           -0.1688             0.5547
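You can see the two coefficients: the value under (Intercept) is the intercept (b = -0.1688), and the value under the name of the x variable (professors_data$x) is the slope (m = 0.5547).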
You can now use the data from the regression to write down the equation for a regression line through your own data. In the example above, it would be \[Y = mX + b\] \[Y = 0.5547 X - 0.1688 \]
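If you would rather pull the two numbers out of the model than read them off the printout, coef() returns them as a named vector. This is only a sketch with made-up data; example.dat, example.fit and the column names are hypothetical stand-ins for your own table and model.

```r
# Made-up standard-curve data for illustration only.
example.dat <- data.frame(conc = c(0, 0.5, 1, 1.5, 2),
                          absorbance = c(0.01, 0.26, 0.49, 0.76, 0.99))

# Absorbance (y) is explained by concentration (x).
example.fit <- lm(absorbance ~ conc, data = example.dat)

# coef() returns a named vector: the intercept first, then the slope.
b <- coef(example.fit)[["(Intercept)"]]  # intercept
m <- coef(example.fit)[["conc"]]         # slope
c(slope = m, intercept = b)
```

The name of the slope coefficient depends on how the formula was written. If the model is fitted with dat$y ~ dat$x style terms, the slope is labeled with that full expression instead, as in the printout above.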
For your own data:
Change eval=FALSE to
eval=TRUE.
lin.regression
# Type your regression equation with values here
#
#
#
Use the formula for a straight line that you wrote above to calculate the protein concentration of your milk samples. In this case, the Y will be the absorbance and the X will be the (unknown) protein concentration. You will therefore need to solve for X. Do this for each of the measurements you took – you probably have 5 for the skim milk and 5 for the whole milk.
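Rearranging the straight-line formula for X gives \[X = \frac{Y - b}{m}\] The sketch below applies this to the example coefficients shown earlier (m = 0.5547, b = -0.1688), not to your own regression. The absorbance values and the variable names are made up for illustration.

```r
# Coefficients from the worked example; replace them with your own.
m <- 0.5547   # slope
b <- -0.1688  # intercept

# Made-up absorbance readings (Y values) for illustration only.
example.abs <- c(0.123, 0.133, 0.142)

# Solve Y = mX + b for X; R applies the arithmetic to every element.
example.conc <- (example.abs - b) / m
example.conc
```

A quick check is to plug each result back in: m * example.conc + b should return the original absorbance values.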
When you have done that, enter the data below by hand. Each of the 4 variables below should look something like the following in which each value should be separated by a comma:
skim.abs <- c(0.123, 0.133, 0.142, 0.121, 0.125)
skim.conc <- c(1.234, 1.235, 1.324, 1.442, 1.421)
where the 0.123 (skim.abs) is the absorbance (Y value)
you measured and the 1.234 (skim.conc) is the corresponding
(X) value that you calculated using the equation for the straight line
(the concentration of protein in skim milk). Make sure that you keep
them in order so that the first value for skim.abs
corresponds to the first value of skim.conc.
Change eval=FALSE
to eval=TRUE.
skim.abs <- c()
skim.conc <- c()
whole.abs <- c()
whole.conc <- c()
You will need to figure out how to do the following simple steps yourself. Feel free to get help from your partners, or if necessary, from your instructor.
Use the mean() function to calculate the average
protein concentrations in both the skim and whole milk. Remember that
you can use an entire variable as the argument to the
mean() function.
Change eval=FALSE
to eval=TRUE.
average.whole <-
average.skim <-
undiluted.whole <-
undiluted.skim <-
Write these values down. You should include these in your lab writeup.
Change eval=FALSE to
eval=TRUE.
replayPlot(recordedPlot) # leave this line alone
abline(lin.regression)
recordedPlot <- recordPlot() # leave this line alone
Here, we will add points (of different colors) to the plot. Make sure you know which one is which color so you can describe it in the caption for this figure in your lab report.
Change eval=FALSE to
eval=TRUE.
replayPlot(recordedPlot) # leave this line alone
points(x=skim.conc, y=skim.abs, col='blue', pch=16)
points(x=whole.conc, y=whole.abs, col='red', pch=16)
When you look at your graph, you should have blue and red dots, and they should be on the regression line. There are a few reasons why you may see fewer than 10 dots (5 red and 5 blue):
You didn’t make 10 measurements, perhaps because you ran out of time. You can’t do anything to rectify this at this point.
Some of your dots are overlapping. All dots are on the graph, but there are some you cannot see because they are underneath others.
Some of the dots don’t fit within the boundaries of the graph.
Check your data in the “Calculate Protein Concentration” section above
to see if this applies to your graph. If this is the case, go back to
the beginning of this Rmd file to the section (Plot a
standard curve) where you initially plotted the graph. Change limits on
the x-axis by changing the xlim = c(0,2) so that the
2 is large enough to accommodate your largest x-value.
Similarly, change ylim = c(0,1) so that the 1
is large enough for your largest y-value. After you make these changes,
run all the subsequent code so that your changes take effect on
what you did later. Check again that all 10 dots are accounted
for.
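One way to pick limits that are large enough is to ask R for the largest value you need to show. This is a sketch with made-up numbers; the example.* names are hypothetical, so substitute your own vectors, and include the standard-curve values too if they are larger.

```r
# Made-up values for illustration only.
example.skim.conc  <- c(1.234, 1.235, 1.324, 1.442, 1.421)
example.whole.conc <- c(1.91, 2.05, 2.13, 1.98, 2.11)

# Largest x-value that has to fit on the plot.
x.max <- max(c(example.skim.conc, example.whole.conc))

# Round up to one decimal place to leave a little room at the edge.
x.upper <- ceiling(x.max * 10) / 10
c(0, x.upper)   # candidate value for xlim
```

The same approach works for ylim using the absorbance vectors.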
If your colored dots are not on the regression line, it is likely due to 1 of 2 reasons:
You miscalculated some of the values of X from your Y values. This will usually result in one or a few dots being off the line.
If your dots appear in a straight line, but not on the regression
line, the cause is likely to be that you mixed up the X and Y in the
regression formula. You should have lin.regression <- lm(Y ~ X)
using your variables for X and Y. It should not be
lin.regression <- lm(X ~ Y). The order of X and Y matters here!
If you made this mistake, make sure you run all subsequent code again
after fixing it.
Click Export and then
Copy to Clipboard. Then paste it into your lab report. If
you don’t have a graph, you may need to follow the instructions at the
very beginning of this document.
Click Knit then
Knit to HTML. Save a copy of the resulting HTML file on
your computer. You won’t be turning this in, but it may be useful to
refer to.