Lecture 7: Variable relationships

Goals:
  • Become comfortable with development in R.
  • Understand how to use vectorized operations to solve problems.
  • Start to look at relationships between variables.

Outline:
  1. Introduce documentation facilities in R (using spin and knitr).
  2. Look at data frames, factors, and logical indexing for operations.
  3. Start to look at relationships between variables and modeling.

Resources:
Elementary statistics with R: http://www.r-tutor.com/elementary-statistics
Summary of R statistics functions covered in class: R statistics

Knitr home page: http://yihui.name/knitr/
Creating Notebooks from R Scripts: http://www.rstudio.com/ide/docs/authoring/markdown_notebooks


Some R preliminaries:
  • Install knitr for document layout
install.packages('knitr', dependencies = TRUE)
  • Load knitr
library('knitr')

Once you do this, you should see a Knit HTML button at the top of your script window. It lets you produce nicely formatted HTML and PDF documents.

  • Install ggplot2 for better graphics
install.packages('ggplot2', dependencies=TRUE)

  • Configure RStudio so that the Packages pane is visible. That way you can see which packages are currently installed.

Example 1: Turn your script into an HTML file by first creating a markdown version using spin, then knitting the markdown to create HTML. Suppose the script in the current directory is called lab1.R. Just type:

spin('lab1.R')

This creates an R Markdown script, lab1.Rmd. You then knit the markdown script to produce an HTML file. Relevant markup for the R script includes:
  • commentary:  #'
  • chunk control: #+
  • inline code:  {{ }}
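
The markup above can be tried on a small script. The following is a minimal sketch, assuming knitr is installed. The file name demo_lab.R and its contents are made up for illustration (they are not the lab1.R from class), and knit = FALSE is passed so that spin() stops after writing the .Rmd file rather than also knitting it.

```r
library(knitr)

# Hypothetical demo script using the three kinds of spin markup.
script_lines <- c(
  "#' # Demo report",
  "#' Lines starting with #' become ordinary text.",
  "#+ age-height, fig.width = 5",
  "plot(height ~ age, data = Loblolly)",
  "#' The number of rows in Loblolly is",
  "{{ nrow(Loblolly) }}"
)

# Write the script to a temporary directory so nothing is left behind.
r_file <- file.path(tempdir(), "demo_lab.R")
writeLines(script_lines, r_file)

# knit = FALSE: only translate the R script to R Markdown.
spin(r_file, knit = FALSE)

# spin() writes the .Rmd file next to the input script.
rmd_file <- sub("\\.R$", ".Rmd", r_file)
rmd_lines <- readLines(rmd_file)
cat(rmd_lines, sep = "\n")
```

Reading the printed .Rmd next to the original script is a quick way to see which line of markup produced which piece of the document.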

NOTE: To go straight from an R script to HTML without an intermediate markdown file, select Compile Notebook from the File menu of RStudio while you are editing a .R script.

Example 2:  Look at the Rmd script from Lecture 6.

Using factors and vector logic:

Example 3: A data frame with factor data:  data(Loblolly)
  • Look at the data in the data viewer: View(Loblolly)
  • Read the help for the data
  • Examine the data attributes
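
The steps above can also be done from the console. This is a small sketch using only base R functions; which attributes are worth inspecting is a judgment call, and the ones shown here are one reasonable selection.

```r
data(Loblolly)

str(Loblolly)                # column types: height, age, Seed
class(Loblolly$Seed)         # Seed is stored as a factor
levels(Loblolly$Seed)        # the distinct seed sources
table(Loblolly$Seed)         # number of measurements per seed source
names(attributes(Loblolly))  # attributes attached to the data frame

# ?Loblolly opens the help page in an interactive session.
```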

Example 4: Pick out the entries of Loblolly for seed source 305 (Seed equal to 305) and plot them.
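
One possible way to do Example 4 with logical indexing is sketched below. The variable names is_305 and seed305 and the choice to plot height against age are illustrative assumptions, not part of the original exercise.

```r
data(Loblolly)

# Logical vector: TRUE for rows whose Seed level is "305".
is_305 <- Loblolly$Seed == "305"

# Keep only those rows (and all columns).
seed305 <- Loblolly[is_305, ]
seed305

# Plot height against age for this seed source.
plot(seed305$age, seed305$height,
     xlab = "Tree age (yr)", ylab = "Tree height (ft)",
     main = "Loblolly: seed source 305")
```

Seed is a factor, so the comparison is made against the level label "305" rather than a number.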

Example 5: Pick out the entries of Loblolly which have a tree age of at least 10.
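
Example 5 follows the same pattern with a numeric comparison. This is a sketch; the names old_enough, older, and older_305 are made up here, and the combined condition at the end is an extra illustration of vector logic rather than part of the exercise.

```r
data(Loblolly)

# Vectorized comparison: one TRUE/FALSE per row.
old_enough <- Loblolly$age >= 10

# Rows with a tree age of at least 10.
older <- Loblolly[old_enough, ]
sort(unique(older$age))
nrow(older)

# Conditions can be combined element-wise with & (and) and | (or).
older_305 <- Loblolly[Loblolly$age >= 10 & Loblolly$Seed == "305", ]
older_305
```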

Example 6: The example on the Loblolly help page uses the plot.formula form of plotting, with a subset argument to select rows.
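
A sketch in the style of that help-page example is shown below. The particular subset (seed source 305), the labels, and the overlay with points() are illustrative choices made here, not a copy of the help page.

```r
data(Loblolly)

# Formula form: height on the y-axis, age on the x-axis.
# subset is evaluated inside the data frame, so Seed needs no Loblolly$ prefix.
plot(height ~ age, data = Loblolly, subset = Seed == "305",
     xlab = "Tree age (yr)", ylab = "Tree height (ft)",
     main = "Loblolly (Seed 305 only)")

# The same rows selected by explicit logical indexing, drawn as crosses
# on top of the first plot to show that both forms pick the same points.
seed305 <- Loblolly[Loblolly$Seed == "305", ]
points(seed305$age, seed305$height, pch = 4)
```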




Relationships between variables:

Vector variables x and y are linearly related if y_i = m * x_i + b for every index i, for some constants m and b. (When plotted against each other as ordered pairs, the points fall on a line.)
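
As a small illustration of this definition, the sketch below builds y from x with a vectorized expression and checks that every pair of consecutive points has the same slope. The values of x, m, and b are arbitrary choices for the example.

```r
x <- c(1, 2, 4, 7, 11)
m <- 2
b <- -3

# Vectorized: computes m * x[i] + b for every element at once.
y <- m * x + b

# The points fall on the line with intercept b and slope m.
plot(x, y)
abline(a = b, b = m)

# Slope between each pair of consecutive points.
slopes <- diff(y) / diff(x)
slopes
```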

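Covariance measures how two variables vary together: cov(x, y). It is positive when y tends to increase with x and negative when y tends to decrease as x increases, but its magnitude depends on the units of x and y.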
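
To make this concrete, the sketch below computes the sample covariance by hand and compares it with cov(), which uses the n - 1 denominator. Using the Loblolly age and height columns is a choice made here for illustration.

```r
data(Loblolly)
x <- Loblolly$age
y <- Loblolly$height
n <- length(x)

# Sample covariance: average product of deviations, with n - 1 denominator.
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
cov_builtin <- cov(x, y)
all.equal(cov_manual, cov_builtin)

# Rescaling height from feet to inches rescales the covariance by 12.
cov_inches <- cov(x, y * 12)
cov_inches / cov_builtin
```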

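Correlation is a normalized measure of how linearly related two variables are: cor(x, y). Its values lie between -1 and 1. Values of 1 and -1 indicate an exact linear relationship with positive and negative slope, respectively. A value of zero indicates no linear relationship, although the variables may still be related in a nonlinear way.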
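
The sketch below shows the normalization (covariance divided by the product of the standard deviations) and an example where the correlation is zero even though one variable is an exact function of the other. The Loblolly columns and the vectors u and v are illustrative choices.

```r
data(Loblolly)
x <- Loblolly$age
y <- Loblolly$height

# Correlation is covariance divided by the two standard deviations.
cor_manual <- cov(x, y) / (sd(x) * sd(y))
cor_builtin <- cor(x, y)
all.equal(cor_manual, cor_builtin)

# Unlike covariance, correlation does not change when units change.
cor_inches <- cor(x, y * 12)
all.equal(cor_inches, cor_builtin)

# v is completely determined by u, but the relationship is not linear,
# and the correlation comes out as zero.
u <- -3:3
v <- u^2
cor_uv <- cor(u, v)
cor_uv
```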