Lecture 8: Modeling relationships

Outline:
  1. Discuss covariance and correlation.
  2. Introduce the idea of linear models.
Resources:
Elementary statistics with R:  http://www.r-tutor.com/elementary-statistics
Summary of R statistics functions covered in class: R statistics
Fitting and Interpreting Linear Models in R: http://blog.yhathq.com/posts/r-lm-summary.html
Introducing R: 4 Linear Models: http://data.princeton.edu/R/linearModels.html


Methods of showing relationships:
  1. Plot the variables on the same graph against a common independent variable (e.g., time). We can observe trends and other types of patterns. (Lab 1).
  2. Plot the variables against one another (e.g., a scatter plot). Example: female lung deaths against male lung deaths.
  3. Fit a model, such as a linear model, and look at the residuals and the R^2 value.



Relationships between variables:

Vector variables x and y are linearly related if y_i = m * x_i + b for every index i, for some constants m and b.  (When plotted against each other as ordered pairs, the points fall on a line.)

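Covariance measures how two variables vary together: cov(x, y). Its sign gives the direction of the linear relationship, but its magnitude depends on the units of x and y.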

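Correlation is a normalized measure of how linearly related variables are: cor(x, y).  The values of correlation are between -1 and 1. A value of -1 or 1 indicates an exact linear relationship. A value of zero indicates no linear relationship (the variables may still be related in a nonlinear way).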
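
As a small illustration of the two definitions above, the sketch below computes the sample covariance and correlation by hand and compares them with R's built-in cov() and cor(). The vectors x and y are made-up values chosen for this illustration only; they are not course data.

```r
# Made-up vectors for illustration only (not course data)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

n <- length(x)

# Sample covariance: sum of products of deviations from the means, divided by n - 1
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Correlation: covariance divided by the product of the sample standard deviations
cor_manual <- cov_manual / (sd(x) * sd(y))

# Compare with the built-in functions
print(c(manual = cov_manual, builtin = cov(x, y)))
print(c(manual = cor_manual, builtin = cor(x, y)))
```

Rescaling x (say, changing its units) changes the covariance but leaves the correlation unchanged, which is why correlation is the easier number to interpret.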

Example 1: Calculate the covariance and correlation between male and female UK lung deaths (both the raw data and the monthly averages).
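
One possible sketch for Example 1 follows. It assumes the data are R's built-in mdeaths and fdeaths time series (monthly UK lung disease deaths, 1974 to 1979), and it assumes "monthly averages" means the mean for each calendar month taken across the years. Both are assumptions about what was used in class.

```r
# Assumption: the course data are the built-in mdeaths / fdeaths series
male   <- as.numeric(mdeaths)
female <- as.numeric(fdeaths)

# Covariance and correlation of the raw monthly counts
cov_raw <- cov(male, female)
cor_raw <- cor(male, female)

# Assumption: "monthly averages" = mean per calendar month across all years
month      <- as.integer(cycle(mdeaths))   # 1 = January, ..., 12 = December
male_avg   <- as.numeric(tapply(male, month, mean))
female_avg <- as.numeric(tapply(female, month, mean))

cov_avg <- cov(male_avg, female_avg)
cor_avg <- cor(male_avg, female_avg)

print(c(cov_raw = cov_raw, cor_raw = cor_raw))
print(c(cov_avg = cov_avg, cor_avg = cor_avg))
```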

Example 2: Plot female versus male UK lung deaths (both raw data and monthly averages).
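
A minimal plotting sketch for Example 2, under the same assumptions as before (built-in mdeaths and fdeaths series; monthly averages taken per calendar month across years). The axis labels and titles are illustrative choices.

```r
# Assumption: the course data are the built-in mdeaths / fdeaths series
male   <- as.numeric(mdeaths)
female <- as.numeric(fdeaths)

# Assumption: "monthly averages" = mean per calendar month across all years
month      <- as.integer(cycle(mdeaths))
male_avg   <- as.numeric(tapply(male, month, mean))
female_avg <- as.numeric(tapply(female, month, mean))

# Two scatter plots side by side: raw data and monthly averages
old_par <- par(mfrow = c(1, 2))
plot(male, female,
     xlab = "Male deaths", ylab = "Female deaths", main = "Raw monthly data")
plot(male_avg, female_avg,
     xlab = "Male deaths (monthly mean)", ylab = "Female deaths (monthly mean)",
     main = "Monthly averages")
par(old_par)   # restore the previous plot layout
```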

Example 3:  Construct a linear model that predicts female deaths given male deaths for a given month. Evaluate the quality of the model.
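
One way Example 3 might be approached is sketched below, again assuming the built-in mdeaths and fdeaths series. The data frame name deaths and the value of 1500 male deaths used for the prediction are hypothetical choices made for illustration.

```r
# Assumption: the course data are the built-in mdeaths / fdeaths series
deaths <- data.frame(male   = as.numeric(mdeaths),
                     female = as.numeric(fdeaths))

# Fit female deaths as a linear function of male deaths
fit <- lm(female ~ male, data = deaths)
print(coef(fit))            # intercept b and slope m

# Quality of the model: R^2 and residuals
fit_summary <- summary(fit)
r_squared   <- fit_summary$r.squared
res         <- residuals(fit)
print(r_squared)

# Residuals should scatter around zero with no obvious pattern
plot(deaths$male, res, xlab = "Male deaths", ylab = "Residual")
abline(h = 0, lty = 2)

# Predicted female deaths for a hypothetical month with 1500 male deaths
pred <- predict(fit, newdata = data.frame(male = 1500))
print(pred)
```

For a linear model with a single predictor, R^2 equals the square of cor(x, y), so this example ties back to the correlation computed in Example 1.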

Example 4:  Calculate the correlation and covariance for two random vectors of length 1000.  Plot the vectors against each other.
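
A sketch for Example 4. The example does not say which distribution the random vectors come from, so uniform values from runif() are assumed here; the seed is fixed only to make the run reproducible.

```r
set.seed(1)   # fixed seed so the run is reproducible

n <- 1000
x <- runif(n)   # assumption: uniform on [0, 1]
y <- runif(n)   # generated independently of x

cov_xy <- cov(x, y)
cor_xy <- cor(x, y)
print(c(cov = cov_xy, cor = cor_xy))   # both should be close to zero

# Independent vectors give a structureless cloud of points
plot(x, y, main = "Two independent random vectors")
```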

Example 5:  Calculate the correlation and covariance for two vectors x and y.  The vector x is a random vector of length n and the vector y is related to x by y = -10x + 2.
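
A sketch for Example 5. The length of x (100 here) and its distribution (standard normal) are assumptions, since the example leaves both open.

```r
set.seed(2)   # fixed seed so the run is reproducible

x <- rnorm(100)      # assumption: 100 standard normal values
y <- -10 * x + 2     # exact linear relationship with negative slope

cor_xy <- cor(x, y)
cov_xy <- cov(x, y)

print(cor_xy)                 # -1 up to floating-point rounding
print(c(cov_xy, -10 * var(x)))   # the two values agree
```

The intercept 2 has no effect on either quantity, and the slope -10 only scales the covariance: cov(x, y) = -10 * var(x).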

Example 6: Suppose x is random and y is related to x by y = -10 x + 2 + eps.  Here eps represents random noise of a specified level.  How would you expect the correlation to vary as the noise level of eps varies from 0 to 10?
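
One way to reason about Example 6 is the short derivation below. It assumes eps has mean zero, is independent of x, and that the "noise level" is its standard deviation (written \sigma_\varepsilon); \sigma_x denotes the standard deviation of x. These are assumptions, not part of the example as stated.

```latex
\begin{align*}
\operatorname{cov}(x, y) &= \operatorname{cov}(x,\, -10x + 2 + \varepsilon) = -10\,\sigma_x^2 \\
\operatorname{var}(y) &= 100\,\sigma_x^2 + \sigma_\varepsilon^2 \\
\operatorname{cor}(x, y) &= \frac{\operatorname{cov}(x, y)}{\sigma_x \sqrt{\operatorname{var}(y)}}
  = \frac{-10\,\sigma_x}{\sqrt{100\,\sigma_x^2 + \sigma_\varepsilon^2}}
\end{align*}
```

Under these assumptions the correlation is exactly -1 when the noise level is 0 and moves toward 0 as the noise level grows.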

Example 7: Model the situation posed by Example 6.  Assume the noise is Gaussian (normally distributed).  Plot the correlation versus the noise level of eps for different values.
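
A simulation sketch for Example 7. It assumes x is standard normal with 1000 values, and it takes the "noise level" to be the standard deviation of eps, stepped from 0 to 10 in increments of 0.5. All of these are illustrative choices.

```r
set.seed(3)   # fixed seed so the run is reproducible

n <- 1000
x <- rnorm(n)                        # assumption: standard normal x
noise_sd <- seq(0, 10, by = 0.5)     # assumption: noise level = sd of eps

# For each noise level, generate Gaussian noise and compute cor(x, y)
cors <- sapply(noise_sd, function(s) {
  eps <- rnorm(n, mean = 0, sd = s)
  y   <- -10 * x + 2 + eps
  cor(x, y)
})

plot(noise_sd, cors, type = "b",
     xlab = "Noise level (sd of eps)", ylab = "cor(x, y)",
     main = "Correlation versus noise level")
```

With no noise the correlation is -1; as the noise level rises, the correlation weakens toward zero but stays negative over this range.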



Before next time:
  • Update RStudio
  • Update the packages: in RStudio do Tools -> Check for updates to installed packages
  • Install MiKTeX: this is done outside of RStudio, from the website: http://miktex.org/2.9/setup