Lecture 4: Some basic statistics in R

Goals:
  • Understand the basic structure of DNA and the transcription-translation process, as a first step toward understanding how to view genetic data.
  • Be able to do some very basic statistics and data analysis in R.
Outline:
  1. Structure of DNA and genes (with a visit to NCBI website)
  2. Continue to work on R (data structures and combining)
  3. Basic statistical indicators:  mean, median, maximum, minimum, standard deviation, median/mean absolute deviation
Resources:
       Elementary statistics with R:  http://www.r-tutor.com/elementary-statistics
       Summary of R statistics functions covered in class: R statistics
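
The indicators listed in the outline can be tried on a small made-up vector before moving to real data. This is only a sketch: the vector x is invented for illustration. Note that R's mad() is the median absolute deviation and, by default, multiplies the result by 1.4826; base R has no built-in for the mean absolute deviation, so it is computed by hand here.

```r
# A small invented sample with one large outlier (illustration only)
x <- c(2, 4, 4, 5, 7, 9, 40)

x_mean   <- mean(x)      # arithmetic mean, pulled upward by the outlier
x_median <- median(x)    # middle value, barely affected by the outlier
x_max    <- max(x)
x_min    <- min(x)
x_sd     <- sd(x)        # sample standard deviation (n - 1 denominator)

# Median absolute deviation: median of |x - median(x)|.
# constant = 1 gives the raw value; the default constant 1.4826
# rescales it to be comparable with sd() for normally distributed data.
x_mad_raw    <- mad(x, constant = 1)
x_mad_scaled <- mad(x)

# Mean absolute deviation, computed by hand
x_mean_abs_dev <- mean(abs(x - mean(x)))

print(c(mean = x_mean, median = x_median, sd = x_sd,
        mad = x_mad_scaled, mean_abs_dev = x_mean_abs_dev))
```

Comparing sd with the two absolute deviations on this sample shows how much a single outlier inflates the standard deviation.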

Biology:
  • Structure of DNA: a strand is always read from its 5' end.  The left strand reads ACTG and the right strand, read from its own 5' end, reads CAGT (the reverse complement).  DNA synthesis always goes 5' to 3'.
DNA chemical structure
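
The relationship between the two strands (ACTG on one, CAGT on the other) can be checked in R. A minimal sketch, where reverse_complement is a helper name made up for these notes, not a built-in function:

```r
# Hypothetical helper: complement each base, then reverse the order,
# so the result is the opposite strand read from its own 5' end.
reverse_complement <- function(dna) {
  complement <- chartr("ACGT", "TGCA", dna)           # A<->T, C<->G
  paste(rev(strsplit(complement, "")[[1]]), collapse = "")
}

left_strand  <- "ACTG"
right_strand <- reverse_complement(left_strand)
print(right_strand)   # "CAGT"
```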

  • Structure of a gene: 
Structure of a gene

  • Gene substructure (codons):
Mapping between codons and amino acids
  • Other important terms:
    • Allele  -  a version of a gene on one or the other of a pair of chromosomes
    • Open reading frame - the region of the DNA between a start codon and a stop codon
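
To make codons and reading frames concrete, here is a toy sketch that reads a sequence three bases at a time. It assumes the usual convention that a reading frame opens at the start codon ATG and closes at the first in-frame stop codon (TAA, TAG or TGA). The function name find_orf and the example sequence are invented for illustration; real gene finding is considerably more involved.

```r
# Hypothetical helper: return the first open reading frame on the given
# strand, from the first ATG through the first in-frame stop codon.
find_orf <- function(dna) {
  start <- as.integer(regexpr("ATG", dna, fixed = TRUE))
  if (start == -1) return(NA_character_)              # no start codon

  # Split everything from the start codon onward into whole codons
  n_codons     <- (nchar(dna) - start + 1) %/% 3
  codon_starts <- start + 3 * (seq_len(n_codons) - 1)
  codons       <- substring(dna, codon_starts, codon_starts + 2)

  stop_at <- which(codons %in% c("TAA", "TAG", "TGA"))
  if (length(stop_at) == 0) return(NA_character_)     # frame never closes

  paste(codons[seq_len(stop_at[1])], collapse = "")
}

toy_sequence <- "CCATGGCTTAAGG"    # made-up sequence
toy_orf <- find_orf(toy_sequence)
print(toy_orf)   # "ATGGCTTAA": codons ATG, GCT, TAA
```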
Exercise:  Look up the TP53 gene (tumor protein p53 gene) in GenBank.

Example 1: A time series (airmiles)
  1. Plot as a time series
  2. Plot by pulling out the vectors
  3. Calculate the mean and median. Why are they so different?
  4. Plot the histogram
  5. Plot boxplot
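
One possible way to work through the steps of Example 1 is sketched below. The airmiles data set ships with base R; the variable names (years, miles, and so on) are just choices made for this sketch.

```r
# Step 1: plot the ts object directly; R uses the years on the x axis
plot(airmiles, main = "airmiles as a time series")

# Step 2: pull out plain numeric vectors and plot them against each other
years <- as.numeric(time(airmiles))   # the time index of the series
miles <- as.numeric(airmiles)         # the observed values
plot(years, miles, type = "b", xlab = "Year", ylab = "airmiles")

# Step 3: mean and median
miles_mean   <- mean(miles)
miles_median <- median(miles)
print(c(mean = miles_mean, median = miles_median))

# Step 4: histogram of the values
hist(miles, main = "Histogram of airmiles")

# Step 5: boxplot of the values
boxplot(miles, main = "Boxplot of airmiles")
```

A hint for step 3: the series rises steeply over time, so a few large values from the later years pull the mean well above the median. The histogram and boxplot show the same right skew.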
Example 2:  A time series (sunspot.month)
  1. Plot as a time series
  2. Calculate mean and median
  3. Plot boxplot
  4. Plot histogram
  5. Reshape into a two-dimensional array (one row per year, one column per month)
  6. Sum rows to get sunspots by year
  7. Sum columns to get sunspots by month
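
The steps of Example 2 might look like the sketch below. Two assumptions are made and checked in the code: the series starts in January, and any incomplete final year is dropped so that the values fill whole rows of 12 (the length of sunspot.month differs between R versions). Variable names are choices made for this sketch.

```r
# Step 1: plot as a time series
plot(sunspot.month, main = "Monthly sunspot numbers")

# Steps 2-4: basic statistics and plots on the plain numeric values
spots <- as.numeric(sunspot.month)
spots_mean   <- mean(spots)
spots_median <- median(spots)
boxplot(spots, main = "Boxplot of monthly sunspots")
hist(spots, main = "Histogram of monthly sunspots")

# Step 5: reshape into a year-by-month matrix.
# Assumption check: the first observation is a January.
stopifnot(start(sunspot.month)[2] == 1)
first_year <- start(sunspot.month)[1]
n_years    <- length(spots) %/% 12          # complete years only
by_year_month <- matrix(
  spots[seq_len(n_years * 12)],
  ncol = 12, byrow = TRUE,                  # fill one year per row
  dimnames = list(as.character(first_year + seq_len(n_years) - 1), month.abb)
)

# Step 6: sum rows to get sunspots by year
spots_by_year <- rowSums(by_year_month)

# Step 7: sum columns to get sunspots by month
spots_by_month <- colSums(by_year_month)

print(head(spots_by_year))
print(spots_by_month)
```

The byrow = TRUE argument matters here: by default matrix() fills column by column, which would scramble years and months.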
Example 3:  A data frame (faithful)
  1. Plot 
  2. Extract columns
  3. Calculate basic statistics
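
A sketch of Example 3, using the faithful data frame that ships with base R (columns eruptions and waiting). The names basic_stats and faithful_stats are invented for this sketch.

```r
# Step 1: plotting a two-column data frame gives a scatter plot
plot(faithful, main = "Old Faithful eruptions")

# Step 2: extract columns, either with $ or with [[ ]]
eruptions <- faithful$eruptions
waiting   <- faithful[["waiting"]]

# Step 3: basic statistics, one column at a time ...
print(mean(eruptions))
print(median(eruptions))
print(sd(eruptions))

# ... or for every column at once
basic_stats <- function(v) {
  c(mean = mean(v), median = median(v), sd = sd(v),
    min = min(v), max = max(v))
}
faithful_stats <- sapply(faithful, basic_stats)   # 5 x 2 matrix
print(faithful_stats)

# summary() gives a similar quick overview (quartiles instead of sd)
print(summary(faithful))
```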