Laboratory 2: Working with complex datasets

In this lab, you will be working with the Hospital-Doctor-Patient simulated data set, which describes different attributes of lung cancer patients. The goal of the lab is for you to become familiar with data frames and with manipulating data using vectorized operations.


Setup:  
Be sure to start a new project in a new directory. Execute the following to download the data. 

hdp <- read.csv("http://www.ats.ucla.edu/stat/data/hdp.csv")

Requirements:

Part 1:  Calculate the following items and output them neatly. Use vector indexing rather than loops whenever possible.
  • The total number of women and total number of men.
  • The number of married versus unmarried patients.
  • The median age of the men and the median age of the women.
  • The number of patients in the cohort who had a family history of smoking.
  • The total number of lawsuits filed by this patient cohort.
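
As a hint for the vectorized style asked for in Part 1, here is a minimal sketch on a small made-up data frame. The data frame toy and its columns (Sex, Age, Smoker, Visits) are illustrative assumptions and are not the columns of hdp; check names(hdp) for the real ones.

```r
# Toy data frame; names and values are hypothetical, not taken from hdp.
toy <- data.frame(
  Sex    = c("female", "male", "female", "male", "female"),
  Age    = c(61, 58, 70, 49, 66),
  Smoker = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  Visits = c(2, 0, 1, 3, 0)
)

# table() counts the rows in each category without a loop.
n_by_sex <- table(toy$Sex)

# Logical indexing keeps only the ages where the condition is TRUE.
median_age_f <- median(toy$Age[toy$Sex == "female"])
median_age_m <- median(toy$Age[toy$Sex == "male"])

# Summing a logical vector counts its TRUE values.
n_smokers <- sum(toy$Smoker)

# Summing a numeric column gives the cohort total.
total_visits <- sum(toy$Visits)

# cat() is one way to print labelled results neatly.
cat("Women:", n_by_sex[["female"]], "Men:", n_by_sex[["male"]], "\n")
cat("Median age (women):", median_age_f, "\n")
cat("Median age (men):", median_age_m, "\n")
cat("Smokers:", n_smokers, "Total visits:", total_visits, "\n")
```

The same patterns should carry over to hdp once you substitute its actual column names and category labels.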
Part 2: Plot and label the histograms of tumorsize, lungcapacity, and RBC.
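
A labelled histogram can be sketched with base R's hist(), shown here on simulated numbers rather than on hdp. The vector values, the title, and the axis labels are placeholders chosen for illustration; units for the real variables are not stated in this handout.

```r
# Simulated measurements; these values are an assumption for illustration.
set.seed(1)
values <- rnorm(200, mean = 70, sd = 12)

# main sets the title; xlab and ylab label the axes.
h <- hist(
  values,
  main = "Distribution of a simulated measurement",
  xlab = "Measurement (arbitrary units)",
  ylab = "Number of observations",
  col  = "grey80"
)
```

For Part 2, the same call would be repeated once per variable, each with its own title and axis label.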

Part 3:  Calculate the correlations among the following variables: tumorsize, lungcapacity, Age, WBC, and BMI. Interpret your results in a few sentences.
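
One way to get all pairwise correlations at once is to pass a data frame of numeric columns to cor(), which returns a correlation matrix. The sketch below uses simulated columns x, y, and z (hypothetical names) and assumes Pearson correlation, which is the default for cor(); the assignment does not say which coefficient to use.

```r
# Simulated data: y is built from x, z is independent of both.
set.seed(42)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
z <- rnorm(100)
sim <- data.frame(x = x, y = y, z = z)

# Select columns by name, then compute the correlation matrix.
# use = "complete.obs" drops rows with missing values, if there are any.
cm <- cor(sim[, c("x", "y", "z")], use = "complete.obs")

# Rounding makes the matrix easier to read.
print(round(cm, 2))
```

The matrix is symmetric with ones on the diagonal, so only the entries above or below the diagonal need interpreting.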

Part 4:  Test the hypothesis that on average, men have larger tumors than women do.
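
The assignment does not name a specific test. One common choice for a directional comparison of two group means is a one-sided Welch two-sample t-test, sketched here on simulated data. The vectors group_a and group_b are hypothetical stand-ins, and whether a t-test suits the real data is for you to judge and justify.

```r
# Simulated measurements for two groups; values are illustrative only.
set.seed(7)
group_a <- rnorm(50, mean = 72, sd = 10)
group_b <- rnorm(50, mean = 68, sd = 10)

# alternative = "greater" tests whether mean(group_a) exceeds mean(group_b).
# t.test() uses the Welch correction (unequal variances) by default.
res <- t.test(group_a, group_b, alternative = "greater")

print(res)

# Individual pieces of the result can be pulled out for reporting.
cat("t statistic:", res$statistic, "\n")
cat("p-value:", res$p.value, "\n")
```

Passing the two groups as separate vectors makes the direction of the alternative explicit: it always refers to the first vector relative to the second.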


Handin:
Zip up your project directory and upload it to Blackboard. Also save a knitted file.