Laboratory 4: Discovery with microarrays
Goals for today:
- Understand linear models and errors.
- Become more familiar with microarray data and the ideas of differential expression
Lab 4 example solution: https://googledrive.com/host/0B2dQ2-mnbQvALUsyTWlfbnZZNjQ/Lab4Script.html (Script)
There are many ways to accomplish a particular activity, for example getting a count of the unique series and their samples. The following three solutions (given by class members) are better than my solution.
Solution 1: Use aggregate
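The class member's original code is not reproduced here. As a minimal sketch of the aggregate approach, assume a hypothetical data frame named samples with one row per sample and columns series and sample (these names and values are made up for illustration, not taken from the lab data):

```r
# Hypothetical data: one row per sample, labelled with the series it belongs to.
samples <- data.frame(
  series = c("GSE1", "GSE1", "GSE2", "GSE3", "GSE3", "GSE3"),
  sample = c("GSM1", "GSM2", "GSM3", "GSM4", "GSM5", "GSM6"),
  stringsAsFactors = FALSE
)

# aggregate() splits the sample column by series and applies length() to each group.
counts <- aggregate(sample ~ series, data = samples, FUN = length)
names(counts)[2] <- "n_samples"
print(counts)
```

The result is a data frame with one row per unique series and the number of samples in each.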
Solution 4: Do the calculation manually with a hash table. This approach is not as good as the other approaches, but it illustrates how to use a hash table to do the calculation efficiently. This information is useful if the provided routines don't quite fit what you need.
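One way to sketch the hash table idea in R is with an environment, which R implements as a hashed lookup table. The series vector below is hypothetical illustration data, and this is not necessarily how the original solution was written:

```r
# Hypothetical data: the series label of each sample.
series <- c("GSE1", "GSE1", "GSE2", "GSE3", "GSE3", "GSE3")

# An environment acts as a hash table keyed by character strings.
counts <- new.env(hash = TRUE)
for (s in series) {
  seen <- counts[[s]]                      # NULL if the key is not present yet
  counts[[s]] <- if (is.null(seen)) 1L else seen + 1L
}

# Collect the table into a named vector (ls() returns the keys sorted).
result <- unlist(mget(ls(counts), envir = counts))
print(result)
```

Each lookup and update takes roughly constant time, so the loop makes a single pass over the data.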
R limma package
The limma package fits linear models specifically for microarrays: lmFit(object, design)
- The design matrix has rows corresponding to the microarrays and columns corresponding to coefficients of the model.
- The object can be a numeric matrix or an ExpressionSet, but it should contain log ratios or log expression values.
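To make the shape of the design matrix concrete, here is a small sketch for a two-group comparison. The group labels and the expression matrix are simulated for illustration only. The lmFit call runs only if limma happens to be installed.

```r
# Assumed layout: 6 arrays, 3 control and 3 treated (made-up labels).
groups <- factor(c("control", "control", "control",
                   "treated", "treated", "treated"))

# One row per array, one column per model coefficient.
design <- model.matrix(~ groups)
print(design)

# Simulated log2 expression values: 5 genes (rows) by 6 arrays (columns).
set.seed(1)
expr <- matrix(rnorm(30, mean = 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("array", 1:6)))

# Fit one linear model per gene, only if limma is available.
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(expr, design)
  print(fit$coefficients)
}
```

The number of rows in design must equal the number of columns (arrays) in the expression matrix.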
Simple linear regression:
In R, we would write a model y ~ x. The intercept is implied. If we wanted no intercept, we would write y ~ x - 1 or y ~ x + 0.
Example 1: The trees data set has columns Volume, Height and Girth. How well does Girth predict Volume?
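A minimal way to fit the model for Example 1, using the built-in trees data set (the variable name fit is just a convenient choice):

```r
# Simple linear regression of Volume on Girth; the intercept is implied.
fit <- lm(Volume ~ Girth, data = trees)

# Coefficient estimates, standard errors, R-squared and the overall F test.
print(summary(fit))

# The fitted intercept and slope on their own.
print(coef(fit))
```

The R-squared value in the summary is one measure of how well Girth predicts Volume.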
The summary reports an intercept of -36.9435 and a Girth coefficient of 5.0659, so the predicted line is Volume = 5.0659*Girth - 36.9435.
Example 2: Plot the data and the predicted line on the same graph.
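One possible sketch for Example 2 uses base graphics. The axis labels and colors are arbitrary choices:

```r
fit <- lm(Volume ~ Girth, data = trees)

# Scatter plot of the raw data.
plot(Volume ~ Girth, data = trees,
     xlab = "Girth", ylab = "Volume",
     main = "trees: Volume vs Girth")

# abline() reads the intercept and slope directly from the fitted model.
abline(fit, col = "red", lwd = 2)
```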
Example 3: Look at the errors (residuals) to make sure that they are roughly uniform across the range of the variables.
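A common way to inspect the errors is to plot the residuals against the fitted values. This is a sketch of one such check, not the only option:

```r
fit <- lm(Volume ~ Girth, data = trees)
res <- resid(fit)

# Residuals should scatter evenly around zero with no visible trend.
plot(fitted(fit), res,
     xlab = "Fitted Volume", ylab = "Residual",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)
```

A curved or funnel-shaped pattern in this plot suggests that the straight-line model is missing something, which is one reason to try transformed variables.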
Example 4: Evaluate whether log(Volume) ~ log(Girth) is a better model.
Example 5: Evaluate whether log(Volume) ~ log(Girth) + log(Height) is a better model.
Example 6: ANOVA (analysis of variance) looks at how SS_total (the total sum of squares) is split among the terms of the model and the error.
ANOVA columns are:
- Df = degrees of freedom: the number of independent pieces of information available to estimate a parameter
- Sum Sq = sum of squares (SS) attributed to that row's term
- Mean Sq = SS/Df for that row
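Examples 4 to 6 could be approached along the following lines. The model names are arbitrary. Note that R-squared values are only directly comparable between models with the same response, so the two log models can be compared with each other more safely than with the untransformed model.

```r
fit_log1 <- lm(log(Volume) ~ log(Girth), data = trees)
fit_log2 <- lm(log(Volume) ~ log(Girth) + log(Height), data = trees)

# R-squared for the two log models (same response, so comparable).
print(c(log1 = summary(fit_log1)$r.squared,
        log2 = summary(fit_log2)$r.squared))

# Does adding log(Height) significantly reduce the residual sum of squares?
print(anova(fit_log1, fit_log2))

# Sequential ANOVA table: columns Df, Sum Sq, Mean Sq, F value, Pr(>F).
tab <- anova(fit_log2)
print(tab)

# Check the column definitions by hand for the first term.
mean_sq  <- tab[["Sum Sq"]] / tab[["Df"]]   # Mean Sq = SS / Df
f_manual <- mean_sq[1] / mean_sq[3]         # term Mean Sq / residual Mean Sq
print(f_manual)
```

The last row of the table (Residuals) holds the error sum of squares that each term's Mean Sq is compared against.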
The F value for a variable is the ratio of its Mean Sq to the Mean Sq of the error. Under the hypothesis of no effect, this ratio follows the F distribution. The p value is the probability of seeing a ratio at least this large at random (no effect). The partition of the sum of squares is derived in http://en.wikipedia.org/wiki/Partition_of_sums_of_squares
Example 7: Suppose x = [1, 2, 3, 4] and y = [1, 3, 3, 4.2]. Find the linear model and the ESS (explained sum of squares).
The limma package has its own modeling and design.
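Example 7 is small enough to check by hand, and the following sketch computes the same quantities in R. The names tss, ess and rss are just labels for the total, explained and residual sums of squares:

```r
x <- c(1, 2, 3, 4)
y <- c(1, 3, 3, 4.2)

fit   <- lm(y ~ x)
y_hat <- fitted(fit)

tss <- sum((y - mean(y))^2)      # total sum of squares
ess <- sum((y_hat - mean(y))^2)  # explained sum of squares
rss <- sum(resid(fit)^2)         # residual (error) sum of squares

print(coef(fit))
print(c(TSS = tss, ESS = ess, RSS = rss))
```

Working it out by hand gives the line y = 0.4 + 0.96x, with ESS = 4.608, RSS = 0.672 and TSS = 5.28, so that TSS = ESS + RSS as the partition of sums of squares requires.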