Laboratory 4: Discovery with microarrays

In this lab you will be exploring microarray data from NCBI GEO. You will use GEOmetadb to find microarray series that might reveal interesting information about your gene. You will then look for differential expression using GEO2R at NCBI.

Overview of the procedure:

You should create a Word document summarizing the results of each step:

Part 1: Aliases - Define a variable in R that contains a list of aliases of the gene symbol for your gene (just for human).
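
One possible way to build the alias variable is sketched below. It is a minimal sketch, not the required method: it assumes the Bioconductor annotation package org.Hs.eg.db (human only) may be installed, and the gene symbol TP53 and the helper name lookup_aliases are placeholders chosen for illustration. If the package is missing, the sketch falls back to the symbol alone.

```r
# Placeholder gene symbol (assumption); substitute your assigned gene.
gene_symbol <- "TP53"

# Hypothetical helper: return the symbol plus any human aliases.
lookup_aliases <- function(symbol) {
  have_pkgs <- requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)
  if (have_pkgs) {
    # Map the official symbol to its ALIAS entries in the human annotation DB.
    res <- AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db,
                                 keys = symbol, keytype = "SYMBOL",
                                 columns = "ALIAS")
    unique(c(symbol, res$ALIAS))
  } else {
    # Fallback so the script still runs without the annotation package.
    symbol
  }
}

aliases <- lookup_aliases(gene_symbol)
print(aliases)
```

Whatever source you use, check the resulting list by eye; very short aliases tend to produce false matches in the text searches of the later parts.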

Part 2: GEOmetadb - Use GEOmetadb in R to find the GEO series (GSE) that meet the following criteria:
  • Have samples only from GPL96, GPL570, or GPL1261.
  • Have the word cancer (or its aliases defined in class) appearing in the series.title or series.summary (note that series have summaries, not descriptions).
  • Have at least 100 samples (note that these may not all be from one of the required platforms).
Output the list of series accession numbers (i.e., gse) and the series titles that meet these criteria. Also write the list to a file.
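
The three criteria above can be combined in a single SQL query. The following is a minimal sketch under stated assumptions: it runs against a tiny in-memory toy database that imitates the gse, gse_gpl, and gse_gsm tables of GEOmetadb, so the accessions, titles, and the threshold of 3 samples are made up. The helper find_series and the output file name are hypothetical. The sketch reads "only" strictly (no other platform may be linked to the series); dropping the NOT IN clause gives the looser reading.

```r
library(DBI)

# Toy stand-in for GEOmetadb.sqlite (assumption). In the lab you would
# instead connect to the file fetched with GEOmetadb::getSQLiteFile().
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "gse", data.frame(
  gse = c("GSE_A", "GSE_B", "GSE_C"),
  title = c("Breast cancer cohort", "Yeast heat shock", "Lung study"),
  summary = c("Tumor profiling", "Stress response", "Profiles of lung cancer"),
  stringsAsFactors = FALSE))
dbWriteTable(con, "gse_gpl", data.frame(
  gse = c("GSE_A", "GSE_B", "GSE_C", "GSE_C"),
  gpl = c("GPL570", "GPL90", "GPL96", "GPL8300"),
  stringsAsFactors = FALSE))
dbWriteTable(con, "gse_gsm", data.frame(
  gse = c("GSE_A", "GSE_A", "GSE_A", "GSE_B", "GSE_C", "GSE_C", "GSE_C"),
  gsm = paste0("GSM_", 1:7),
  stringsAsFactors = FALSE))

# Hypothetical helper: series on the allowed platforms only, with a keyword
# in the title or summary, and at least min_samples samples in total.
find_series <- function(con, platforms, keywords, min_samples) {
  gpl_list <- paste0("'", platforms, "'", collapse = ", ")
  text_match <- paste0("(g.title LIKE '%", keywords,
                       "%' OR g.summary LIKE '%", keywords, "%')",
                       collapse = " OR ")
  sql <- sprintf(
    "SELECT g.gse, g.title FROM gse g
     WHERE (%s)
       AND g.gse IN (SELECT gse FROM gse_gpl WHERE gpl IN (%s))
       AND g.gse NOT IN (SELECT gse FROM gse_gpl WHERE gpl NOT IN (%s))
       AND g.gse IN (SELECT gse FROM gse_gsm
                     GROUP BY gse HAVING COUNT(*) >= %d)
     ORDER BY g.gse",
    text_match, gpl_list, gpl_list, as.integer(min_samples))
  dbGetQuery(con, sql)
}

# Add the class-defined aliases of "cancer" to the keywords vector.
hits <- find_series(con,
                    platforms = c("GPL96", "GPL570", "GPL1261"),
                    keywords = c("cancer"),
                    min_samples = 3)  # use 100 on the real database
print(hits)

# Hypothetical output file name, written to a temporary directory here.
out_file <- file.path(tempdir(), "part2_series.csv")
write.csv(hits, out_file, row.names = FALSE)
dbDisconnect(con)
```

SQLite's LIKE is case-insensitive for ASCII letters, so "Cancer" and "cancer" both match. In the toy data, GSE_C is excluded only because it is also linked to a platform outside the allowed three.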
 
Part 3: Narrowing down sets - Use GEOmetadb in R to find any GEO samples (GSM) that have any of the aliases for your gene appearing in the sample title, sample description, series title, or series summary. Output the list including the sample accession, series accession, and sample title. Also output a list of the unique series and the number of samples in each series that meet these criteria.
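
The sample search can be sketched as a join of the sample and series tables with one LIKE clause per alias and text field. This is a minimal sketch, not a definitive solution: the toy tables imitate the gsm, gse, and gse_gsm tables of GEOmetadb, and all accessions, titles, and the two aliases are placeholders.

```r
library(DBI)

# Placeholder aliases (assumption); use your alias vector from Part 1.
aliases <- c("TP53", "P53")

# Toy stand-in for GEOmetadb.sqlite (assumption).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "gsm", data.frame(
  gsm = c("GSM_1", "GSM_2", "GSM_3", "GSM_4"),
  title = c("TP53 knockdown rep1", "control rep1", "tumor 1", "tumor 2"),
  description = c("siRNA against TP53", "scrambled siRNA", "biopsy", "biopsy"),
  stringsAsFactors = FALSE))
dbWriteTable(con, "gse", data.frame(
  gse = c("GSE_A", "GSE_B"),
  title = c("Knockdown experiment", "p53 mutation status in tumors"),
  summary = c("Effect of silencing", "Expression profiles"),
  stringsAsFactors = FALSE))
dbWriteTable(con, "gse_gsm", data.frame(
  gse = c("GSE_A", "GSE_A", "GSE_B", "GSE_B"),
  gsm = c("GSM_1", "GSM_2", "GSM_3", "GSM_4"),
  stringsAsFactors = FALSE))

# One LIKE clause for every (text field, alias) combination.
fields <- c("m.title", "m.description", "s.title", "s.summary")
clauses <- as.vector(outer(fields, aliases,
                           function(f, a) sprintf("%s LIKE '%%%s%%'", f, a)))

sql <- paste(
  "SELECT m.gsm, s.gse, m.title",
  "FROM gsm m",
  "JOIN gse_gsm j ON j.gsm = m.gsm",
  "JOIN gse s ON s.gse = j.gse",
  "WHERE", paste(clauses, collapse = " OR "),
  "ORDER BY m.gsm")
hits <- dbGetQuery(con, sql)
print(hits)

# Unique series and the number of matching samples in each.
per_series <- as.data.frame(table(gse = hits$gse),
                            responseName = "n_samples",
                            stringsAsFactors = FALSE)
print(per_series)
dbDisconnect(con)
```

LIKE matches substrings, so a short alias can also match inside unrelated words; it is worth inspecting the matched titles before trusting the counts.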

Part 4: Select a GEO series to analyze. The order of selection priority is:
  • If a series meets the criteria of both Part 2 and Part 3, select it. If there is more than one, choose one that is also a GDS.
  • If no series meets both sets of criteria, pick a series from Part 2 that appears in GEO DataSets (GDS).
After you have narrowed your candidates down to about 10 series, open each one on NCBI GEO, view it in GEO2R, and look for series whose control variables make sense. Write a short justification of why you picked the series that you did.
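
The selection priority can be written as a few set operations. The sketch below uses made-up accession vectors in place of your Part 2 and Part 3 results; gds_series stands for the series that have a GDS, which on the real database could plausibly be obtained from the gse column of the gds table in GEOmetadb (an assumption to verify against the schema).

```r
# Toy inputs (assumptions); replace with your own results.
part2_series <- c("GSE_A", "GSE_B", "GSE_C")  # series from Part 2
part3_series <- c("GSE_B", "GSE_D")           # unique series from Part 3
gds_series   <- c("GSE_C")                    # series that also have a GDS

both <- intersect(part2_series, part3_series)
if (length(both) > 0) {
  # First priority: series meeting both parts, preferring those with a GDS.
  in_gds <- intersect(both, gds_series)
  candidates <- if (length(in_gds) > 0) in_gds else both
} else {
  # Fallback: Part 2 series that appear in GEO DataSets.
  candidates <- intersect(part2_series, gds_series)
}
print(candidates)
```

The final choice among the candidates still depends on inspecting each series in GEO2R, which this sketch does not automate.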

Part 5: Use the NCBI GEO analysis tools to calculate the top 250 genes that are differentially expressed between some pair of conditions in the series selected in Part 4. Download the results (you can cut and paste into a file if necessary). Read the results into R. Extract a list of genes and determine whether or not your gene (or any of its aliases) is differentially expressed. Also determine whether any of the cancer genes (from the Lawrence paper and also listed in Example 2 of Lecture 19 in the Microarray gene profiles section) appear on the top 250 list.
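
Reading the results into R and checking the gene lists might look like the sketch below. It is a minimal sketch under stated assumptions: the downloaded table is taken to be tab-delimited with a Gene.symbol column in which multiple symbols for one probe are separated by "///", so check the header of your own file. The table contents, the aliases, and the cancer gene list are placeholders, not real results.

```r
# Toy stand-in for a downloaded results file (assumption). With a real
# file, use read.delim("your_file.txt", stringsAsFactors = FALSE).
geo2r_text <- paste(
  "ID\tadj.P.Val\tlogFC\tGene.symbol",
  "probe_1\t0.001\t2.1\tTP53",
  "probe_2\t0.004\t-1.7\tGENE_X///GENE_Y",
  "probe_3\t0.010\t1.2\t",
  sep = "\n")
top <- read.delim(text = geo2r_text, stringsAsFactors = FALSE)

# Split multi-symbol entries, normalise case, and drop empty entries.
genes <- unlist(strsplit(top$Gene.symbol, "///", fixed = TRUE))
genes <- unique(toupper(trimws(genes)))
genes <- genes[nzchar(genes)]

# Placeholder lists (assumptions); use your Part 1 aliases and the
# cancer genes from the Lawrence paper.
aliases <- c("TP53", "P53")
cancer_genes <- c("GENE_Y", "GENE_Z")

my_hits <- intersect(toupper(aliases), genes)
cancer_hits <- intersect(toupper(cancer_genes), genes)

cat("Aliases on the list:", my_hits, "\n")
cat("Cancer genes on the list:", cancer_hits, "\n")
```

Comparing whole symbols with intersect, rather than searching for substrings, avoids counting a gene whose name merely contains one of your aliases.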