Lecture 19: Understanding pathways

Outline:
  • Introduce the idea of pathways
  • Find groups of genes in a pathway
  • Go over how to cluster gene profiles from microarray data in R
  • See whether the clusters overlap with the genes in the pathway

Setup in R:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("GEOquery", "Biobase", "limma"))


Introduction to pathways:  A pathway is a "series of actions among molecules in a cell that leads to a change in the cell" (Wikipedia). Pathways give us a picture of how things "work" (the cell in action). A pathway is often represented as a heterogeneous graph: the nodes are objects (genes, proteins, small molecules, or other pathways) and the edges are the reactions or relationships between them.

The Fas signal transduction pathway
Sci. STKE 2007 (380). Molecular Animation of Cell Death Mediated by the Fas Pathway

The KEGG pathway that illustrates this is Apoptosis (cell death): http://www.genome.jp/kegg-bin/show_pathway?map04210

KEGG Notation (see http://www.genome.jp/kegg/document/help_pathway.html for complete legend)
  • Objects
    • Gene/protein/RNA - rectangle
    • Pathway or other map - rounded rectangle
    • Molecule (chemical compound) - small circle
  • Relationships (in KGML)
    • Activation/Expression -->
    • Dissociation -+-
    • Indirect effect ..>
    • Phosphorylation +p
    • Inhibition --|
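
One simple way to hold such a graph in R is an edge list with one row per relationship. The sketch below is a hand-typed, simplified fragment of Fas signalling for illustration only: it is not parsed from KEGG or KGML, the relation labels are approximate, and the column names (from, to, relation) and the helper targetsOf are assumptions made for this example.

```r
# Simplified, hand-typed fragment of the Fas pathway (illustrative only;
# not extracted from KEGG). Each row is one directed edge.
edges <- data.frame(
  from     = c("FASLG", "FAS", "FADD", "CASP8", "CFLAR"),
  to       = c("FAS", "FADD", "CASP8", "CASP3", "CASP8"),
  relation = c("activation", "activation", "activation",
               "activation", "inhibition"),
  stringsAsFactors = FALSE
)

# Nodes are the distinct objects that appear at either end of an edge
nodes <- sort(unique(c(edges$from, edges$to)))

# Hypothetical helper: direct downstream targets of a gene
targetsOf <- function(gene) edges$to[edges$from == gene]

# Everything that acts on CASP8, with the type of relationship
actsOnCasp8 <- edges[edges$to == "CASP8", c("from", "relation")]
actsOnCasp8
```

Keeping the relation type as its own column makes it easy to filter, for example, only the inhibition edges.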

Microarray gene profiles

Example 1: Find a useful platform using GEO datasets:  
  • Search GEO datasets: I searched GEO DataSets for my gene name, PTEN, in all fields, then looked for a likely candidate microarray series run on one of the platforms GPL570 or GPL96. The first dataset on the list was GSE35896: Gene expression data from 62 colorectal cancers. PTEN mutation status was one of its sample characteristics.
  • Find series in GEO series: I went to GEO series and typed in the series number GSE35896 to bring up its record.
  • Find differential expression using GEO2R: Click the GEO2R link and then calculate differential expression.
  • Save the expression file for analysis: Copy the table into the paste buffer (highlight and Ctrl-C) then paste it into an empty WordPad document and save it as a text document.
  • Analyze the result: Read into R as a data.frame with a tab separator
Note: You can download the result of this process at https://googledrive.com/host/0B2dQ2-mnbQvALUsyTWlfbnZZNjQ/GSE35896Top250DifExp.txt.
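
The last step above (reading the saved table into R) can be sketched as follows. To keep the example runnable without the download, it first writes a tiny stand-in file; the rows and the exact column set are made up, so check the header of your own file.

```r
# Stand-in for the saved GEO2R table: tab-separated with a header row.
# These rows are invented; the real file has 250 rows.
txt <- paste(
  "ID\tadj.P.Val\tlogFC\tGene.symbol",
  "probe_1\t0.001\t2.1\tGENE1",
  "probe_2\t0.004\t-1.7\tGENE2",
  sep = "\n"
)
path <- tempfile(fileext = ".txt")
writeLines(txt, path)

# read.delim() defaults to sep = "\t" and header = TRUE
difExp <- read.delim(path, stringsAsFactors = FALSE)
str(difExp)
```

With the real file, replace path with its location, for example read.delim("GSE35896Top250DifExp.txt", stringsAsFactors = FALSE).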

Example 2: Determine whether there is any overlap between the differentially expressed genes and the cancer genes listed in the Lawrence 2014 Nature paper:
cancerGenes <- c("TP53", "PIK3CA", "PTEN", "RB1", "KRAS", "NRAS", "BRAF", 
                 "CDKN2A", "FBXW7", "ARID1A", "MLL2", "STAG2", "ATM", "CASP8", "CTCF",
                 "ERBB3", "HLA-A", "HRAS", "IDH1", "NF1", "NFE2L2", "PIK3R1")
cancerExp <- intersect(top250, cancerGenes)

Here top250 is a character vector containing the top 250 differentially expressed genes from Example 1.
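
A possible way to build top250 from the gene symbol column of the saved table is sketched below. It assumes that entries for probes matching several genes are joined with "///" and that unannotated probes have an empty symbol; the input vector here is made up, and the cancer gene list is shortened.

```r
# Made-up stand-in for the gene symbol column of the GEO2R table
geneSymbols <- c("PTEN", "GENE1///GENE2", "", "KRAS", "PTEN")

# Split multi-gene entries, remove duplicates, drop blanks
top250 <- unique(unlist(strsplit(geneSymbols, "///", fixed = TRUE)))
top250 <- top250[top250 != ""]

cancerGenes <- c("TP53", "PIK3CA", "PTEN", "KRAS")  # shortened list
cancerExp <- intersect(top250, cancerGenes)
cancerExp
```

intersect() returns each shared symbol once, in the order it appears in its first argument.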


Next steps in the search for mechanisms:   
  • Find genes that are highly expressed, then check whether other genes in the same pathways are in our list.
  • Find genes in the pathways of the cancer genes and find out whether any of the interacting genes are in the differentially expressed list.
  • Cluster the entire series by gene profiles and see what groups of genes result and whether these overlap at all.
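
The clustering step in the last bullet can be sketched with base R. This is one reasonable choice (correlation distance with average-linkage hierarchical clustering), not the only one; the toy matrix, the gene names, and the pathway list are invented for illustration.

```r
set.seed(19)  # reproducible toy data

# Toy expression matrix: 6 genes x 8 samples. Genes g1-g3 follow one
# pattern across samples, g4-g6 follow the opposite pattern.
pattern <- rep(c(-1, 1), each = 4)
ex <- rbind(
  t(replicate(3,  pattern + rnorm(8, sd = 0.2))),
  t(replicate(3, -pattern + rnorm(8, sd = 0.2)))
)
rownames(ex) <- paste0("g", 1:6)

# Correlation distance between gene profiles (rows), then average linkage
d  <- as.dist(1 - cor(t(ex)))
hc <- hclust(d, method = "average")

# Cut the tree into 2 clusters; the result is a named vector gene -> cluster
clusters <- cutree(hc, k = 2)

# Overlap between the cluster containing g1 and a hypothetical pathway list
pathwayGenes <- c("g1", "g2", "g9")
sameAsG1 <- names(clusters)[clusters == clusters[["g1"]]]
overlap  <- intersect(sameAsG1, pathwayGenes)
overlap
```

On a real series, ex would be the probe (or gene) by sample matrix, and it is usually worth filtering to the most variable rows before computing the distance matrix, since its size grows with the square of the number of rows.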

Doing differential expression in R (using limma):
  • Easiest way is to start with the template code generated from GEO2R
  • In R we can use GEOquery to download the datasets that we need.
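
A typical two-group limma analysis follows the steps sketched below. The data are simulated so the example runs without a download; the group labels (wt, mut), the sample layout, and the planted effect in the first five probes are assumptions made for illustration.

```r
library(limma)

set.seed(1)  # reproducible simulated data

# Simulated log-expression matrix: 100 probes x 6 samples
ex <- matrix(rnorm(100 * 6), nrow = 100,
             dimnames = list(paste0("probe", 1:100), paste0("s", 1:6)))
ex[1:5, 4:6] <- ex[1:5, 4:6] + 3   # plant a shift in the "mut" samples

# One column per group in the design matrix
group  <- factor(c("wt", "wt", "wt", "mut", "mut", "mut"),
                 levels = c("wt", "mut"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Fit a linear model per probe, then test the mut - wt contrast
fit  <- lmFit(ex, design)
cont <- makeContrasts(mut - wt, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont))

# Top 10 probes, with Benjamini-Hochberg adjusted p-values
top <- topTable(fit2, number = 10, adjust.method = "BH")
top[, c("logFC", "P.Value", "adj.P.Val")]
```

For the real series, ex would come from the downloaded data and group from the sample characteristics (for example, PTEN mutation status).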
Example 1: Download a particular series from GEO
library(GEOquery)
gset <- getGEO("GSE35896", GSEMatrix = TRUE)

Example 2: Extracting an array of expression values 
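library(Biobase)  # provides exprs()
ex <- exprs(gset[[1]])  # getGEO() returns a list of ExpressionSets; take the first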

Note: This gives a matrix of microarray probes x samples. The correspondence between probes and genes is not one-to-one: some genes have several different probes. We need to translate probes to gene symbols by accessing the platform information.

Example 3: Get the platform information for Affymetrix microarray platform GPL570
platf <- getGEO("GPL570", AnnotGPL=TRUE)
ncbifd <- data.frame(attr(dataTable(platf), "table"))

Example 4: Look at the columns that are available in the platform file:
> names(ncbifd)
 [1] "ID"                    "Gene.title"            "Gene.symbol"          
 [4] "Gene.ID"               "UniGene.title"         "UniGene.symbol"       
 [7] "UniGene.ID"            "Nucleotide.Title"      "GI"                   
[10] "GenBank.Accession"     "Platform_CLONEID"      "Platform_ORF"         
[13] "Platform_SPOTID"       "Chromosome.location"   "Chromosome.annotation"
[16] "GO.Function"           "GO.Process"            "GO.Component"         
[19] "GO.Function.ID"        "GO.Process.ID"         "GO.Component.ID"   

Example 5: Data frame joins (see http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right). 
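
A minimal sketch of such a join uses merge() to attach gene symbols to probe-level expression values. The probe IDs, gene symbols, and expression values are invented; only the column names ID and Gene.symbol come from the platform table above. Averaging probes per gene is one common choice, not the only one.

```r
# Toy expression matrix: rows are probes, columns are samples (made-up values)
ex <- matrix(c(5.0, 7.0, 2.0, 9.0,
               6.0, 8.0, 3.0, 9.5),
             nrow = 4,
             dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2")))

# Toy annotation using the same column names as the platform table
ncbifd <- data.frame(ID = c("p1", "p2", "p3", "p5"),
                     Gene.symbol = c("GENE1", "GENE1", "GENE2", "GENE3"),
                     stringsAsFactors = FALSE)

exDf  <- data.frame(ID = rownames(ex), ex, stringsAsFactors = FALSE)
annot <- ncbifd[, c("ID", "Gene.symbol")]

# Inner join (the default): only probes present in both tables survive
inner <- merge(exDf, annot, by = "ID")

# Left join: keep every probe in exDf, with NA where there is no annotation
left <- merge(exDf, annot, by = "ID", all.x = TRUE)

# Collapse several probes per gene into one row by averaging
geneEx <- aggregate(cbind(s1, s2) ~ Gene.symbol, data = inner, FUN = mean)
geneEx
```

Use all.y = TRUE for a right join and all = TRUE for a full outer join.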

Some useful R things:

Example 1: Defining a function for a task you will repeat (this function might be useful for Lab 4):
sqlList <- function(aList, aField) {
  # aList is a list of aliases
  # aField is the database column to search for those aliases
  # Returns one string of LIKE conditions joined by OR

  lSize <- length(aList)
  for (k in seq_len(lSize)) {
    aList[[k]] <- paste(aField, " LIKE '%", aList[[k]], "%'", sep="")
  }

  paste(aList, collapse=" OR ")
}
Note: You must source the file containing this function, or define it directly in your script, before calling it.
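
The same WHERE clause can also be built without a loop, because paste0() is vectorized. The sketch below is a hypothetical variant (the name sqlListVec, the table genes, and the column alias are made up for illustration) together with a usage example.

```r
# Hypothetical vectorized variant: one LIKE clause per alias, joined by OR
sqlListVec <- function(aliases, aField) {
  clauses <- paste0(aField, " LIKE '%", unlist(aliases), "%'")
  paste(clauses, collapse = " OR ")
}

where <- sqlListVec(c("PTEN", "MMAC1"), "alias")
where
# "alias LIKE '%PTEN%' OR alias LIKE '%MMAC1%'"

# The result drops straight into a query string
query <- paste("SELECT * FROM genes WHERE", where)
```

The aliases are pasted into the query text as-is, so this approach is only suitable for values you typed yourself, not for untrusted input.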