Lecture 11: Sequence alignment in R

Outline:
  • Set up R for working with bioinformatics data
  • Assignment of your "gene"
  • Work with sequence alignments in R
  • Do a few examples in ggplot2
Resources:
  • Sequences, Genomes, and Genes in R/Bioconductor (general overview of the packages available):
    http://www.ebi.ac.uk/training/sites/ebi.ac.uk.training/files/materials/2013/131021_HTS/genesandgenomes.pdf
  • Pairwise Sequence Alignments (Biostrings)
  • Downloading multiple sequences from GenBank quickly and easily using ape in R
  • ape homepage and reference manuals
  • Biostrings reference manual
  • seqinr documentation
  • seqinr user manual
  • ACNUC query language
  • A Little Book of R for Bioinformatics


Installation: (We'll be using these packages over the next few weeks, so it is worth getting the installation done now.)
  • Install biocLite from Bioconductor (repository for R bioinformatics packages):
source("http://bioconductor.org/biocLite.R")
require(BiocInstaller)
biocLite()
  • Install Biobase:
biocLite("Biobase")
  • Install Biostrings:   Package for manipulating strings (especially DNA) in R
biocLite("Biostrings")
require(Biostrings)
browseVignettes("Biostrings")
  • Install seqinr
install.packages("seqinr", dependencies=TRUE)
  • Install rtracklayer
biocLite("rtracklayer")
  • Install ape: package for analyses of phylogenetics and evolution
install.packages("ape", dependencies=TRUE)
  • Install ggplot2 (grammar of graphics)
install.packages("ggplot2", dependencies=TRUE)
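
Note for newer R versions: on current Bioconductor releases, biocLite() has been retired in favor of the BiocManager package. The sketch below shows one way the same set of packages could be installed in that case. The helper names missing_pkgs and install_course_packages are made up for this handout and are not part of any package.

```r
# Assumption: a recent R and Bioconductor release, where BiocManager::install()
# takes the place of biocLite().
bioc_pkgs <- c("Biobase", "Biostrings", "rtracklayer")  # Bioconductor packages
cran_pkgs <- c("seqinr", "ape", "ggplot2")              # CRAN packages

# Hypothetical helper: return the subset of pkgs that is not yet installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only what is missing.
install_course_packages <- function() {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  need_bioc <- missing_pkgs(bioc_pkgs)
  need_cran <- missing_pkgs(cran_pkgs)
  if (length(need_bioc) > 0) BiocManager::install(need_bioc)
  if (length(need_cran) > 0) install.packages(need_cran, dependencies = TRUE)
  invisible(c(need_bioc, need_cran))
}

# Run once, with an internet connection:
# install_course_packages()
```

Nothing is installed until install_course_packages() is called, so the block is safe to source first and run later.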

Assignment of your Gene:
  • Each person in the class will be responsible for a particular cancer gene.
  • You will use data from this gene in your labs.
  • Take a look at http://www.tumorportal.org
  • Download the .maf file for your gene.  Example:  TP53.maf

Alignment algorithms in R (Biostrings)

Initial Examples:  Work through some of the pairwise sequence alignment handout from Biostrings.

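Example 1:  Read FASTA format. Download the following two files (tp53humanrefseq.fasta and trp53mouserefseq.fasta) into your working directory, then read them with readDNAStringSet: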
tp53h <- readDNAStringSet("tp53humanrefseq.fasta")
tp53m <- readDNAStringSet("trp53mouserefseq.fasta")
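
If the two files are not at hand yet, the same steps can be tried on a toy FASTA file. This is a minimal sketch: the two records are made up for illustration and are not real TP53 sequence. The same inspection calls should work on tp53h and tp53m once they are loaded.

```r
library(Biostrings)

# Write a tiny two-record FASTA file to a temporary location.
# The sequences are invented toy data.
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">toy_seq_1", "ACGTACGTAC",
             ">toy_seq_2", "ACGTTCGT"),
           fasta_path)

# Read it back as a DNAStringSet, exactly as for the TP53 files.
toy <- readDNAStringSet(fasta_path)

names(toy)                               # record names from the ">" header lines
width(toy)                               # length of each sequence in bases
alphabetFrequency(toy, baseOnly = TRUE)  # counts of A, C, G, T (and "other")
```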

Example 2:  Do an alignment using pairwiseAlignment
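
A minimal sketch of a global pairwise alignment, using two short made-up DNA strings so that it runs without the downloaded files. The scoring values (match = 1, mismatch = -1, gapOpening = 5, gapExtension = 2) are arbitrary choices for illustration, not recommended settings. Assumption: on recent Bioconductor releases pairwiseAlignment and its helpers live in the pwalign package rather than in Biostrings, so the sketch looks in pwalign first.

```r
library(Biostrings)

# Assumption: use pwalign when it is installed (newer Bioconductor),
# otherwise fall back to the functions exported by Biostrings itself.
pw <- if (requireNamespace("pwalign", quietly = TRUE)) "pwalign" else "Biostrings"
pairwise_align <- getExportedValue(pw, "pairwiseAlignment")
subst_matrix   <- getExportedValue(pw, "nucleotideSubstitutionMatrix")
percent_id     <- getExportedValue(pw, "pid")

# Toy sequences (invented): identical except for one base at position 5.
toy_a <- DNAString("ACGTACGTAC")
toy_b <- DNAString("ACGTTCGTAC")

# Simple scoring scheme: +1 for a match, -1 for a mismatch.
mat <- subst_matrix(match = 1, mismatch = -1, baseOnly = TRUE)

# Global (Needleman-Wunsch style) alignment with affine gap penalties.
aln <- pairwise_align(toy_a, toy_b,
                      type = "global",
                      substitutionMatrix = mat,
                      gapOpening = 5, gapExtension = 2)

aln              # print the aligned pattern and subject
score(aln)       # alignment score under the scheme above
percent_id(aln)  # percent sequence identity
```

With the TP53 files loaded, the same call could be made on the first record of each set, for example pairwiseAlignment(tp53h[[1]], tp53m[[1]], type = "global"). Try type = "local" as well and compare the scores.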

 


Getting started with ggplot2

Example 1:   Use the ggplot2 library with the diamonds dataset

library('ggplot2')
data(diamonds)
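
A first plot with the diamonds data might look like the sketch below. The choice of variables (carat, price, cut) and of a scatter plot is only one possibility, picked for illustration.

```r
library(ggplot2)
data(diamonds)

# Map carat to x, price to y, and colour the points by cut quality.
p <- ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +
  geom_point(alpha = 0.3) +   # semi-transparent points reduce overplotting
  labs(title = "Diamond price versus carat",
       x = "Carat", y = "Price (US dollars)")

print(p)
```

The plot is built in layers: ggplot() sets the data and aesthetic mappings, and each + adds a layer or a label. Swapping geom_point() for another geom changes the plot type without touching the mappings.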