Lecture 16: Accessing databases

Outline:
  1. Large number of resources on the web: how do we integrate the information?

ACNUC: http://doua.prabi.fr/databases/acnuc
RDAVIDWebService:
http://www.bioconductor.org/packages/release/bioc/vignettes/RDAVIDWebService/inst/doc/RDavidWS-vignette.pdf
http://www.bioconductor.org/packages/release/bioc/html/RDAVIDWebService.html



Accessing external databases using seqinr

Example 1: Using seqinr -- find out what's in the package
library(seqinr)
lseqinr()

Many external databases have rules about how many queries you can make per second. The ACNUC database is a system that integrates
data from a number of external databases and allows you to download information in a common format.
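
A minimal sketch of respecting a per-second query limit when fetching several records in a loop. The helper throttled_lapply, the stand-in function fake_fetch, and the 0.1 second delay are all hypothetical and only for illustration; the real limit depends on the database being queried.

```r
# Hypothetical helper: apply `fetch_one` to each id, pausing `delay`
# seconds between consecutive calls so the server is not flooded.
throttled_lapply <- function(ids, fetch_one, delay = 0.5) {
  results <- vector("list", length(ids))
  for (i in seq_along(ids)) {
    if (i > 1) Sys.sleep(delay)   # wait before every call except the first
    results[[i]] <- fetch_one(ids[[i]])
  }
  names(results) <- ids
  results
}

# Stand-in for a real database call; it only upper-cases the identifier.
fake_fetch <- function(id) toupper(id)

res <- throttled_lapply(c("np_000537.3", "xp_006717989.1"), fake_fetch, delay = 0.1)
print(res)
```

In practice fake_fetch would be replaced by whatever function actually contacts the database.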

Example 2: Showing the genetic code
tablecode()

Note: The genetic code differs slightly between species and organelles. The following gives the vertebrate mitochondrial code:
tablecode(numcode=2)
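
To see what the choice of code changes in practice, the following sketch translates the same short sequence with the standard code (numcode = 1) and the vertebrate mitochondrial code (numcode = 2). The nine-base sequence is made up for illustration. TGA is a stop codon in the standard code but codes for tryptophan in vertebrate mitochondria, and ATA codes for isoleucine in the standard code but methionine in vertebrate mitochondria.

```r
library(seqinr)

# Made-up sequence of three codons: ATG, TGA, ATA
dna <- s2c("atgtgaata")

standard  <- translate(dna, numcode = 1)  # standard genetic code
vert_mito <- translate(dna, numcode = 2)  # vertebrate mitochondrial code

# c2s() collapses the vector of residues into a single string
print(c2s(standard))
print(c2s(vert_mito))
```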

Example 3: Getting properties of amino acids
data(aaindex)
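
A short sketch of how the aaindex list might be explored once it is loaded. Each entry is itself a list; this sketch assumes the D component holds the description and the I component holds the 20 amino acid values. The search term "hydropathy" is just one example of a property to look for.

```r
library(seqinr)
data(aaindex)

# Number of property tables in the data set
length(aaindex)

# Find entries whose description mentions hydropathy
hits <- which(sapply(aaindex, function(x) {
  any(grepl("hydropathy", x$D, ignore.case = TRUE))
}))

# Inspect the first matching entry: its description and its values
first <- aaindex[[hits[1]]]
print(first$D)
print(first$I)
```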

Example 4: Finding out what information is available through ACNUC
choosebank()    # with no arguments, lists the available banks
gbank <- choosebank(bank="genbank", infobank=TRUE)
ls()
Note: Use closebank() to close the connection before opening a different bank.

Example 5: After doing a multiple alignment of PTEN with bacteria on NCBI, I saved the multiple alignment in a file. The following
example shows how to read that file in and get the accession numbers out.
library(Biostrings)
library(stringr)
ptenMA <- readAAMultipleAlignment(filepath="PTENMultAlignBacteria.fa")
p <- rownames(ptenMA)
pg <- str_split(p, "\\|")
pAccession <- sapply(pg, "[[", 2)
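
The splitting step can be tried without the alignment file. The sketch below uses base R only and two made-up headers in a "database|accession|description" layout; real NCBI headers may have a different number of fields, so the index 2 is an assumption to check against the actual row names.

```r
# Hypothetical headers in a "db|accession|description" layout
headers <- c("ref|NP_000537.3|example protein A",
             "ref|XP_006717989.1|example protein B")

# Split each header on the literal "|" character
fields <- strsplit(headers, "|", fixed = TRUE)

# Take the second field of each header as the accession
accessions <- vapply(fields, function(x) x[2], character(1))

# Optionally drop the version suffix (the digits after the final dot)
unversioned <- sub("\\.[0-9]+$", "", accessions)

print(accessions)
print(unversioned)
```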

Example 6: Create a virtual query for Hominidae, then project it to a list of species (PS). Finally, retrieve all the species
below it in the taxonomy (SD).
hominidae <- query("hominidae", "SP=Hominidae", virtual = TRUE)
hsp <- query("hsp", "PS hominidae", virtual = TRUE)
hsp$nelem

SDexample <- query("SDexample", "SD hsp")
getName(SDexample)


Example 7: Queries are case insensitive, and individual clauses can be enclosed in double quotes

dengue <- query("dengue", "\"sp=@virus@\" AND \"sp=@dengue@\" AND NOT \"k=partial\"", virtual=TRUE)


Example 8: Get human nuclear coding sequences, excluding partial ones.

hsCDS <- query("hsCDS", "sp=Homo sapiens AND t=cds AND o=nuclear AND NOT k=partial",
               virtual = TRUE)

Query language:
  • Case insensitive
  • Wildcard: @
  • Logical operators: AND, OR, NOT
  • The virtual parameter allows queries to be created without being fetched.
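
Because a query is just a character string, it can be assembled programmatically. The sketch below is one possible way to do that; build_query is a hypothetical helper, not part of seqinr, and it only covers AND and AND NOT clauses.

```r
# Hypothetical helper: quote each clause, join the included clauses with AND,
# then append each excluded clause with AND NOT.
build_query <- function(include, exclude = character(0)) {
  q <- paste(paste0("\"", include, "\""), collapse = " AND ")
  if (length(exclude) > 0) {
    neg <- paste0("AND NOT \"", exclude, "\"")
    q <- paste(c(q, neg), collapse = " ")
  }
  q
}

# @ is the wildcard, so this matches species names containing "virus" and "dengue"
q <- build_query(c("sp=@virus@", "sp=@dengue@"), exclude = "k=partial")
cat(q, "\n")
```

The resulting string could then be passed as the second argument to query() once a bank has been opened with choosebank().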
RefSeq README: ftp://ftp.ncbi.nih.gov/refseq/README

RefSeq accession numbers: TP53 is NP_000537.3 and PTEN is XP_006717989.1