NCBI usage notes

Databases for genetic sequences (NCBI):
  • GenBank:
    • contains most known nucleotide and protein sequences
    • is archival
    • may hold hundreds of records for the same gene
  • RefSeq: the best reference sequence for a non-mutated transcript or protein product
  • EST (expressed sequence tag):
    • single-pass cDNA sequences
    • usually 300 to 800 bp
    • held in GenBank's dbEST database
  • STS (sequence-tagged site):
    • short (200 to 500 bp) DNA sequence
    • occurs exactly once in the genome
    • used as a marker for assembly and for detecting deletions
    • held in the dbSTS database
    • Homo sapiens has 324K STSs
  • UniGene (unique gene):
    • clusters ESTs into non-redundant sets, ideally one per gene
    •  currentl 
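  • Accession number: unique identifier for a particular sequence record, usually one or two letters followed by digits, e.g. KF572430.1 (GI:557786680); the ".1" suffix is the version. RefSeq accessions start with a two-letter prefix and an underscore:
    • NC_xxxxxx   complete genomic molecule (chromosome or complete genome)
    • NM_xxxxxx   mRNA
    • NG_xxxxxx   genomic reference region
    • NW_xxxxxx   genomic contig or scaffold (mostly whole-genome shotgun)
    • NP_xxxxxx   protein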
  • GI number: GenInfo Identifier, a numeric ID for a sequence record
  • Gene ID: identifier for a gene record in the Gene database
  • DNA
  • RNA
  • cDNA: complementary DNA, synthesized from an RNA template (so it is complementary to that RNA strand)
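
The accession conventions listed above (an optional RefSeq prefix, a base identifier, and a ".N" version suffix) can be sketched in a few lines of Python. This is only an illustration for these notes: the function name parse_accession and the REFSEQ_PREFIXES table are made up here, the table covers only the five prefixes mentioned above, and NP_123456.2 in the usage lines is an invented accession, not a real record.

```python
# Broad molecule category for the RefSeq prefixes listed in these notes.
# Hypothetical helper table; real RefSeq has more prefixes than this.
REFSEQ_PREFIXES = {
    "NC": "genomic DNA",
    "NG": "genomic DNA",
    "NW": "genomic DNA",
    "NM": "mRNA",
    "NP": "protein",
}


def parse_accession(accession):
    """Split an accession into (base, version, molecule category).

    version is None when there is no ".N" suffix; the category is None
    when the accession has no prefix found in REFSEQ_PREFIXES.
    """
    base, dot, version = accession.strip().partition(".")
    version_number = int(version) if dot and version.isdigit() else None

    # RefSeq accessions look like "NM_000546": prefix, underscore, digits.
    prefix, underscore, _ = base.partition("_")
    category = REFSEQ_PREFIXES.get(prefix) if underscore else None
    return base, version_number, category


if __name__ == "__main__":
    for acc in ("KF572430.1", "NM_000546", "NP_123456.2"):
        print(acc, "->", parse_accession(acc))
```

A plain GenBank accession such as KF572430.1 has no underscore prefix, so only its base and version are reported; the category stays None.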
Amino acids: