Sequence database

Search

Sequence databases can be searched by a variety of methods. Probably the most common use is to search for sequences similar to a query protein or gene whose sequence the user already knows. BLAST is a popular program of this type.
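As a rough illustration of what a similarity search does, the following sketch ranks the records of a small in-memory database by the fraction of short subsequences (k-mers) they share with a query. It is a deliberately simplified stand-in and not the BLAST algorithm, which uses seeded local alignment with statistical scoring. The function names, the toy records, and the choice of k = 3 are all assumptions made for the example.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query, database, k=3):
    """Rank database records by k-mer similarity to the query, best first."""
    q = kmers(query, k)
    scored = [(jaccard(q, kmers(seq, k)), name) for name, seq in database.items()]
    return sorted(scored, reverse=True)


# Hypothetical toy database: record names and sequences are made up.
toy_db = {
    "rec_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "rec_B": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "rec_C": "GSSGSSGPLVVVGAGGVGKSALTIQ",
}

hits = search("MKTAYIAKQRQISFVKSHFSRQ", toy_db)
for score, name in hits:
    print(f"{name}\t{score:.2f}")
```

Here the identical record ranks first, the longer record containing the query ranks second, and the unrelated record last. A production search tool would also report alignments and a measure of statistical significance, which this sketch omits.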

Current issues

Records in sequence databases are deposited from a wide range of sources, from individual researchers to large genome sequencing centers. As a result, the sequences themselves, and especially the biological annotations attached to these sequences, may vary in quality. There is much redundancy, as multiple labs may submit numerous sequences that are identical, or nearly identical, to others in the databases.^[2]
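The simplest form of redundancy, exactly identical sequences submitted under different names, can be illustrated with a short sketch. The version below keeps the first record seen for each distinct sequence, comparing sequences case-insensitively through a hash digest. The record names and sequences are hypothetical, and nearly identical sequences are not handled here: those would need clustering at a chosen identity threshold.

```python
import hashlib


def deduplicate(records):
    """Keep the first (name, sequence) pair for each distinct sequence.

    Sequences are compared case-insensitively and must match exactly.
    """
    seen = {}
    for name, seq in records:
        digest = hashlib.sha256(seq.upper().encode("ascii")).hexdigest()
        # setdefault stores the pair only if this digest is new.
        seen.setdefault(digest, (name, seq))
    return list(seen.values())


# Hypothetical submissions from three labs; two are the same sequence.
records = [
    ("lab1_x", "ATGGCCATTG"),
    ("lab2_y", "atggccattg"),
    ("lab3_z", "ATGGCCATTA"),
]

unique = deduplicate(records)
print([name for name, _ in unique])
```

Storing a digest instead of the full sequence keeps memory use bounded per record, which matters more for large collections than for this toy input.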

Many sequence annotations are based not on laboratory experiments but on the results of similarity searches against previously annotated sequences. Once a sequence has been annotated by similarity to others and deposited in the database, it can itself become the basis for future annotations. This leads to a transitive annotation problem: several such transfers by sequence similarity may lie between a particular database record and the actual wet-lab experimental information, and an error introduced at any step is passed on to every later one.^[3] Care must therefore be taken when interpreting annotation data from sequence databases.
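The length of such a chain can be made concrete with a small sketch. It assumes a hypothetical provenance table in which each record is marked either as backed by an experiment or as having copied its annotation from another record; real databases do not necessarily record this information in such a form. The function counts how many similarity-based transfers separate a record from experimental evidence.

```python
def transfer_depth(record_id, source_of):
    """Count annotation transfers between a record and experimental evidence.

    source_of maps a record ID either to the string "experiment" or to the
    ID of the record its annotation was copied from. Returns None when the
    chain never reaches an experiment (unknown source or a cycle).
    """
    depth = 0
    visited = set()
    current = record_id
    while current not in visited:
        visited.add(current)
        source = source_of.get(current)
        if source is None:
            return None  # provenance unknown
        if source == "experiment":
            return depth
        depth += 1
        current = source
    return None  # circular chain of transfers


# Hypothetical provenance table; all IDs are made up.
provenance = {
    "P1": "experiment",
    "P2": "P1",
    "P3": "P2",
    "P4": "P3",
    "Q1": "Q2",
    "Q2": "Q1",
}

for rid in ("P1", "P4", "Q1"):
    print(rid, transfer_depth(rid, provenance))
```

A record such as P4 is three transfers away from the experiment on P1, so its annotation is only as reliable as the weakest of those three similarity judgements. The Q1 and Q2 records annotate each other and have no experimental support at all.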

References

1. Cochrane, G.; Karsch-Mizrachi, I.; Nakamura, Y. (23 November 2010). "The International Nucleotide Sequence Database Collaboration". Nucleic Acids Research. 39 (Database): D15–D18. doi:10.1093/nar/gkq1150.
2. Sikic, K.; Carugo, O. (2010). "Protein sequence redundancy reduction: comparison of various method". Bioinformation. 5 (6): 234–9. doi:10.6026/97320630005234. PMID 21364823.
3. Iliopoulos, I.; Tsoka, S.; Andrade, MA.; Enright, AJ.; Carroll, M.; Poullet, P.; Promponas, V.; Liakopoulos, T.; et al. (Apr 2003). "Evaluation of annotation strategies using an entire genome sequence". Bioinformatics. 19 (6): 717–26. doi:10.1093/bioinformatics/btg077. PMID 12691983.

External links

European Bioinformatics Institute databases
NCBI completely sequenced genomes
Stanford Saccharomyces Genome Database
Protein, the NIH protein database, a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB

This article is issued from Wikipedia - version of the 7/11/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.
