Thursday, May 14, 2009

Distributed Annotation System (DAS)

DAS is a web-based protocol for exchanging genomic annotation data, introduced by Lincoln D. Stein, Sean Eddy, Robin Dowell and colleagues in a 2001 paper in BMC Bioinformatics. The spec defines the URL requests clients use to query servers and the XML documents served up in response. The data model implied by DAS XML is partly an XML-ified version of GFF (another Lincoln Stein project), modified to better fit the hierarchical structure of XML.

Simplified, there are two kinds of things: sequences, which have a start, a stop and a description, and annotations (aka features). The DAS data type for sequences is designed to help deal with ongoing revisions to genome assemblies. The reference sequence is a hierarchical structure of genome fragments called segments: smaller segments are assembled into larger units such as contigs or whole chromosomes. This makes the annotation data more resilient to a revised assembly of the genome, but leaves clients with some of the work of mapping annotations onto a common coordinate basis.

Annotations have types, methods, and categories. Types correspond to feature type tags from EMBL and GenBank, for example "exon", "CDS" (coding sequence), or "tRNA". A method is the laboratory procedure or computational method used to discover the feature. Categories seem vaguely defined to me; the paper lists "homology", "variation" and "transcribed" as examples. Annotations can be filtered by type and/or category.

On the off chance that you're writing a genome browser, this might all come in handy.
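To make the request/response cycle a bit more concrete, here is a minimal Python sketch. It assumes the DAS 1.x `features` command and the DASGFF element names as I read them in the DAS 1.5 spec (SEGMENT, FEATURE, TYPE with a `category` attribute, METHOD, START, END); check those against the actual DTD before relying on them. The server URL, the data source name `hypothetical_dsn`, the sample response, and the helper names `features_url` and `parse_features` are all made up for illustration. The sketch builds a query restricted by type and category, then pulls each feature's type, category, method and coordinates out of the XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical sample response, shaped like a DAS 1.5 DASGFF document.
SAMPLE = """<?xml version="1.0"?>
<DASGFF>
  <GFF version="1.0" href="http://example.org/das/hypothetical_dsn/features">
    <SEGMENT id="chr1" start="1" stop="1000" version="1.0">
      <FEATURE id="exon.1">
        <TYPE id="exon" category="transcribed">exon</TYPE>
        <METHOD id="genewise">genewise</METHOD>
        <START>100</START>
        <END>250</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
      <FEATURE id="snp.7">
        <TYPE id="SNP" category="variation">SNP</TYPE>
        <METHOD id="resequencing">resequencing</METHOD>
        <START>600</START>
        <END>600</END>
        <ORIENTATION>0</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


@dataclass
class Feature:
    id: str
    type: str       # e.g. "exon", "CDS", "tRNA"
    category: str   # e.g. "transcribed", "variation", "homology"
    method: str     # how the feature was found
    start: int
    end: int


def features_url(server, dsn, ref, start, stop, types=(), categories=()):
    """Build a DAS 'features' query for ref:start,stop, optionally
    restricted by feature type and/or category (';'-separated params)."""
    params = [f"segment={quote(ref)}:{start},{stop}"]
    params += [f"type={quote(t)}" for t in types]
    params += [f"category={quote(c)}" for c in categories]
    return f"{server}/das/{dsn}/features?" + ";".join(params)


def parse_features(xml_text):
    """Flatten every FEATURE under every SEGMENT into Feature records."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feat in segment.findall("FEATURE"):
            type_el = feat.find("TYPE")
            method_el = feat.find("METHOD")
            features.append(Feature(
                id=feat.get("id"),
                type=type_el.get("id"),
                category=type_el.get("category", ""),
                method=method_el.get("id") if method_el is not None else "",
                start=int(feat.findtext("START")),
                end=int(feat.findtext("END")),
            ))
    return features


if __name__ == "__main__":
    print(features_url("http://example.org", "hypothetical_dsn", "chr1", 1, 1000,
                       types=["exon"], categories=["transcribed"]))
    for f in parse_features(SAMPLE):
        print(f.id, f.type, f.category, f.method, f.start, f.end)
```

Here the type and category filters are applied by the server via the URL. A client could just as well fetch everything for a segment and filter the parsed list itself, e.g. `[f for f in feats if f.category == "variation"]`.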