ProServer Guide

This document serves as a reference and developers' guide to implementing a custom DAS data source. It is provided in conjunction with a series of tutorials [1] [2] [3], which are intended to get you up and running more quickly.

Contents

  1. Server Features
    1. Multitasking
    2. Compression
    3. Flexible Deployment
    4. XSL Stylesheets
    5. DAS Extensions
  2. Installing ProServer
    1. Downloading
    2. Building
    3. Running
  3. Designing a DAS Source
    1. Co-ordinate Systems
    2. Services
    3. Intended Usage
    4. Data Storage
  4. Implementing a DAS Source
    1. Code Structure
    2. INI Format
    3. Transports
    4. Hydras
    5. Command methods
    6. Other methods
    7. Stylesheets
    8. XSL Stylesheets
    9. Homepages
    10. Metadata
    11. Registration
    12. Examples
  5. Updating from previous versions
  6. Further Information

Server Features

One of the strengths of the Distributed Annotation System (DAS) is its 'dumb server, clever client' architecture, which in theory allows even research groups with limited informatics resources to provide distributed access to their data. ProServer attempts to realise this strength by providing a framework for hosting data via DAS that is:

However, ProServer also offers some features beyond the core DAS specification that allow it to be used in a complex high performance environment.

Multitasking

ProServer is a standalone forking HTTP server based upon the Perl Object Environment (POE), a framework for event-driven multitasking applications in Perl. Using this framework, the server distributes concurrent requests between several child instances. The maximum number of processes ProServer uses (and therefore the maximum number of simultaneous client requests) is configurable, allowing a balance to be struck between resource usage and performance.

Compression

Where supported by clients, ProServer will reduce the size of lengthy responses using GNU Zip (Gzip). Clients wishing to take advantage of this support should set the standard 'Accept-Encoding: gzip' HTTP header (most web browsers do this).
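
The effect of Gzip on a lengthy response can be sketched with Perl's core compression modules. This is an illustration only, not ProServer's own code: the XML below is a made-up stand-in for a DAS response, and the round trip simply mimics what a server and a gzip-aware client would each do.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# A made-up, repetitive features document standing in for a lengthy DAS response.
my $xml = '<DASGFF><GFF>'
        . join(q(), map { qq(<FEATURE id="f$_"><START>$_</START><END>$_</END></FEATURE>) } 1 .. 500)
        . '</GFF></DASGFF>';

# Compress in memory, as a server might before sending a gzip-encoded body.
my $compressed;
gzip(\$xml => \$compressed) or die "gzip failed: $GzipError";

# Decompress, as a client that sent 'Accept-Encoding: gzip' would.
my $restored;
gunzip(\$compressed => \$restored) or die "gunzip failed: $GunzipError";

printf "original: %d bytes, gzipped: %d bytes\n", length($xml), length($compressed);
```

Repetitive XML of this kind compresses well, which is why the saving is most noticeable for large features responses.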

Flexible Deployment

Being a standalone program, ProServer can be quickly deployed on any machine, and does not rely on an HTTP server such as Apache. It is also flexible enough to be integrated into existing webserver architectures. Both the Sanger Institute and European Bioinformatics Institute use ProServer in a load balancing, reverse proxied, clustered configuration.

XSL Stylesheets

The Extensible Stylesheet Language (XSL) defines how XML documents may be transformed into other formats. ProServer provides XSL Transformation (XSLT) stylesheets which clients such as web browsers use to present XML data in formats more amenable to human consumption (rather than for computer consumption, for which XML was principally designed).

You may consider modifying or adding to these default stylesheets if you wish to:

Currently, ProServer supplies XSLT stylesheets for the features, sources and dsn commands. Stylesheets for other commands are under development.

DAS Extensions

ProServer is a server implementation of the DAS/1.53E specification, an extended version of the current DAS 1.53 specification. The 1.53 specification is published on the BioDAS website, and the 1.53E extensions are published at the DAS Registry.

In brief, the 1.53E extensions comprise:


Installing ProServer

The ProServer code is hosted at SourceForge, and periodic releases are also available for download from CPAN.

Downloading

Check out the code from SourceForge's Subversion server:

svn checkout https://proserver.svn.sourceforge.net/svnroot/proserver/trunk Bio-Das-ProServer

ProServer is also distributed as a package via CPAN. It is available for download at the website or using the CPAN command-line utility.

Building

ProServer is designed to be automatically built using the Module::Build package:

cd Bio-Das-ProServer
perl Build.PL
./Build
./Build test

You may receive warnings about missing dependencies. Only some of these are absolutely required for the core server - see the README file for details of these.

You may optionally install the ProServer modules into your Perl distribution:

./Build install

ProServer has a small number of standard dependencies (many of which you should already have). Any missing dependencies will be reported, in which case you should install them from CPAN. See the included README file for a list of required modules.

If any of your DAS sources will connect to a database, you will also need the DBI module and relevant driver (e.g. DBD::mysql or DBD::Oracle).

Running

ProServer is distributed with an example command-line executable in the eg directory:

eg/proserver --help

There is a default proserver.ini file containing details of the configuration options the server understands. Also see the INI Format section for details of configuring DAS sources.

An example CGI script is also provided in the eg directory.


Designing a DAS Source

The first step in exposing your data using DAS is determining how your data is to be offered. Consider some of these points when designing your DAS source to maximise its accessibility and usefulness.

Co-ordinate Systems

Is your data based on genomic co-ordinates, Ensembl Gene IDs, or perhaps proprietary identifiers? Are your reference sequences from the latest assembly?

If possible, it is best to expose your data on the most recent version of an assembly/database. However, if you have conducted some form of sequence analysis on an old or modified version of a sequence, you may need to define your own co-ordinate system.

Services

You probably want to expose features of some form, but could you also define a stylesheet to govern the display in clients such as Ensembl or SPICE? If your reference co-ordinate system is unique or not widely used you should provide a reference source offering sequences and entry points, but could you offer mapping alignments with segments from another co-ordinate system?

It is also very useful for your DAS source to be capable of informing clients of the valid entry points your source can annotate, or the types of features. This is especially true for DAS sources with numerous features or many types of features.

Intended Usage

Is your data purely display-driven, and if so which clients will it be compatible with? Ensembl, SPICE, Dasty and Pfam are all graphical DAS clients you may consider testing your source with. Is your data amenable to being used programmatically? Consider fleshing out your features with terms from the 1.53E ontology created for the BioSapiens project.

Data Storage

For small numbers of features, a flat file may be a sufficiently fast storage medium for your data, but often an indexed relational database is necessary. If you have an existing database, does it contain details of your reference sequences such as versions or checksums? ProServer can take a lot of the work out of making your data as useful as possible if you store some of this information:


Implementing a DAS Source

This section is a developers' guide to implementing a DAS source, intended as a companion to ProServer's POD documentation. It is assumed that you are somewhat familiar with the concept and basic architecture of DAS, which you can read about on the BioDAS website. ProServer also supports the 1.53E extensions as described at the DAS Registry. Writing a custom DAS source requires intermediate object-oriented Perl programming ability.

ProServer is designed to be a lightweight DAS server that is simple to set up and extend. Each server can host one or more DAS sources, with each source (or DSN) being represented by a single SourceAdaptor Perl object and INI configuration. Implementing a DAS source in ProServer therefore entails providing a subclass of the Bio::Das::ProServer::SourceAdaptor package and an INI section to configure it.

Code Structure

A ProServer installation essentially has three components: the core server, an INI configuration file and one or more SourceAdaptor instances.

Whilst the core server handles client communications, command processing and building XML responses, it is the job of a SourceAdaptor to translate any specifics of the data into a simple unified data structure. ProServer is distributed with several example SourceAdaptors, one of which you may be able to use with your data. If not, it is a simple matter to create your own.

The INI file is used to configure both the server as a whole (e.g. the port number to listen on) and each DAS source (e.g. the database to connect to). Here, each DAS source is an instance of a SourceAdaptor module. It is therefore possible to have more than one DAS source using the same SourceAdaptor code.

INI Format

ProServer takes its configuration from a standard INI file, specified at startup. The file is divided into sections: one 'general' section for server-specific options, and one section per DAS source. The various server-specific options are described in the example proserver.ini file. The server processes each other section as follows:

Property        Example                Function
section         [simple_human]         Required; defines the DAS source name (DSN)
adaptor         adaptor = simpledb     Required; the SourceAdaptor subclass that will represent the source.
state           state = on             Unless set to 'on', the source is not enabled.
transport       transport = dbi        The Transport subclass that will be built for the source.
autodisconnect  autodisconnect = 1800  Specifies that the Transport should clean up after itself following a command. Can be 'yes', or a specified number of seconds.
hydra           hydra = dbi            Specifies a 'multi-headed' Hydra source. A single definition can generate multiple sources.
parent          parent = simple_mouse  Specifies that a source should inherit properties from another source. Only undefined properties are inherited. Chained and reciprocal inheritance is permitted.

You may also specify additional custom properties: these are passed into the SourceAdaptor and Transport object stack.
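
The 'parent' rule described above (only undefined properties are inherited, and chains or reciprocal references are allowed) can be sketched as follows. This is an illustration of the rule, not ProServer's implementation: the resolve_source function, the in-memory hash and the 'human_features' database name are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical in-memory view of two INI sections (section names from the table above).
my %config = (
  simple_mouse => { adaptor => 'simpledb', transport => 'dbi', state => 'on', autodisconnect => 1800 },
  simple_human => { parent => 'simple_mouse', dbname => 'human_features' },
);

# Return a copy of a source's properties with undefined ones filled in from
# its chain of parents. %seen stops the walk if parents refer to each other.
sub resolve_source {
  my ($config, $name) = @_;
  my %resolved = %{ $config->{$name} || {} };
  my %seen     = ($name => 1);
  my $parent   = $resolved{parent};

  while (defined $parent && !$seen{$parent}++) {
    my $inherited = $config->{$parent} || {};
    for my $key (keys %{$inherited}) {
      next if $key eq 'parent';
      $resolved{$key} = $inherited->{$key} if !defined $resolved{$key};
    }
    $parent = $inherited->{parent};
  }
  return \%resolved;
}

my $human = resolve_source(\%config, 'simple_human');
print "$_ = $human->{$_}\n" for sort keys %{$human};
```

Here 'simple_human' keeps its own 'dbname' and picks up 'adaptor', 'transport', 'state' and 'autodisconnect' from 'simple_mouse'.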

Transports

Each DAS source may be configured with zero or more transports. A transport is designed to handle data access implementation, reducing the need to write boilerplate code. ProServer is supplied with several Bio::Das::ProServer::SourceAdaptor::Transport implementations, allowing easy access to data sources including, for example, relational databases, flat files and the Ensembl API.

Transports are passed the same INI properties as SourceAdaptors, allowing them to be configured in the same way. For example, the DBI transport requires the 'dbname' parameter. See individual transports' POD documentation for details. Below is an example that uses the DBI transport to handle the tedious aspects of querying a relational database.

# Generic features stored in an SQL table
my $features = $self->transport->query('select * from features where segment = ? and end >= ? and start <= ?',
                                       $segment, $start, $end);

Although most sources have only a single transport, it is possible to configure multiple transports for a single source. This can be done by specifying overriding properties for named transports. This is best illustrated with an example:

[foobar]
state         = on
adaptor       = doubledb
transport     = dbi
dbuser        = anonymous
dbname        = foodb
bar.dbname    = bardb

my $foos = $self->transport()->query($sql, @args);      # connects to 'foodb'
my $bars = $self->transport('bar')->query($sql, @args); # connects to 'bardb'
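
One way to picture how the prefixed property is applied is the sketch below, which derives the effective properties for the default and the 'bar' transports from the [foobar] section. The transport_config function is hypothetical and only models the override rule; it is not how ProServer itself is written.

```perl
use strict;
use warnings;

# The [foobar] section above, held as a plain hash.
my %source = (
  state        => 'on',
  adaptor      => 'doubledb',
  transport    => 'dbi',
  dbuser       => 'anonymous',
  dbname       => 'foodb',
  'bar.dbname' => 'bardb',
);

# Build the effective properties for a transport. Shared (unprefixed) keys
# always apply; for a named transport, "name.key" entries override "key".
sub transport_config {
  my ($source, $name) = @_;
  my %effective = map  { ($_ => $source->{$_}) }
                  grep { !/[.]/ } keys %{$source};
  if (defined $name) {
    for my $key (keys %{$source}) {
      next if $key !~ /\A\Q$name\E[.](.+)\z/;
      $effective{$1} = $source->{$key};
    }
  }
  return \%effective;
}

my $foo = transport_config(\%source);
my $bar = transport_config(\%source, 'bar');
print "default: $foo->{dbname}, bar: $bar->{dbname} (user $bar->{dbuser})\n";
```

Both transports share 'dbuser', so only the properties that differ need to be repeated with a prefix.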

Hydras

A hydra source is a 'multi-headed' source with a single configuration. A Bio::Das::ProServer::SourceHydra can be used to automatically create several sources, each using the same SourceAdaptor implementation. For example, the 'dbi' SourceHydra generates a SourceAdaptor object for each database table matching a given prefix.
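
As a rough illustration of the prefix matching, the sketch below picks out the tables that would each become a source. The table names, the prefix and the hydra_source_names function are hypothetical, and it assumes that each matching table name is used as the source name; see the SourceHydra POD for the actual behaviour.

```perl
use strict;
use warnings;

# Hypothetical table names in a single database.
my @tables = qw(das_genes das_repeats das_snps audit_log);

# Return the tables whose names start with the prefix; under the assumption
# stated above, each one would be exposed as its own DAS source.
sub hydra_source_names {
  my ($prefix, @candidates) = @_;
  return grep { index($_, $prefix) == 0 } @candidates;
}

my @sources = hydra_source_names('das_', @tables);
print "source: $_\n" for @sources;
```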

Command methods

The Bio::Das::ProServer::SourceAdaptor base package contains much of the code needed to handle DAS requests and format an appropriate response, with several 'stub methods' left for you to implement. In particular, each DAS command is associated with a 'build' method that a SourceAdaptor subclass should override if it is to implement the command. Each of these methods is called with the arguments given to the command, and is expected to return a specific data structure. Details of the arguments and return types are given in the POD documentation for Bio::Das::ProServer::SourceAdaptor. Some commands also call other methods, which may optionally be overridden.

Implemented commands must also be specified in the 'capabilities' metadata in order to be activated.

Features

Method: build_features
Also calls: init_segments, known_segments, length, segment_version
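
A minimal sketch of a 'build_features' override is given below. To keep it runnable outside ProServer it uses two stand-in packages (Sketch::BaseAdaptor and Sketch::Transport) in place of the real base class and DBI transport. The argument keys ('segment', 'start', 'end') and the feature hash keys are assumptions modelled on the query shown in the Transports section; the SourceAdaptor POD is the authority on the exact structure.

```perl
use strict;
use warnings;

# Stand-in for Bio::Das::ProServer::SourceAdaptor so the sketch runs on its own.
# A real source would instead: use base qw(Bio::Das::ProServer::SourceAdaptor);
package Sketch::BaseAdaptor;
sub new       { my ($class, %args) = @_; return bless {%args}, $class; }
sub transport { my $self = shift; return $self->{transport}; }

# Stand-in transport returning canned rows in place of a database query.
package Sketch::Transport;
sub new   { return bless {}, shift; }
sub query {
  my ($self, $sql, $segment, $start, $end) = @_;
  my @rows = (
    { id => 'feat1', segment => 'X', start => 100, end => 200, type => 'exon' },
    { id => 'feat2', segment => 'X', start => 900, end => 950, type => 'exon' },
  );
  return [ grep { $_->{segment} eq $segment && $_->{end} >= $start && $_->{start} <= $end } @rows ];
}

package Sketch::Adaptor;
our @ISA = ('Sketch::BaseAdaptor');

# Translate rows into one hash per feature for the requested segment range.
sub build_features {
  my ($self, $opts) = @_;
  my $rows = $self->transport->query(
    'select * from features where segment = ? and end >= ? and start <= ?',
    $opts->{segment}, $opts->{start}, $opts->{end},
  );
  return map {
    +{ id => $_->{id}, type => $_->{type}, method => 'sketch',
       start => $_->{start}, end => $_->{end} }
  } @{$rows};
}

package main;
my $adaptor  = Sketch::Adaptor->new(transport => Sketch::Transport->new);
my @features = $adaptor->build_features({ segment => 'X', start => 1, end => 500 });
print "$_->{id}: $_->{start}-$_->{end}\n" for @features;
```

The adaptor itself contains no data access code; swapping the stand-in transport for a real one would leave 'build_features' unchanged.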

Types

Method: build_types
Also calls: known_segments, length, segment_version

Sequence; DNA

Method: sequence
Also calls: known_segments, length, segment_version
Notes: The 'segment_version' method is only called if no version is provided in the returned data structure.

Entry Points

Method: build_entry_points
Also calls: -
Notes: Has a default implementation that relies on the 'known_segments' and 'length' methods.

Alignment

Method: build_alignment
Also calls: known_segments

Structure

Method: build_structure
Also calls: known_segments

Volmap

Method: build_volmap
Also calls: known_segments

Interaction

Method: build_interaction
Also calls: -
Notes: Does not filter unknown segments (this command treats query segments differently).

Other methods

These methods are not tied to a single DAS command, but rather may be called in support of several. None are explicitly required for a functioning source, but all make the source more useful (e.g. by providing details of the sequence upon which annotations are based). Therefore it is best to implement as many as possible.

Method: known_segments
Purpose: Implement this method to provide a list of identifiers known to the DAS source, used by ProServer to filter requests for unknown or incorrect segments.
Default: -
Note: By default 'build_entry_points' calls this method.

Method: length
Purpose: Implement this method to provide the length of a segment as it is known to the source. This is used by ProServer to filter requests for invalid ranges.
Default: 0
Note: By default 'build_entry_points' calls this method.

Method: segment_version
Purpose: Implement this to provide a version or checksum of a segment as known by the source.
Default: 1.0
Note: -

Method: init_segments
Purpose: Purely a convenience, called before build_features to allow the source to prepare the data for a list of segments if this is more efficient.
Default: -
Note: -
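
The sketch below shows what implementations of three of these methods might look like when the reference data is held in memory. The package name, the segment lengths and the version strings are made up for illustration, and the method signatures (a segment ID passed to 'length' and 'segment_version') are assumptions; check the SourceAdaptor POD before relying on them.

```perl
use strict;
use warnings;

package Sketch::SegmentAdaptor;
# A real source would: use base qw(Bio::Das::ProServer::SourceAdaptor);

# Made-up reference data: segment ID => length and version.
my %SEGMENTS = (
  X => { length => 1_000_000, version => 'v2' },
  Y => { length => 500_000,   version => 'v2' },
);

sub new { return bless {}, shift; }

# Identifiers this source can annotate.
sub known_segments { return sort keys %SEGMENTS; }

# Length of a segment; 0 (the documented default) when it is unknown.
sub length {
  my ($self, $segment) = @_;
  return exists $SEGMENTS{$segment} ? $SEGMENTS{$segment}{length} : 0;
}

# Version or checksum of a segment; '1.0' (the documented default) when unknown.
sub segment_version {
  my ($self, $segment) = @_;
  return exists $SEGMENTS{$segment} ? $SEGMENTS{$segment}{version} : '1.0';
}

package main;
my $segments = Sketch::SegmentAdaptor->new;
printf "%s: length %d, version %s\n", $_, $segments->length($_), $segments->segment_version($_)
  for $segments->known_segments;
```

In a database-backed source the same three methods would typically be answered by queries through the transport rather than from a hash.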

Stylesheets

The stylesheet command does not need to be configured in code. Instead, it is resolved using:

  1. A 'stylesheet' INI property. The value should be the whole stylesheet XML (inline).
  2. A 'stylesheetfile' INI property. The value should be the location of the XML file.
  3. The default stylesheet, which draws features as a black box. May be changed by overriding the 'das_stylesheet' method.

XSL Stylesheets

The same technique for defining the stylesheet command also applies to XSL stylesheets. XSL stylesheets are used by web browsers to transform the XML responses of DAS commands into a human-readable format.

Here, the relevant INI properties are 'features_xsl' or 'features_xslfile' etc. Not specifying either results in the default ProServer XSL being used.

Homepages

ProServer provides a default 'homepage' for each DAS source, which gives some simple information about the source. However, it is possible to provide an HTML page to display instead, in the same manner as for stylesheets.

Metadata

Each DAS source should provide information about itself that helps clients to determine what kind of data it offers. In true TMTOWTDI Perl spirit, ProServer provides several ways to provide the metadata, either in code or via INI properties. In order of precedence:

  1. Overriding the relevant method. See the Bio::Das::ProServer::SourceAdaptor POD documentation for details.
  2. Setting a variable in the object stack (using the 'init' method).
  3. Specifying a config property (no code change required).
  4. Nothing: the default value (if any) is used.

Below is a list of metadata properties you should provide for your source. Note that the 'capabilities' property is required.

Property           Type     Purpose                                                                   Default
capabilities       hashref  Commands and options offered (sources command)                            -
coordinates        hashref  Co-ordinate systems and test ranges (sources command)                     -
properties         hashref  Custom tags (sources command)                                             -
title              text     Human readable name (sources/dsn command)                                 The source name (DSN)
description        text     Human readable description (sources/dsn command)                          The title
doc_href           URL      Location of documentation/homepage (sources command)                      A default ProServer homepage.
source_uri         text     Used to group sources (sources command)                                   The version URI
version_uri        text     Uniquely identifies a source (sources command)                            The source name (DSN)
maintainer         email    Identifies a point of contact (sources command)                           The server maintainer
dsncreated         date     Source date (sources command, HTTP headers)                               The 'last modified' date of the Hydra or Transport (if supported) or epoch
dsnversion         number   Source version (dsn command)                                              1.0
strict_boundaries  boolean  If set, out-of-range segments will be filtered. Relies on length method.  The server setting
mapmaster          URL      Reference source (dsn command)                                            -
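
As a sketch of the second option in the precedence list (setting variables in the object stack from 'init'), the example below declares the required 'capabilities' property alongside a title and maintainer. The package name, the stand-in constructor, the capability version strings and the e-mail address are all assumptions made for illustration.

```perl
use strict;
use warnings;

package Sketch::MetadataAdaptor;
# A real source would: use base qw(Bio::Das::ProServer::SourceAdaptor);
# The constructor below only mimics a base class that calls init().
sub new {
  my ($class) = @_;
  my $self = bless {}, $class;
  $self->init;
  return $self;
}

# Declare metadata in the object itself rather than in the INI file.
sub init {
  my $self = shift;
  $self->{'capabilities'} = { 'features' => '1.0', 'stylesheet' => '1.0' };
  $self->{'title'}        = 'Example features';
  $self->{'maintainer'}   = 'das-admin@example.org';
  return;
}

package main;
my $source = Sketch::MetadataAdaptor->new;
print join(', ', sort keys %{ $source->{'capabilities'} }), "\n";
```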

Co-ordinates can be specified in the INI file using the format:

coordinates = NCBI_36,Chromosome,Homo sapiens -> X:10000000,10111111 ; Ensembl,Gene_ID,Homo sapiens -> ENSG00000000001

Or in code using:

sub init {
  my $self = shift;
  $self->{'coordinates'} = {
    'NCBI_36,Chromosome,Homo sapiens'                      => 'X:10000000,10111111',
    'ensembl,gene_ID,homo sapiens'                         => 'ENSG00000000001',
    'http://www.dasregistry.org/dasregistry/coordsys/CS_DS6' => 'BRAF_HUMAN'
  };
}

Here, the key is either the URI or description of the co-ordinate system (see the included registry coordinates XML file for details). It is case insensitive. The value is a segment range that can be used to test the source. See the DAS Registry documentation for more details of co-ordinate systems.
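
To make the relationship between the two forms concrete, the sketch below turns the INI value into the equivalent hash. The parse_coordinates function is hypothetical and is not ProServer's own parser; it lower-cases the keys purely to reflect the case insensitivity described above.

```perl
use strict;
use warnings;

# The INI value shown above.
my $ini_value = 'NCBI_36,Chromosome,Homo sapiens -> X:10000000,10111111 ; '
              . 'Ensembl,Gene_ID,Homo sapiens -> ENSG00000000001';

# Split "system -> test range" pairs separated by ';' into a hash.
sub parse_coordinates {
  my ($value) = @_;
  my %coordinates;
  for my $entry (split /;/, $value) {
    my ($system, $test_range) = split /->/, $entry, 2;
    next if !defined $test_range;
    s/\A\s+|\s+\z//g for $system, $test_range;   # trim surrounding whitespace
    $coordinates{ lc $system } = $test_range;
  }
  return \%coordinates;
}

my $coordinates = parse_coordinates($ini_value);
print "$_ => $coordinates->{$_}\n" for sort keys %{$coordinates};
```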

Registration

Many clients, such as Ensembl and SPICE, automatically connect to the DAS Registry to retrieve a list of DAS sources. If you register your source, it will reach a wider audience. The registry can also monitor your DAS source and inform you if it is not working correctly, and also provide an 'auto-activation' URL that will enable and configure your DAS source in Ensembl.

Because registered DAS sources are automatically available to several clients, it is preferable for registered DAS sources to be as 'well-formed' as possible. This includes providing accurate and up-to-date metadata for your source, as well as consistent and usable data. You may wish to consider whether your data fits into the 1.53E ontology developed for BioSapiens.

Examples

There are several SourceAdaptor implementations provided with ProServer that serve as useful examples. The 'simple' adaptors may be particularly useful as starting points.


Updating from previous versions

If you have already developed DAS sources, you may wish to update them to support the DAS/1.53E 'sources' command, which provides for more meaningful descriptions of the services that a DAS source offers. Updating your source is a simple matter of providing some metadata: see the Metadata section of the guide for details of how this is done. You will probably want to add the following:

As of version 2.7 ProServer makes use of external data files. You may need to set the "serverroot" property in order for the server to find them. See also the "styleshome" and "coordshome" properties in the example INI file.


Further Information

The following links provide useful background or further information about using DAS:

Questions, bug reports, feature requests et cetera should be directed to the mailing list.