Introduction to the Ensembl Web Code
The Ensembl codebase is highly complex, consisting of many hundreds of modules. The following notes should help you begin to find your way around!
Web code directories
The following directories contain web-related code:
- cbuild
- inline C code for handling data files
- conf
- site-wide configuration files
- ctrl-scripts
- Apache startup and stop scripts
- htdocs
- general HTML content (e.g. code documentation)
- modules
- the main mod_perl codebase used to generate the site
- perl
- Perl "CGI" scripts used for some legacy behaviour
- utils
- various scripts used to maintain an Ensembl website, e.g. updating content
The following directories are typically replicated inside plugins in order to override "core" functionality:
- conf
- htdocs
- modules
See plugins (below) for instructions on how to configure plugins in Ensembl.
Any other directories in your checkout will contain the Perl API, and after server startup you will see some additional autogenerated directories used to cache images and other files.
modules/EnsEMBL/Web
Most of the web code generated by the Ensembl web team lives in the EnsEMBL::Web namespace. You will not normally want to edit this code, but you can extend it by replicating the namespace in your own plugin and adding or overriding methods as required. See Extending the Ensembl web code for more details.
Throughout this documentation, EnsEMBL::Web is frequently abbreviated to E::W to save space and typing!
URL routing
Ensembl now uses URL routing, that is, URLs do not necessarily correspond to physical directories but are parsed into their components and passed to a generic script that constructs an appropriate page.
The exception to this is the static content, i.e. simple HTML pages used to hold documentation about the site and project - like this page. As a general rule, if the URL is in lower case, it is static content; if the "directories" have initial capitals, the page is dynamically generated.
A typical dynamic URL is shown below:
http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=BRCA2
The URL is split on '/' into the following parts:
- Species path
- Usually just a single "directory", e.g. 'Homo_sapiens', although on some Ensembl-powered sites such as Ensembl Bacteria, a multi-directory structure may be used to group closely related species or strains. Other possible values are 'Multi' (for pages that allow access to multiple species' data, e.g. BLAST) or undef (empty) if the page is not connected to any species (e.g. user account management).
- Type
- This is the type of data being displayed on the page, e.g. Location, Gene, etc. for genomic data, or Help, Account, etc. for general web pages.
- Action
- This denotes the particular view or sub-display of the type of data. In our example, the Action is "Summary", meaning this is the page summarising useful information about the gene.
- Function
- This is an optional fourth component of the URL. It is mainly used with interactive code such as user account management, e.g. /Account/Bookmark/Edit is the URL for the form where you edit the information stored in a user bookmark.
The parameters after the ? are handled as per normal CGI parameters; in this case, we have the name of the gene we want to display information about.
The URL is parsed in E::W::Apache::SpeciesHandler - this module should be left well alone unless you know exactly what you are doing!
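To make the split described above concrete, here is a minimal sketch in Perl. It is not the code in E::W::Apache::SpeciesHandler: the helper name parse_dynamic_path is hypothetical, it takes only the path portion of the URL, and it guesses the species from the shape of the first segment (an assumption made purely for illustration; it does not handle multi-directory species paths).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl codebase): split the path of a
# dynamic URL into species, type, action and function.
sub parse_dynamic_path {
  my ($path) = @_;

  # Drop the query string, then split on '/' and discard empty segments.
  $path =~ s/\?.*\z//s;
  my @segments = grep { length } split m{/}, $path;

  # Assumption for this sketch only: the first segment is a species if it is
  # 'Multi' or looks like 'Genus_species'. Otherwise there is no species.
  my $species;
  if (@segments && $segments[0] =~ /\A(?:Multi|[A-Z][a-z]+_[a-z_]+)\z/) {
    $species = shift @segments;
  }

  my ($type, $action, $function) = @segments;

  return {
    species  => $species,
    type     => $type,
    action   => $action,
    function => $function,   # optional fourth component, often undef
  };
}

my $parts = parse_dynamic_path('/Homo_sapiens/Gene/Summary?g=BRCA2');
printf "%s / %s / %s\n", $parts->{species}, $parts->{type}, $parts->{action};
```

Under the same assumptions, a species-less path such as /Account/Bookmark/Edit yields an undefined species, with Account, Bookmark and Edit as type, action and function.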
Allowed scripts
In order to determine what type of response a URL requires (full pages, HTML fragments etc.), the Type part of the URL is assigned a script, as follows:
- Page
- Normal web pages
- Modal
- Popup "control panel" (data export, account management)
- Config
- A variation on the modal page, used to create the image configuration control panels
- Component
- Asynchronously generated page elements (the ones that replace the animated spinner)
- ZMenu
- Small popup menus used for contextual navigation
This script definition takes place in the $OBJECT_TO_CONTROLLER_MAP hash in conf/SiteDefs.pm (which can be extended in your plugin if you want to add data Types to Ensembl):
## ALLOWABLE DATA OBJECTS
$OBJECT_TO_CONTROLLER_MAP = {
  Gene                => 'Page',
  Transcript          => 'Page',
  Location            => 'Page',
  Variation           => 'Page',
  StructuralVariation => 'Page',
  Regulation          => 'Page',
  Marker              => 'Page',
  GeneTree            => 'Page',
  Family              => 'Page',
  LRG                 => 'Page',
  Phenotype           => 'Page',
  Experiment          => 'Page',
  Info                => 'Page',
  Search              => 'Page',
  UserConfig          => 'Modal',
  UserData            => 'Modal',
  Help                => 'Modal',
};
The value is then used in E::W::Apache::SpeciesHandler to decide which child of E::W::Controller will be used to process the request.
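The lookup can be pictured with the short sketch below. It is illustrative only: the map is a trimmed copy, the MyType entry and the controller_for helper are hypothetical, and the assumption that the value maps directly to a class named EnsEMBL::Web::Controller::<value> is ours rather than something stated here.

```perl
use strict;
use warnings;

# Trimmed copy of the map, for illustration only.
my $OBJECT_TO_CONTROLLER_MAP = {
  Gene     => 'Page',
  Location => 'Page',
  UserData => 'Modal',
  Help     => 'Modal',
};

# Hypothetical: a plugin adding its own data Type to the map.
$OBJECT_TO_CONTROLLER_MAP->{MyType} = 'Page';

# Hypothetical helper: resolve a URL Type to a controller class name.
# Assumes controllers are named EnsEMBL::Web::Controller::<value>.
sub controller_for {
  my ($type) = @_;
  my $script = $OBJECT_TO_CONTROLLER_MAP->{$type};
  return unless defined $script;   # unknown Type: no controller
  return "EnsEMBL::Web::Controller::$script";
}

print controller_for('Gene'), "\n";
print controller_for('Help'), "\n";
```

A Type that is absent from the map resolves to nothing in this sketch, which is why a new data Type has to be registered before its URLs can be routed.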
Plugins
The Ensembl webcode is designed to be extensible, so that you can customize your own installation without your changes being overwritten when you update to a new release.
By creating your own plugin, you can completely change the available species, alter the colour scheme or page template, or add your own views and static content.
Public plugins
A selection of plugins is included as part of the standard Ensembl checkout, enabling you to include optional features in your site.
Most public plugins have a README file giving more detailed and up-to-date information on how to use them.
- public-plugins/ensembl
- Used to configure the current set of Ensembl species (as seen on www.ensembl.org) - without this or a similar plugin, no data will appear on your site.
- public-plugins/mirror
- Used to configure your local server settings
- public-plugins/genoverse
- Latest stable version of the Genoverse scrolling browser
- public-plugins/solr
- Solr search engine
- public-plugins/tools
- Web interface for BLAST, VEP, etc
- public-plugins/tools_hive
- eHive backend for tools server
- public-plugins/orm
- This plugin is used to separate optional features (user accounts, ability to update databases through a web interface) from the core functionality of the Ensembl webcode. It uses Rose::DB::Object and its associated modules for database access, and thus has a lot of additional Perl dependencies
- public-plugins/admin
- Administrative interface for non-biological content, such as help and news. Depends on public-plugins/orm
Using plugins
Plugins are used to complement the normal system of inheritance in object-oriented Perl. Whereas a child object can inherit methods from multiple parents, a parent object normally cannot be overridden by multiple children. The plugin system "aggregates" the contents of several methods into one "master" method that can then be used by mod_perl when rendering the webpage.
The module conf/Plugins.pm controls which plugins are used by an instance of Ensembl and their order of precedence. In a standard Ensembl mirror, the module will define a plugin array as follows:
$SiteDefs::ENSEMBL_PLUGINS = [
  'EnsEMBL::Mirror'    => $SiteDefs::ENSEMBL_SERVERROOT.'/public-plugins/mirror',
  'EnsEMBL::Genoverse' => $SiteDefs::ENSEMBL_SERVERROOT.'/public-plugins/genoverse',
# 'EnsEMBL::Solr'      => $SiteDefs::ENSEMBL_SERVERROOT.'/public-plugins/solr',
# 'EnsEMBL::Users'     => $SiteDefs::ENSEMBL_SERVERROOT.'/public-plugins/users',
  'EnsEMBL::Ensembl'   => $SiteDefs::ENSEMBL_SERVERROOT.'/public-plugins/ensembl',
  'EnsEMBL::Docs'      => $SiteDefs::ENSEMBL_SERVERROOT.'/public-plugins/docs',
];
The plugins are processed in reverse order, starting with the last one.
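Because the plugin array is a flat list of namespace and directory pairs, "reverse order" means walking the pairs from the end of the list. The sketch below shows one way to do that. The server root and the helper name plugins_in_processing_order are hypothetical, and this is not the loader Ensembl itself uses.

```perl
use strict;
use warnings;

# Hypothetical server root; a real installation takes this from SiteDefs.
my $root = '/path/to/ensembl-webcode';

# Flat list of (namespace => directory) pairs, in the style of conf/Plugins.pm.
my @plugins = (
  'EnsEMBL::Mirror'    => "$root/public-plugins/mirror",
  'EnsEMBL::Genoverse' => "$root/public-plugins/genoverse",
  'EnsEMBL::Ensembl'   => "$root/public-plugins/ensembl",
);

# Hypothetical helper: return [namespace, directory] pairs, last pair first.
sub plugins_in_processing_order {
  my @list = @_;
  my @ordered;
  # Step backwards two elements at a time so each pair stays together.
  for (my $i = $#list - 1; $i >= 0; $i -= 2) {
    push @ordered, [ @list[ $i, $i + 1 ] ];
  }
  return @ordered;
}

print "$_->[0]\n" for plugins_in_processing_order(@plugins);
```

With the three pairs above, the namespaces are printed in the order EnsEMBL::Ensembl, EnsEMBL::Genoverse, EnsEMBL::Mirror.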