Architecture: Apache request loop

The Loop

The Apache web server runs as a parent process with one or more child processes. The parent is responsible for creating the children, which in turn serve the actual pages and dynamic content. In Ensembl, the servers usually run with around 10 to 50 child processes, so the total number of Apache processes is the number of children plus the one parent.

      Apache request
            ↓
            ↓
  parent forks to make
      child process
            ↓
            ↓
mod_perl adds new version
  of modules to new child
            ↓
            ↓
    child init handler

When a new child is required (when the server load is high or the number of children falls below the configured minimum), the Apache parent creates one. The mod_perl environment of the parent is copied to the child, but in Ensembl's case this is minimal. Loading modules into the parent at Apache startup is generally more efficient than loading them in each child, because forked children share the parent's memory pages until they modify them; nevertheless, in Ensembl the majority of the Perl modules and configuration settings are loaded by the child init handler. Once the child is up and running, requests are sent to it and handled by the inner request loop.
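The parent's decision to fork can be sketched as a small function. This is a toy model, not Apache's actual prefork algorithm: the parameter names min_children, max_children and min_idle are assumptions loosely modelled on Apache's prefork settings, and the numbers in the usage lines are illustrative.

```python
def children_to_spawn(total: int, busy: int, min_children: int,
                      max_children: int, min_idle: int) -> int:
    """Return how many new children a (hypothetical) parent should fork now."""
    idle = total - busy
    wanted = 0
    # Top the pool up to the configured minimum size.
    if total < min_children:
        wanted = min_children - total
    # Under high load (too few idle children), restore the idle reserve.
    if idle + wanted < min_idle:
        wanted = min_idle - idle
    # Never grow beyond the configured maximum.
    return max(0, min(wanted, max_children - total))


if __name__ == "__main__":
    print(children_to_spawn(total=5, busy=2, min_children=10, max_children=50, min_idle=3))    # 5
    print(children_to_spawn(total=20, busy=19, min_children=10, max_children=50, min_idle=3))  # 2
    print(children_to_spawn(total=50, busy=50, min_children=10, max_children=50, min_idle=3))  # 0
```

The first call tops up a pool below its minimum, the second reacts to load, and the third shows the maximum acting as a hard cap.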

The diagram below shows a simplified Ensembl request loop (more precisely, a portion of the Apache keep-alive loop).

   Child init handler 
        |
        +------- ← -------+
 R *    ↓                 |
 e *  Post read request   |
 q *    ↓                 |
 u *    ↓                 |
 e *  Trans handler       |
 s *    ↓                 ↑ 
 t *    ↓                 |
   *  Script handler      |
 l *    ↓                 |
 o *    ↓                 |
 o *  Clean up handler    |
 p *    |                 |
        +------- → -------+
        ↓ 
   Child exit handler
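The order in which a child runs its handlers can be modelled in a few lines. This sketch only mirrors the diagram: the phase names and the run_child function are hypothetical, and the real handlers are Perl subroutines called by mod_perl, not Python callables.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]

# Per-request stages, in the order shown in the diagram.
REQUEST_PHASES = ["post_read_request", "trans", "script", "cleanup"]


def run_child(handlers: Dict[str, Handler], requests: List[dict]) -> None:
    """Simulate one child: init once, run every request through the loop, exit once."""
    handlers["child_init"]({})
    for request in requests:
        for phase in REQUEST_PHASES:
            handlers[phase](request)
    handlers["child_exit"]({})


if __name__ == "__main__":
    trace: List[str] = []

    def recorder(name: str) -> Handler:
        # Each stand-in handler just records which stage ran, and for which URI.
        return lambda request: trace.append(f"{name}:{request.get('uri', '-')}")

    names = ["child_init", *REQUEST_PHASES, "child_exit"]
    run_child({n: recorder(n) for n in names}, [{"uri": "/a"}, {"uri": "/b"}])
    print(trace)
```

The init and exit handlers run once per child, while the four inner stages repeat for every request the child serves.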

Loop handlers

Child init handler

The child init handler runs once in each new child process, immediately after the parent has created it, and sets up the initial configuration the child needs to receive requests.

Post read request handler

This handler is the first to be called for each request from a browser, so it is responsible for configuring settings that may be needed later in the loop. In Ensembl, it sets up various environment variables:

  • User ID: if a user has logged in
  • Session ID: if a session has been started 1

These values are read from a cookie sent by the requesting browser and stored in the %ENV hash. The cookie is set when the session starts or the user logs in.

N.B. Not all of the usual CGI environment variables are made available by Ensembl.
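A minimal sketch of this step, assuming hypothetical cookie names (user_id, session_id) and environment variable names (ENSEMBL_USER_ID, ENSEMBL_SESSION_ID); Ensembl's real names may differ. It returns a dictionary standing in for Perl's %ENV and only sets a variable when the corresponding cookie is present.

```python
from http.cookies import SimpleCookie
from typing import Dict

# Hypothetical cookie name -> environment variable name.
COOKIE_TO_ENV = {
    "user_id": "ENSEMBL_USER_ID",        # present only if a user has logged in
    "session_id": "ENSEMBL_SESSION_ID",  # present only if a session has started
}


def env_from_cookie_header(header: str) -> Dict[str, str]:
    """Parse a Cookie request header and return the variables to expose."""
    cookie = SimpleCookie()
    cookie.load(header)
    env: Dict[str, str] = {}
    for cookie_name, env_name in COOKIE_TO_ENV.items():
        if cookie_name in cookie:
            env[env_name] = cookie[cookie_name].value
    return env


if __name__ == "__main__":
    print(env_from_cookie_header("session_id=abc123; theme=dark"))
    # {'ENSEMBL_SESSION_ID': 'abc123'}
```

Unrelated cookies (theme here) are ignored, so later phases see only the variables this step chooses to publish.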

Trans handler

Once the loop has been initialised by the post read request handler, the Trans handler is responsible for routing the request to the correct script or static page. The URL being requested is used to perform the routing:

http://www.ensembl.org/Homo_sapiens/cytoview?l=2:87487831-87587831
                       |--species--|-script-|-----parameters-----|
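The split shown above can be reproduced with Python's standard urllib.parse module. This is only an illustration of the URL layout, assuming the path always has exactly two segments; the function name split_ensembl_url is hypothetical.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlsplit


def split_ensembl_url(url: str) -> Tuple[str, str, Dict[str, List[str]]]:
    """Split a URL into (species, script, parameters)."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 2:
        raise ValueError(f"expected /<species>/<script>, got {parts.path!r}")
    species, script = segments
    # parse_qs maps each parameter name to a list of values.
    return species, script, parse_qs(parts.query)


if __name__ == "__main__":
    print(split_ensembl_url(
        "http://www.ensembl.org/Homo_sapiens/cytoview?l=2:87487831-87587831"))
    # ('Homo_sapiens', 'cytoview', {'l': ['2:87487831-87587831']})
```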

The species name determines which species' data is displayed, the script determines the view, and the parameters configure that view (here, l selects the region 2:87487831-87587831). Ensembl looks for the script in the following locations, in this order:

  • Dynamic scripts in the plugin directories
  • Static pages in the plugin directories
  • Dynamic scripts in the species directory (/perl/default)
  • Static pages in the species directory (/perl/default)
  • Dynamic scripts in the common directories (/perl/multi and /perl/common)
  • Static pages in the common directories (/perl/multi and /perl/common)

If no corresponding script or static page is found, Ensembl returns a 404 (Not Found) error page.
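The search order can be sketched as a first-match lookup that falls back to 404. This toy version assumes a set of file paths in place of the filesystem, hypothetical plugin directories, and a hypothetical convention that a static page is the script name plus .html.

```python
from typing import List, Optional, Set, Tuple

# Directory groups in search order. The plugin paths are hypothetical;
# /perl/default, /perl/multi and /perl/common come from the list above.
SEARCH_GROUPS: List[List[str]] = [
    ["/plugins/local/perl", "/plugins/core/perl"],  # plugin directories (hypothetical)
    ["/perl/default"],                              # species directory
    ["/perl/multi", "/perl/common"],                # common directories
]


def resolve(script: str, existing: Set[str]) -> Tuple[int, Optional[str]]:
    """Return (200, path) for the first match, or (404, None) if nothing matches."""
    for group in SEARCH_GROUPS:
        # Within a group, every dynamic script outranks every static page.
        dynamic = [f"{d}/{script}" for d in group]
        static = [f"{d}/{script}.html" for d in group]  # assumed naming for static pages
        for path in dynamic + static:
            if path in existing:
                return 200, path
    return 404, None


if __name__ == "__main__":
    files = {"/perl/default/cytoview", "/plugins/core/perl/about.html"}
    print(resolve("cytoview", files))  # (200, '/perl/default/cytoview')
    print(resolve("about", files))     # (200, '/plugins/core/perl/about.html')
    print(resolve("missing", files))   # (404, None)
```

Because groups are searched in order, a static page in a plugin directory still wins over a dynamic script in the species directory.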

Ensembl supports species name aliases; for example, HS maps to Homo sapiens. These aliases are configurable in the SiteDefs.pm module (detailed in Configuration and sessions).
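Alias resolution amounts to a lookup table consulted before routing. The table below is hypothetical (the real aliases are defined in SiteDefs.pm), and the case-insensitive matching is an assumption of this sketch.

```python
# Hypothetical alias table; in Ensembl the aliases are configured in SiteDefs.pm.
SPECIES_ALIASES = {
    "hs": "Homo_sapiens",
    "human": "Homo_sapiens",
    "mm": "Mus_musculus",
}


def canonical_species(name: str) -> str:
    """Map an alias to its canonical species name; other names pass through."""
    # Matching ignores case here, which is an assumption of this sketch.
    return SPECIES_ALIASES.get(name.lower(), name)


if __name__ == "__main__":
    print(canonical_species("HS"))            # Homo_sapiens
    print(canonical_species("Homo_sapiens"))  # Homo_sapiens
```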

Script handler

The script handler passes the request to the script or static page identified by the Trans handler, and the process of building a new Ensembl page begins: the dynamic script is executed (usually creating and configuring a new EnsEMBL::Web::Document::WebPage), or the static HTML page is parsed and displayed.
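The two branches of the script handler can be sketched as follows. Both the DYNAMIC_SCRIPTS registry and the static_pages dictionary (standing in for HTML files on disk) are hypothetical; in Ensembl the dynamic branch runs a Perl script that builds an EnsEMBL::Web::Document::WebPage.

```python
from typing import Callable, Dict

# Hypothetical registry: script name -> function that builds a page.
DYNAMIC_SCRIPTS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "cytoview": lambda params: f"<html><body>Region {params.get('l', '?')}</body></html>",
}


def script_handler(kind: str, target: str, params: Dict[str, str],
                   static_pages: Dict[str, str]) -> str:
    """Run a dynamic script, or return the contents of a static page."""
    if kind == "dynamic":
        # Dynamic branch: execute the script to build a new page.
        return DYNAMIC_SCRIPTS[target](params)
    # Static branch: `static_pages` stands in for HTML files on disk.
    return static_pages[target]


if __name__ == "__main__":
    print(script_handler("dynamic", "cytoview", {"l": "2:1-10"}, {}))
    # <html><body>Region 2:1-10</body></html>
```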

Clean up handler

The clean up handler performs various maintenance and debugging tasks at the end of a request. It currently handles the tasks below, and can be extended by adding methods to EnsEMBL::Web::ApacheHandler. Because it runs after the response has been sent, the clean up handler does not add to the user's perceived response time.

  • Debugging: all requests are timed, and expensive requests trigger a warning in the logs (e.g. "Long Process")
  • Child process management: child processes that use too much memory are killed (uses Apache::SizeLimit)
  • Disaster recovery: killing all child processes in the worst-case scenario 2
  • Blast ticket management: submitting, parsing and tidying up Blast searches
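The first two clean up tasks reduce to simple checks once the request has finished. In this sketch the thresholds LONG_PROCESS_SECONDS and MAX_CHILD_MB are hypothetical values, the log message only borrows the "Long Process" wording, and the function reports what should happen rather than killing anything; in Ensembl the size check is delegated to Apache::SizeLimit.

```python
import logging
import time
from typing import Optional, Tuple

LONG_PROCESS_SECONDS = 10.0  # hypothetical threshold for a "Long Process" warning
MAX_CHILD_MB = 500.0         # hypothetical per-child memory limit

log = logging.getLogger("cleanup")


def cleanup(uri: str, started: float, rss_mb: float,
            now: Optional[float] = None) -> Tuple[bool, bool]:
    """Return (was_long_process, child_should_exit) for a finished request.

    `started` and `now` are time.monotonic() readings; `now` can be passed
    explicitly to make the function easy to test.
    """
    elapsed = (time.monotonic() if now is None else now) - started
    was_long = elapsed > LONG_PROCESS_SECONDS
    if was_long:
        log.warning("Long Process: %s took %.1fs", uri, elapsed)
    # A child that has grown too large should exit after this request;
    # the parent then forks a fresh replacement.
    return was_long, rss_mb > MAX_CHILD_MB


if __name__ == "__main__":
    print(cleanup("/Homo_sapiens/cytoview", started=0.0, rss_mb=620.0, now=12.5))
    # (True, True)
```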

Configuring the loop

The Apache request loop is configured in the main Apache httpd.conf file. Any Perl subroutine can be registered as the handler for any stage of the loop, using mod_perl directives such as PerlTransHandler or PerlCleanupHandler.
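The mapping between httpd.conf and the loop can be illustrated by reading mod_perl handler directives out of a configuration string. The directive names are real mod_perl directives; the parser itself and the handler module names are hypothetical, and a real httpd.conf has far more syntax (sections, continuation lines) than this handles.

```python
from typing import Dict, List

# mod_perl handler directives mapped to the stages in the loop diagram.
DIRECTIVE_TO_PHASE = {
    "PerlChildInitHandler": "child_init",
    "PerlPostReadRequestHandler": "post_read_request",
    "PerlTransHandler": "trans",
    "PerlHandler": "script",          # mod_perl 1 content phase
    "PerlResponseHandler": "script",  # mod_perl 2 equivalent
    "PerlCleanupHandler": "cleanup",
    "PerlChildExitHandler": "child_exit",
}


def handlers_from_conf(conf_text: str) -> Dict[str, List[str]]:
    """Collect handler names per loop stage from simple one-line directives."""
    phases: Dict[str, List[str]] = {}
    for line in conf_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments and unrelated directives.
        if len(fields) >= 2 and fields[0] in DIRECTIVE_TO_PHASE:
            phases.setdefault(DIRECTIVE_TO_PHASE[fields[0]], []).extend(fields[1:])
    return phases


if __name__ == "__main__":
    sample = """
    # Hypothetical handler modules, for illustration only
    PerlTransHandler       My::Handlers::trans
    PerlCleanupHandler     My::Handlers::cleanup My::Handlers::timer
    Listen 80
    """
    print(handlers_from_conf(sample))
    # {'trans': ['My::Handlers::trans'], 'cleanup': ['My::Handlers::cleanup', 'My::Handlers::timer']}
```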

Footnotes:

  1. A session is started when a configuration change is made to a dynamic page, for example when a new track is displayed or a panel is closed.
  2. Any process can touch /logs/ensembl.die to force all child processes to be terminated at the next request. A new population of children will then be created by the Apache parent process. This is a rarely used last resort to overcome some bugs in Apache, mod_perl or both.
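One way a child could honour the die file is to compare its modification time with the child's own start time, so that only children created before the file was touched are terminated and their replacements are not killed in turn. This rule is an assumption for illustration; the footnote does not say how Ensembl performs the check.

```python
import os


def should_terminate(die_file: str, child_started_at: float) -> bool:
    """True if die_file was touched after this child started (assumed rule)."""
    try:
        return os.path.getmtime(die_file) > child_started_at
    except FileNotFoundError:
        # No die file: carry on serving requests.
        return False


if __name__ == "__main__":
    # The path comes from the footnote; the start time would be recorded at child init.
    print(should_terminate("/logs/ensembl.die", child_started_at=0.0))
```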