
What is a genome assembly?

The genome assembly is simply the genome sequence produced after chromosomes have been fragmented, those fragments have been sequenced, and the resulting sequences have been put back together. For more information, see the glossary.
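To make "put back together" concrete, here is a toy sketch of one naive strategy: repeatedly merge the two fragments with the longest suffix/prefix overlap. This is a deliberate simplification and is not how the consortia producing Ensembl assemblies work. Real assemblers use overlap or de Bruijn graphs and must cope with sequencing errors and repeats. The function names and the `min_len` threshold are hypothetical, chosen only for illustration. Fragments that never overlap stay as separate contigs, which is one way gaps arise in an assembly.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (at least min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembly: merge the best-overlapping pair until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    # Drop fragments that are wholly contained in another fragment.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining contigs are separated by gaps
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"]))  # ['ATGGCGTACGTTA']
```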

Each species in Ensembl has a reference genome assembly produced by an international genome consortium. (Ensembl does not produce genome assemblies.) Depending on the species, the reference assembly may be compiled from the DNA of one individual, a collection of individuals, a breed or a strain. The DNA source of each genome sequence is given under the More information and statistics link on each species home page.

Assembly model

Most assemblies provided to Ensembl are 'haploid assemblies' and represent a single non-redundant path through the genome. Some assemblies, such as human and mouse, come with additional alternate sequences that represent additional paths through the genome. Examples of alternate sequences are:

  • Haplotypes, e.g. the MHC region in human
  • Novel patches
  • Fix patches

These alternate sequences can be viewed in the Ensembl browser where available.

Updating a genome assembly

A genome assembly is updated when newly sequenced DNA allows gaps to be filled. It may also be updated when a new assembly algorithm becomes available. This work is done by external groups, who submit the updated assembly to the INSDC.

Ensembl may perform a new genebuild when:

  • a new assembly is submitted to the INSDC and we decide to download and annotate it, or
  • large amounts of new experimental data (for example, RNA-seq, cDNA and protein sequences) become available.

Depending on the species, assemblies in Ensembl are updated roughly once every two years, or less often.

Older versions of genome assemblies can be found on the Ensembl archive sites.

Genome coverage

Ensembl does not generate genome assemblies; rather, we download them from the INSDC and annotate them. If you have any questions about the sequencing coverage of a genome assembly in Ensembl, please contact the original submitter. The submitter can be identified by querying the assembly accession (e.g. GCA_000208655.2) or the WGS record (e.g. AAGV00000000.3).
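As an illustration of the accession format mentioned above, the sketch below splits an INSDC assembly accession such as GCA_000208655.2 into its prefix, numeric part and version, then builds a lookup URL. It assumes the pattern "GCA_" + nine digits + "." + version and the ENA browser URL layout `https://www.ebi.ac.uk/ena/browser/view/<accession>`; both are assumptions worth checking against ENA documentation. The function names are hypothetical, and WGS record accessions such as AAGV00000000.3 are deliberately not handled.

```python
import re

# Assumed INSDC assembly accession layout: "GCA_" + 9 digits + "." + version number.
ASSEMBLY_ACCESSION = re.compile(r"(GCA)_(\d{9})\.(\d+)")


def parse_assembly_accession(acc: str) -> tuple[str, str, int]:
    """Split an assembly accession into (prefix, number, version); raise on anything else."""
    m = ASSEMBLY_ACCESSION.fullmatch(acc.strip())
    if m is None:
        raise ValueError(f"not an INSDC assembly accession: {acc!r}")
    prefix, number, version = m.groups()
    return prefix, number, int(version)


def ena_browser_url(acc: str) -> str:
    """Build a (hypothetical helper's) ENA browser URL after validating the accession."""
    parse_assembly_accession(acc)  # raises ValueError for malformed input
    return f"https://www.ebi.ac.uk/ena/browser/view/{acc.strip()}"


print(parse_assembly_accession("GCA_000208655.2"))  # ('GCA', '000208655', 2)
print(ena_browser_url("GCA_000208655.2"))
```

Keeping the version as an integer makes it easy to tell whether two accessions with the same number refer to different releases of the same assembly.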

If you have any other questions about Ensembl, please do not hesitate to contact our HelpDesk. You may also like to subscribe to the developers' mailing list.