How do I download the promoter sequence for my gene of interest?
If you’re working with human or mouse, Ensembl has generated predicted promoter regions through analysing datasets from the ENCODE, Roadmap Epigenomics and Blueprint projects. This is called the Regulatory Build.
The Ensembl Regulatory Build does not directly associate annotated genes with regulatory features, so you will need to search for your gene of interest and look for promoters proximal to the 5’ UTR. You should validate this yourself, either experimentally or by cross-referencing with cell-type activity levels and tissue-specific expression data.
These annotated promoters have unique stable IDs. You can navigate to the Regulation tab for your promoter of interest by searching for the stable ID itself or clicking on a promoter when searching in the context of a gene.
The easiest way to retrieve the sequence is by clicking on ‘Location’ within the Regulation tab, then the blue ‘Export data’ button once in the Location tab. This will export the genomic sequence for the region where the promoter exists.
If you’re not working with human or mouse, you’ll need to define the promoter as a fixed number of base pairs upstream of the Transcription Start Site (TSS). Many people use 500 bp to define the ‘promoter region’. Whatever length of upstream sequence you use for your definition, you can download the sequence either:
• through the Gene tab, by clicking on 'Sequence' in the left-hand menu, then on the blue 'Download Sequence' button, and specifying the upstream sequence length in the download options window.
• through the Location tab, by searching for the genomic coordinates upstream of your gene of interest, then clicking on the blue 'Export Data' button on the left-hand side of the page.
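If you take the coordinate route, remember that "upstream" depends on the strand of the gene: for a forward-strand gene the promoter window lies at lower coordinates than the TSS, and for a reverse-strand gene it lies at higher coordinates. The sketch below is one way to work this out. The function name upstream_region and its 500 bp default are illustrative assumptions rather than anything Ensembl provides, and it does not check the window against the end of the chromosome.

```python
def upstream_region(chrom, tss, strand, length=500):
    """Return a 'chrom:start..end:strand' string covering `length` bp
    immediately upstream of a TSS (the TSS base itself is excluded).

    `strand` is 1 for the forward strand and -1 for the reverse strand.
    This is a hypothetical helper, not part of any Ensembl library.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    if length < 1:
        raise ValueError("length must be at least 1")

    if strand == 1:
        # Forward strand: upstream means smaller coordinates.
        start = max(1, tss - length)  # do not run off the start of the chromosome
        end = tss - 1
    else:
        # Reverse strand: upstream means larger coordinates.
        start = tss + 1
        end = tss + length

    if end < start:
        raise ValueError("no upstream sequence available for this TSS")
    return f"{chrom}:{start}..{end}:{strand}"
```

For example, with made-up coordinates, upstream_region("7", 1000, 1) gives "7:500..999:1" and upstream_region("7", 1000, -1) gives "7:1001..1500:-1". The start and end values are what you would type into the Location tab.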
If you have a large number of genes, you can use the REST API to retrieve the promoter sequences programmatically using the POST sequence endpoints for either genomic regions or stable IDs, depending on your input.
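As a rough illustration of the genomic-region route, the sketch below posts batches of region strings to the POST sequence/region/:species endpoint on rest.ensembl.org using only the Python standard library. The helper names, the batch size of 50 and the example region are assumptions made for this sketch; check the REST API documentation for the current limit on regions per request and for rate limiting before running it on a large list.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"
MAX_PER_POST = 50  # assumed batch size; confirm against the endpoint documentation


def chunked(items, size=MAX_PER_POST):
    """Yield successive slices of `items`, each at most `size` long."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def build_request(species, regions):
    """Build (but do not send) a POST request for a batch of regions.

    `regions` are strings such as '7:500..999:1' (chrom:start..end:strand).
    """
    url = f"{SERVER}/sequence/region/{species}"
    body = json.dumps({"regions": list(regions)}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def fetch_sequences(species, regions):
    """Send one POST per batch and return the parsed JSON entries as a list."""
    results = []
    for batch in chunked(list(regions)):
        with urllib.request.urlopen(build_request(species, batch)) as response:
            results.extend(json.load(response))
    return results


if __name__ == "__main__":
    # Made-up example region; replace with your own upstream coordinates.
    for entry in fetch_sequences("human", ["7:500..999:1"]):
        print(entry)
```

If your input is a list of stable IDs rather than coordinates, the same pattern should apply to the POST sequence/id endpoint, with a different URL and request body.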