Thalassiosira pseudonana Assembly and Gene Annotation

About Thalassiosira pseudonana CCMP1335

Thalassiosira pseudonana is a species of marine centric diatoms. It was chosen as the first eukaryotic marine phytoplankton for whole genome sequencing. T. pseudonana was selected because it is a model for diatom physiology studies, belongs to a genus widely distributed throughout the world's oceans, and has a relatively small genome of 34 megabase pairs. Researchers also use marine diatoms of the genus Thalassiosira to study diatom light absorption. The diatom requires a sufficiently high concentration of CO2 to use C4 metabolism (Clement et al. 2015).

Taxonomy ID 296543 (Text from Wikipedia.)

Assembly

The genome was assembled at JGI and is also available from the EMBL/Genbank/DDBJ databases under the accession GCA_000149405.1.

Annotation

The protein coding genes on this site are a direct import from JGI. Non-coding RNA genes have been annotated using tRNAscan-SE (Lowe, T.M. and Eddy, S.R. 1997), RFAM (Griffiths-Jones et al. 2005), and RNAmmer (Lagesen, K. et al. 2007). Publications using this data should cite the following publication:

References

The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism.
Armbrust EV, Berges JA, Bowler C, Green BR, Martinez D, Putnam NH, Zhou S, Allen AE, Apt KE, Bechner M et al. 2004. Science. 306:79-86.
Identification and comparative genomic analysis of signaling and regulatory components in the diatom Thalassiosira pseudonana.
Montsant A, Allen AE, Coesel S, De Martino A, Falciatore A, Mangogna M, Siaut M, Heijde M, Jabbari K, Maheswari U et al. 2007. J. Phycol. 43:585-604.
Update of the Diatom EST Database: a new tool for digital transcriptomics.
Maheswari U, Mock T, Armbrust EV, Bowler C. 2009. Nucleic Acids Res. 37:D1001-5.

Picture credit: Joint Genome Institute. Artistic rendering by Leila Hornick.

Other Data

T. pseudonana ESTs were sequenced from 7 different cDNA libraries. These ESTs were aligned to the genome using exonerate.
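As a rough illustration of the alignment step described above, the sketch below assembles an exonerate command line for aligning ESTs to a genome. The page does not state which exonerate model or options were used, so the `est2genome` model, the `--bestn` limit, the GFF output option, and the file names `ests.fasta` and `genome.fasta` are all assumptions made for this example. The function only builds the command and does not run it.

```python
import shlex
from typing import List


def build_exonerate_command(est_fasta: str, genome_fasta: str, best_n: int = 1) -> List[str]:
    """Build an exonerate command for EST-to-genome alignment.

    Assumption: the est2genome model and these options are illustrative;
    the settings actually used for this annotation are not given on the page.
    """
    return [
        "exonerate",
        "--model", "est2genome",      # spliced alignment of ESTs to genomic DNA
        "--query", est_fasta,         # hypothetical EST FASTA file
        "--target", genome_fasta,     # hypothetical genome FASTA file
        "--bestn", str(best_n),       # report only the best n hits per EST
        "--showtargetgff", "yes",     # emit GFF on the target (genome) sequence
    ]


command = build_exonerate_command("ests.fasta", "genome.fasta")
print(shlex.join(command))
```

If exonerate is installed, the resulting list could be passed to `subprocess.run(command, check=True)`. Building the command as a list rather than a single string avoids shell quoting problems with unusual file names.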

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly	ASM14940v2, INSDC Assembly GCA_000149405.2, May 2014
Database version	115.2
Golden Path Length	32,437,365
Genebuild by
Genebuild method	Import
Data source	JGI

Gene counts

Coding genes	11,672
Non coding genes	98
Small non coding genes	98
Pseudogenes	99
Gene transcripts	11,870

Coding genes: Genes/transcripts that contain an open reading frame (ORF).
Pseudogenes: Genes that have homology to known protein-coding genes but contain a frameshift and/or stop codon(s) that disrupt the ORF. Thought to have arisen through duplication followed by loss of function.
Gene transcripts: A transcript is the operational unit of a gene. In a genomic context, transcripts consist of one or more exons, with adjoining exons being separated by introns. The exons/introns are transcribed and then the introns spliced out. Transcripts may or may not encode a protein.
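The counts above can be combined into a few derived figures. The short sketch below is only an illustration: it assumes that the total gene count is the sum of coding genes, non coding genes, and pseudogenes (treating small non coding genes as a subset of non coding genes), and that the transcript count covers all of these genes. Neither assumption is stated on this page.

```python
# Figures copied from the Summary and Gene counts tables.
GOLDEN_PATH_LENGTH_BP = 32_437_365
CODING_GENES = 11_672
NON_CODING_GENES = 98
PSEUDOGENES = 99
GENE_TRANSCRIPTS = 11_870


def genes_per_megabase(gene_count: int, length_bp: int) -> float:
    """Return the number of genes per million base pairs of assembly."""
    return gene_count / (length_bp / 1_000_000)


def transcripts_per_gene(transcripts: int, genes: int) -> float:
    """Return the mean number of transcripts per gene."""
    return transcripts / genes


# Assumption: these three categories do not overlap and together
# account for every annotated gene.
total_genes = CODING_GENES + NON_CODING_GENES + PSEUDOGENES
coding_density = genes_per_megabase(CODING_GENES, GOLDEN_PATH_LENGTH_BP)
mean_transcripts = transcripts_per_gene(GENE_TRANSCRIPTS, total_genes)

print(f"Total annotated genes: {total_genes:,}")
print(f"Coding genes per Mb:   {coding_density:.1f}")
print(f"Transcripts per gene:  {mean_transcripts:.4f}")
```

Under these assumptions the transcript count is only slightly above the gene count, which is consistent with an imported gene set that has close to one transcript model per gene.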
