Arabidopsis thaliana Assembly and Gene Annotation

About Arabidopsis thaliana

Arabidopsis thaliana is a small flowering plant that is widely used as a model organism in plant biology. Arabidopsis is a member of the mustard (Brassicaceae) family, which includes cultivated species such as cabbage and radish. Arabidopsis is not of major agronomic significance, but it offers important advantages for basic research in genetics and molecular biology. Arabidopsis thaliana has a genome size of ~135 Mbp, and a haploid chromosome number of 5.

Assembly

The complete genome sequence of Arabidopsis thaliana was first published by the Arabidopsis Genome Initiative in 2000 [1] and was determined by a BAC-by-BAC sequencing strategy anchored to chromosomes using a variety of genetic and physical maps.

Annotation

Gene annotations are based on cDNA and EST data, together with manual updates informed by cross-species alignments, peptides, and community input on missing or incorrectly annotated genes. The assembly and annotation are subject to ongoing updates. Read more about Arabidopsis gene annotation. This browser is based on data from version 10 of The Arabidopsis Information Resource (TAIR) database, released in November 2010 [2].

Regulation

Mappings for probes from the following expression arrays have been added:

Variation

The Arabidopsis variation database contains data from the screening of 1,179 strains using the Affymetrix 250k Arabidopsis SNP chip [3]. It also includes an updated data set from the resequencing of 18 Arabidopsis lines [4], produced through a BBSRC-funded collaboration between Richard Mott at the Wellcome Trust Centre for Human Genetics in Oxford, Paula Kover at the University of Bath, and EBI. In addition, it contains 392 strains from the 1001 Genomes Project.

Phenotype data have also been added from a genome-wide association study (GWAS) of 107 phenotypes in 95 inbred lines carried out by Atwell et al. [5].

Variation Data Usage

The 1001 Arabidopsis Genomes project has released pre-publication data from the Salk Institute, WTCHG, MPI, and GMI. The data are freely available to anyone, but the 1001 Arabidopsis Genomes consortium has requested that other groups publishing on these pre-publication data respect the scientific ethics that apply to them, as outlined in detail in the Fort Lauderdale agreement. In brief, small-scale analysis, e.g. the analysis of a single locus, is an expected use of the data and can be published without any expectation of coordination. In contrast, large-scale, genome-wide analysis is expected either to be coordinated with the 1001 Arabidopsis Genomes consortium in some manner or to be published after the initial papers. The reasoning behind this is given in more detail in the Fort Lauderdale document.

References

  1. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.
    Arabidopsis Genome Initiative. 2000. Nature. 408:796-815.
  2. The Arabidopsis Information Resource (TAIR): gene structure and function annotation.
    Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L et al. 2008. Nucleic Acids Res. 36:D1009-14.
  3. Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel.
    Horton MW, Hancock AM, Huang YS, Toomajian C, Atwell S, Auton A, Muliyati NW, Platt A, Sperone FG, Vilhjálmsson BJ et al. 2012. Nat. Genet. 44:212-216.
  4. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana.
    Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P, Warthmann N, Hu TT, Fu G, Hinds DA et al. 2007. Science. 317:338-342.
  5. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines.
    Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y, Meng D, Platt A, Tarone AM, Hu TT et al. 2010. Nature. 465:627-631.

Picture credit: Emmanuel Boutet.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: TAIR10, INSDC Assembly GCA_000001735.1, Sep 2010
Database version: 88.10
Base Pairs: 135,670,229
Golden Path Length: 119,667,750
Genebuild by: TAIR
Genebuild method: Imported from TAIR
Data source: TAIR

Gene counts

Coding genes: 27,416
Non coding genes: 1,359
Small non coding genes: 1,359
Pseudogenes: 924
Gene transcripts: 41,671
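
As a rough illustration of how the figures above relate to one another, the following sketch derives two summary values from them: coding gene density and the share of coding genes among the listed gene categories. The function names are hypothetical. Using the Golden Path Length as the denominator, and treating coding genes, non coding genes, and pseudogenes as non-overlapping categories, are assumptions made for this example, not definitions taken from the database.

```python
# Figures copied from the statistics on this page (TAIR10, database version 88.10).
GOLDEN_PATH_LENGTH_BP = 119_667_750
CODING_GENES = 27_416
NON_CODING_GENES = 1_359
PSEUDOGENES = 924


def genes_per_mbp(gene_count: int, length_bp: int) -> float:
    """Return the number of genes per million base pairs (hypothetical helper)."""
    return gene_count / (length_bp / 1_000_000)


def coding_fraction(coding: int, non_coding: int, pseudogenes: int) -> float:
    """Return the share of coding genes among the three listed categories.

    Assumes the categories do not overlap, which is an assumption of this sketch.
    """
    return coding / (coding + non_coding + pseudogenes)


if __name__ == "__main__":
    density = genes_per_mbp(CODING_GENES, GOLDEN_PATH_LENGTH_BP)
    share = coding_fraction(CODING_GENES, NON_CODING_GENES, PSEUDOGENES)
    print(f"Coding genes per Mbp: {density:.1f}")
    print(f"Coding share of listed genes: {share:.1%}")
```

With the values on this page, this gives roughly 229 coding genes per Mbp and a coding share of about 92%.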

Other

FGENESH gene prediction: 20,579
Short Variants: 14,234,197
Structural variants: 13,667
