Zea mays Assembly and Gene Annotation

About Zea mays

Zea mays (maize) has the highest world-wide production of all grain crops, yielding 875 million tonnes in 2012 (http://faostat.fao.org/). Although a food staple in many regions of the world, most is used for animal feed and ethanol fuel. Maize was domesticated from wild teosinte in Central America and its cultivation was spread throughout the Americas by pre-Columbian civilizations. In addition to its economic value, maize is an important model organism for studies in plant genetics, physiology, and development. It has a large genome of about 2.4 gigabases with a haploid chromosome number of 10 (Schnable et al, 2009; Zhang et al, 2009). Maize is distinguished from other grasses in that its genome arose from an ancient tetraploidy event unique to its lineage.

Assembly

The complete genome sequence of Zea mays cv. B73 (RefGen_v1) was published in 2009 by the NSF-funded Maize Genome Sequencing Project (Schnable et al, 2009). The high-quality assembly was accomplished by a strategy of sequencing individual BAC clones along a minimum tiling path anchored to genetic and physical maps (Schnable et al, 2009; Wei, Zhang et al, 2009). This version of the assembly (RefGen_v3) incorporates additional contigs assembled from whole-genome shotgun sequencing reads (Olson et al, manuscript in preparation). These contigs were selected because they include portions of full-length cDNAs that were not covered by the BAC-based assembly. The contigs were inserted into gaps based on a synteny-refined genetic map. This genetic map was also used to rearrange some clones.

Annotation

Genes were originally annotated using both an evidence-based approach (e.g. using cDNA and EST data) and an ab initio approach (FGENESH), which were combined to give a unique non-overlapping gene set. New and updated gene models are limited to the regions where new contigs were inserted.

MAKER-P Gene Models

In addition to the familiar "Gramene Gene" annotation track, the maize browser now includes a new "MAKER-P_genes" track, providing gene models annotated using MAKER-P software (Law et al, 2015). Incorporating evidence from a large number of RNA-seq studies, this new set includes 4,466 additional protein-coding genes not present in the 5b+ annotation build, and improves the annotation of UTRs in 1,393 gene models. This set excludes 2,647 5b+ gene models that lack support in the MAKER-P build (Law et al, 2015).

Zea mays nascent transcriptomes

Nascent transcriptomes of wild-type (WT) and RNA Pol D1 (rpd1) mutant seedlings are available as RNA alignments in the genome browser. These data come from the publication by Erhard et al (2015).

Long non-coding RNAs

The results of a genomewide screen for long non-coding RNAs (lncRNAs) are shown as RNA alignments of 20,163 putative lncRNAs, including ~1,704 high-confidence lncRNAs, as published by Li et al (2014).

Regulation

Gene expression probes

Oligo probes from the GeneChip Maize Genome Array have been aligned using the standard Ensembl 2-step mapping procedure. For example, see the results for Zm.155.1.A1_a_at.

DNA methylation

Genomewide patterns of DNA methylation for two maize inbred lines, B73 and Mo17, are now displayed on the maize genome browser. Cytosine methylation in symmetric contexts (CG and CHG, where H is A, C, or T) is associated with DNA replication and histone modification. CG (65%) and CHG (50%) methylation is also highest in transposons. Source: Maize methylome publication by Regulski et al (2013).
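The sequence contexts named above can be illustrated with a minimal Python sketch that classifies a single forward-strand cytosine by the bases that follow it. The function name `cytosine_context` is hypothetical, and the asymmetric CHH class is standard methylation terminology rather than something stated on this page; this is not the method used by Regulski et al (2013).

```python
def cytosine_context(seq: str, pos: int) -> str:
    """Classify the cytosine at 0-based index `pos` of a forward-strand sequence.

    Returns "CG", "CHG", "CHH", or "unknown" (too close to the end of the
    sequence, or followed by an ambiguous base such as N).
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position does not hold a cytosine")

    downstream = seq[pos + 1 : pos + 3]  # up to two bases after the C
    h_bases = "ACT"  # H means any base except G

    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"  # symmetric context
    if len(downstream) == 2 and downstream[0] in h_bases:
        if downstream[1] == "G":
            return "CHG"  # symmetric context
        if downstream[1] in h_bases:
            return "CHH"  # asymmetric context (not discussed in the text above)
    return "unknown"
```

For example, `cytosine_context("CAG", 0)` returns `"CHG"`, while a cytosine at the very end of a sequence is reported as `"unknown"` because its context cannot be determined.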

Variation

HapMap2 dataset

This variation set comprises the maize HapMap2 data (Chia et al, 2012). The dataset incorporates 55 million SNPs and indels identified in a collection of 103 pre-domesticated and domesticated Zea mays varieties, including a representative from the sister genus, Tripsacum dactyloides (Eastern gamagrass). Each line was sequenced to an average of 4.5X coverage using the Illumina GAIIx platform. The reads can be accessed from the SRA under accession ID SRA051245. Reads were mapped to the B73 reference genome using a combination of Bowtie, Novoalign and SOAP. The variations were scored by taking into account identity-by-descent blocks that are shared among the lines.
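As a rough sense of scale for the sequencing effort described above, the following Python sketch estimates the sequence yield implied by 4.5X average coverage. It assumes the approximate 2.4 Gb genome size quoted in "About Zea mays" and the simple relation coverage = sequenced bases / genome size; the function name `required_bases` is hypothetical, and the result is an illustration, not a figure reported by Chia et al (2012).

```python
def required_bases(coverage: float, genome_size_bp: int) -> float:
    """Bases that must be sequenced to reach a given average coverage."""
    return coverage * genome_size_bp


GENOME_SIZE_BP = 2_400_000_000  # assumption: ~2.4 Gb, as quoted earlier on this page
COVERAGE = 4.5                  # average per-line coverage, from the text above
N_LINES = 103                   # number of lines in the HapMap2 collection

per_line = required_bases(COVERAGE, GENOME_SIZE_BP)
total = per_line * N_LINES

print(f"{per_line / 1e9:.1f} Gb per line, {total / 1e9:.1f} Gb across all lines")
```

Under these assumptions each line corresponds to roughly 10.8 Gb of sequence.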

The Panzea 2.7 genotyping-by-sequencing (GBS) dataset

This variation data set consists of 719,472 SNPs (excluding 332 SNPs that were removed for mapping to scaffolds) typed in 16,718 maize and teosinte lines, and grouped in 14 overlapping populations according to the germplasm set in the corresponding metadata table.


References

  1. A genome-wide characterization of microRNA genes in maize.
    Zhang L, Chia JM, Kumari S, Stein JC, Liu Z, Narechania A, Maher CA, Guill K, McMullen MD, Ware D. 2009. PLoS Genet. 5:e1000716.
  2. Maize HapMap2 identifies extant variation from a genome in flux.
    Chia JM, Song C, Bradbury PJ, Costich D, de Leon N, Doebley J, Elshire RJ, Gaut B, Geller L, Glaubitz JC et al. 2012. Nat. Genet. 44:803-807.
  3. The B73 maize genome: complexity, diversity, and dynamics.
    Schnable PS, Ware D, et al. 2009. Science. 326:1112-1115.
  4. FAOSTAT.
  5. Comparative population genomics of maize domestication and improvement.
    Hufford MB, Xu X, van Heerwaarden J, Pyhäjärvi T, Chia JM, Cartwright RA, Elshire RJ, Glaubitz JC, Guill KE, Kaeppler SM et al. 2012. Nat. Genet. 44:808-811.
  6. A first-generation haplotype map of maize.
    Gore MA, Chia JM, Elshire RJ, Sun Q, Ersoz ES, Hurwitz BL, Peiffer JA, McMullen MD, Grills GS, Ross-Ibarra J et al. 2009. Science. 326:1115-1117.
  7. Detailed analysis of a contiguous 22-Mb region of the maize genome.
    Wei F, Stein JC, Liang C, Zhang J, Fulton RS, Baucom RS, De Paoli E, Zhou S, Yang L, Han Y et al. 2009. PLoS Genet. 5:e1000728.
  8. The physical and genetic framework of the maize B73 genome.
    Wei F, Zhang J, Zhou S, He R, Schaeffer M, Collura K, Kudrna D, Faga BP, Wissotski M, Golser W et al. 2009. PLoS Genet. 5:e1000715.
  9. A single molecule scaffold for the maize genome.
    Zhou S, Wei F, Nguyen J, Bechner M, Potamousis K, Goldstein S, Pape L, Mehan MR, Churas C, Pasternak S et al. 2009. PLoS Genet. 5:e1000711.
  10. Evidence-based gene predictions in plant genomes.
    Liang C, Mao L, Ware D, Stein L. 2009. Genome Res. 19:1912-1923.

Picture credit: Nicolle Rager Fuller, National Science Foundation.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: AGPv3, INSDC Assembly GCA_000005005.5, Apr 2013
Database version: 88.6
Base Pairs: 3,233,616,351
Golden Path Length: 2,067,622,303
Genebuild by: MaizeSequence
Genebuild method: Imported from MaizeSequence.org
Data source: MaizeSequence.org

Gene counts

Coding genes: 39,469
Non coding genes: 156
Small non coding genes: 156
Gene transcripts: 63,391

Other

FGENESH gene prediction: 59,309
Short Variants: 51,151,184
