Consortium for Maize Genomics
Danforth Goals
The role of the Donald Danforth Plant Science Center in the Consortium for Maize Genomics is to apply a bioinformatics
approach to analyze and assess two methods to isolate and sequence maize genes.
Danforth Center analysis goals include:
1.) Evaluation of gene-hit rate for the individual technologies, and for the resource as a whole.
2.) Estimation of gene coverage.
3.) Estimation of maize genome coverage.
4.) Comparison of GeneThresher® vs. high Cot.
5.) Correlation of gene islands to the maize genetic/physical map.
Results of these studies will be available on the DATA ANALYSIS page when completed.
The Danforth Center is establishing the bioinformatics resources required
to perform these analyses. These tasks include
downloading and storing searchable collections of genome sequence data from
public sources, licensing and installing software, installing and
establishing a relational database to store results and allow more complex
comparisons, and establishing a hardware infrastructure to underpin these
activities.
Gene Hit rate:
Gene hit rate will be assessed by identifying confident sequence overlap
between maize genomic sequence assemblies and collections of known or
proposed proteins/genes obtained from public sources. We have been
establishing local copies of sequence databanks, including portions of
GenBank, the TIGR Gene Indices, SWISS-PROT, Arabidopsis whole-genome
sequence gene predictions, and rice whole-genome sequence and gene
predictions. Homology searches for gene detection, Pfam searches for domains,
and an in-depth examination of the repeat content of the methyl-filtered
assemblies are in progress. We have established a collaboration with
Dr. Sue Wessler, who has an interest in examining the island sequence for
transposon and MITE representation.
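As an illustration of how a gene hit rate could be tallied once homology search results are in hand, the following is a minimal sketch, not the Center's actual pipeline. The `Hit` record, the function name, and the E-value and alignment-length cutoffs are all hypothetical; what counts as "confident" overlap would have to be set by the analysis itself.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One homology search hit (hypothetical record layout)."""
    assembly_id: str  # maize genomic assembly used as the query
    subject_id: str   # known or proposed protein/gene that was matched
    evalue: float     # E-value reported for the alignment
    align_len: int    # alignment length


def gene_hit_rate(assembly_ids: Iterable[str],
                  hits: Iterable[Hit],
                  max_evalue: float = 1e-10,   # assumed cutoff
                  min_align_len: int = 50) -> float:  # assumed cutoff
    """Fraction of assemblies with at least one confident hit."""
    total = set(assembly_ids)
    if not total:
        return 0.0
    # An assembly counts once, no matter how many confident hits it has.
    confident = {
        h.assembly_id
        for h in hits
        if h.evalue <= max_evalue and h.align_len >= min_align_len
    }
    return len(confident & total) / len(total)


# Toy data: only a1 passes both cutoffs.
example_hits = [
    Hit("a1", "protein_X", 1e-30, 120),
    Hit("a1", "protein_Y", 1e-15, 80),
    Hit("a2", "protein_Z", 1e-3, 200),   # E-value too weak
    Hit("a3", "protein_X", 1e-20, 30),   # alignment too short
]
example_rate = gene_hit_rate(["a1", "a2", "a3", "a4"], example_hits)
```

Running the same function separately on the assemblies from each technology, and then on the combined set, would give the per-technology and whole-resource rates named in the goals.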
Gene Coverage:
Gene coverage will be evaluated in three ways:
1.) Assembled maize genome island sequence will be aligned to rice gene models
and the degree of coverage assessed.
2.) Assembled maize genome island sequence will be aligned to predicted genic
regions of the maize genome sequence (BAC clone), which will become
available through the efforts of Messing, Wing and Soderlund.
3.) Assembled maize genome island sequence will be aligned to maize mRNAs or
potentially full-length maize cDNAs culled from public sources.
Alignment of maize gene assemblies to rice genes is part of the "assessment
of hit rate" process. Analysis of these alignments for coverage will be most
informative when the majority of the sequence data has been acquired and
assembled. An examination of the redundancy within the assembly component
sequences may provide an indicator for completeness of a given island, which
is required to robustly project the coverage potential of each method. We
have acquired several ab initio gene prediction software packages that will
be useful to help identify maize gene models within islands, and may be
necessary to obtain maize gene models from maize BAC sequence. We are
currently examining these methods to gain familiarity and determine their
limitations. A pipeline will be built to automate their use and collect
their outputs.
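To make the idea of "degree of coverage" concrete, here is a minimal sketch of one way to measure how much of a single gene model is covered by aligned island sequence. It assumes the alignments have already been reduced to half-open `(start, end)` intervals in the gene model's coordinates; the function names and that input format are illustrative assumptions, not part of any software named above.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the gene model


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into disjoint ones."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def coverage_fraction(gene_length: int, aligned: Iterable[Interval]) -> float:
    """Fraction of the gene model covered by at least one alignment."""
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    # Clip each alignment to the gene model and drop those fully outside it.
    clipped = [
        (max(0, s), min(gene_length, e))
        for s, e in aligned
        if e > 0 and s < gene_length
    ]
    covered = sum(e - s for s, e in merge_intervals(clipped))
    return covered / gene_length


# Toy example: a 1000 bp gene model with three aligned island segments.
example_cov = coverage_fraction(1000, [(0, 200), (150, 400), (900, 1100)])
```

Merging before summing keeps redundant, overlapping component sequences from being counted twice, so redundancy and coverage can be reported separately.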
Estimation of maize genome coverage:
We are looking at two methods to assess genome coverage. One method relies on
examining the extent to which maize genome island DNA overlaps gene
boundaries predicted within the maize BAC sequence. A second method relies
on determining a mathematical prediction for coverage. We have established
a collaboration with Dr. Mike Wendl at the Genome Sequencing Center at
Washington University. Dr. Wendl has examined the question of mathematically
predicting the clone coverage of large genomes and has developed equations
for this purpose. Dr. Wendl will aid in developing a mathematical model to
predict genome coverage by these methods.
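For orientation only, the sketch below shows the classic Lander-Waterman expectation for random shotgun sequencing, in which a target sampled to redundancy c = NL/G has an expected covered fraction of 1 - e^(-c). This is not Dr. Wendl's model, which is still to be developed; it assumes reads fall uniformly at random on the target, an assumption that gene-enrichment methods deliberately violate, and the function names and the notion of an effective target size are illustrative.

```python
import math


def expected_fraction_covered(n_reads: int, read_length: float,
                              target_size: float) -> float:
    """Lander-Waterman expected fraction of the target covered.

    Assumes uniformly random read placement (a simplifying assumption).
    """
    redundancy = n_reads * read_length / target_size  # c = N * L / G
    return 1.0 - math.exp(-redundancy)


def reads_for_fraction(fraction: float, read_length: float,
                       target_size: float) -> int:
    """Reads needed to reach a given expected covered fraction."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    redundancy = -math.log(1.0 - fraction)  # invert 1 - exp(-c)
    return math.ceil(redundancy * target_size / read_length)


# Toy numbers: 1000 reads of 500 bp on a 500 kb target give c = 1.
example_fraction = expected_fraction_covered(1000, 500, 500_000)
example_reads = reads_for_fraction(0.95, 500, 500_000)
```

Comparing observed overlap with the BAC-derived gene boundaries against a baseline of this kind is one way to see how far a filtered library departs from random sampling.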
Comparison of GeneThresher® vs. high Cot:
We will compare the GeneThresher® and high Cot methods for gene content,
representation of gene families and gene classes. We will assess the growth of
sequence assembly size for each of the two methods individually, as well as
examine the content of each composite assembly island for components from each
method. These analyses should aid in determining the correct mix of each
sequence type to maximize gene detection, coverage and island contiguity.
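A minimal sketch of the per-island composition tally described above follows. It assumes each component read can be labeled with its island and its source method; the `(island_id, method)` pair format and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def island_composition(
        components: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count component sequences per method within each island."""
    per_island: Dict[str, Counter] = {}
    for island_id, method in components:
        per_island.setdefault(island_id, Counter())[method] += 1
    return per_island


def summarize(per_island: Dict[str, Counter]) -> Counter:
    """Classify islands as built from one method only or from a mix."""
    summary: Counter = Counter()
    for counts in per_island.values():
        if len(counts) > 1:
            summary["mixed"] += 1
        else:
            summary[next(iter(counts)) + " only"] += 1
    return summary


# Toy data: three islands, one of them built from both methods.
example_components = [
    ("island1", "GeneThresher"),
    ("island1", "high Cot"),
    ("island2", "high Cot"),
    ("island3", "GeneThresher"),
    ("island3", "GeneThresher"),
]
example_per_island = island_composition(example_components)
example_summary = summarize(example_per_island)
```

The share of mixed islands gives a rough indication of how much the two methods complement rather than duplicate each other.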
Correlation of gene islands to the maize genetic/physical map:
Throughout this process we will strive to associate as many maize genome islands
with the genetic and physical maps as possible. We will work in close
collaboration with Dr. Cari Soderlund at the University of Arizona to anchor the
maize assemblies to the physical map through BAC-end sequence generated by the
Messing, Wing and Soderlund team. These data may be useful in closing gaps
and evaluating accuracy. We will work in collaboration with Cari Soderlund and
Mary Polacco at MaizeGDB to provide end-user access to this mapping information.