66c Distilling the Complexity of Metabolic Chemistry

Christopher Henry, Mathematics and Computer Science, Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL 60439, Linda J. Broadbelt, Northwestern University, Department of Chemical Engineering, 2145 Sheridan Road E136, Evanston, IL 60208-3120, and Vassily Hatzimanikatis, Laboratory of Computational Systems Biotechnology, EPFL, CH-1015, Lausanne, Switzerland.

Recent advances in DNA sequencing technologies have reduced the time required to sequence an entire prokaryotic genome to a few days. The Rapid Annotation using Subsystems Technology (RAST) server allows newly sequenced prokaryotic genomes to be annotated in less than 24 hours [1]. However, converting a newly annotated genome into a metabolic model that can be used to accurately predict cellular phenotypes, gene essentiality, and metabolic engineering strategies can still take months at a minimum [2]. While methods exist for automatically mapping biochemical reactions to the functional roles assigned to genes during the annotation process [3], these methods depend upon the EC classification system and on databases of biochemical reactions such as KEGG [4]. The metabolic models produced using these methods are plagued by errors arising from the misclassification of reactions within the EC system and from incorrect reaction stoichiometry stored in the reaction databases. The manual effort needed to identify and correct such errors accounts for a significant portion of the time required to produce a functioning genome-scale metabolic model.
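To make the kind of stoichiometry error described above concrete, the following is a minimal sketch of an elemental balance check on a stored reaction. It is an illustration only and not part of the method described in this abstract; the function names `parse_formula` and `element_imbalance` are hypothetical, and the parser assumes simple neutral molecular formulas without brackets or charges.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def element_imbalance(reactants, products):
    """Return {element: product atoms minus reactant atoms} for unbalanced elements.

    reactants and products are lists of (coefficient, formula) pairs.
    An empty result means the reaction is elementally balanced.
    """
    net = Counter()
    for coeff, formula in products:
        for element, n in parse_formula(formula).items():
            net[element] += coeff * n
    for coeff, formula in reactants:
        for element, n in parse_formula(formula).items():
            net[element] -= coeff * n
    return {element: n for element, n in net.items() if n != 0}


# ATP hydrolysis written without water on the reactant side (illustrative entry):
# ATP -> ADP + phosphate, using neutral formulas.
flawed = element_imbalance(
    reactants=[(1, "C10H16N5O13P3")],
    products=[(1, "C10H15N5O10P2"), (1, "H3O4P")],
)
print(flawed)  # two H and one O are unaccounted for, i.e. a missing H2O
```

A balance check of this kind only flags that something is missing; deciding which cofactor to add or remove still requires knowledge of the underlying chemistry.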

We have developed a new algorithm that uses the Biological Network Integrated Computational Explorer (BNICE) framework [5] to: (i) systematically assign EC classes to biochemical reactions, (ii) correct errors in reaction cofactor stoichiometry, and (iii) generate mappings from the reactant atoms to the product atoms of biochemical reactions. The algorithm relies on a set of 86 reaction rules developed through manual curation of the EC classification system. These reaction rules are applied iteratively to the substrates of each biochemical reaction in an attempt to reproduce the reaction. For each reaction that is reproduced, errors in the reaction cofactor stoichiometry are identified and corrected, a mapping is generated between the atoms in the reactants and products, and an EC number is assigned based on the EC number associated with the reaction rules that reproduced the reaction. This methodology has been successfully applied to reproduce the majority of the eligible biochemical reactions contained in the KEGG database and in genome-scale metabolic models of E. coli, S. cerevisiae, and B. subtilis. The results of this work will help to improve the speed and efficiency with which new metabolic models may be assembled. The atom mappings generated by the method will be invaluable for interpreting the results of 13C tracer experiments used to measure the rates of intracellular reactions. Finally, this work identifies the areas of metabolism not covered by our current set of 86 reaction rules, which facilitates the continuing effort to produce new reaction rules for the BNICE framework.
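The rule-matching idea can be illustrated with a toy sketch. This is not the BNICE implementation: it assumes, purely for illustration, that a compound is a multiset of functional-group labels and that a rule removes some groups, adds others, and carries an expected cofactor stoichiometry. The names `Rule`, `apply_rule`, `reproduce`, and `cofactor_corrections`, the two example rules, and the group labels are all hypothetical. The sketch applies each rule once rather than iteratively and does not attempt atom mapping.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Rule:
    ec_class: str      # three-level EC class the toy rule stands for
    removes: Counter   # functional groups consumed on the main substrate
    adds: Counter      # functional groups created on the main product
    cofactors: dict    # signed coefficients: negative consumed, positive produced


def apply_rule(rule, groups):
    """Return the product's group counts, or None if the rule does not apply."""
    if any(groups[g] < n for g, n in rule.removes.items()):
        return None
    result = Counter(groups)
    result.subtract(rule.removes)
    result.update(rule.adds)
    return +result  # unary plus drops groups whose count fell to zero


def reproduce(reaction, rules):
    """Return (EC class, expected cofactors) for the first rule that
    turns the stored substrate into the stored product, else None."""
    substrate = +Counter(reaction["substrate"])
    product = +Counter(reaction["product"])
    for rule in rules:
        if apply_rule(rule, substrate) == product:
            return rule.ec_class, dict(rule.cofactors)
    return None


def cofactor_corrections(stored, expected):
    """Map each mismatched cofactor to (stored coefficient, expected coefficient)."""
    names = sorted(set(stored) | set(expected))
    return {n: (stored.get(n, 0), expected.get(n, 0))
            for n in names if stored.get(n, 0) != expected.get(n, 0)}


# Two illustrative rules (assumptions, not taken from the 86-rule set).
RULES = [
    Rule("2.7.1", Counter(alcohol=1), Counter(phosphate_ester=1),
         {"ATP": -1, "ADP": 1}),
    Rule("1.1.1", Counter(alcohol=1), Counter(carbonyl=1),
         {"NAD+": -1, "NADH": 1, "H+": 1}),
]

# Ethanol -> acetaldehyde, stored without the proton on the product side.
stored_reaction = {
    "substrate": {"methyl": 1, "alcohol": 1},
    "product": {"methyl": 1, "carbonyl": 1},
    "cofactors": {"NAD+": -1, "NADH": 1},
}

match = reproduce(stored_reaction, RULES)
if match is not None:
    ec_class, expected = match
    print(ec_class, cofactor_corrections(stored_reaction["cofactors"], expected))
```

In this toy setting, a reaction that no rule reproduces simply returns None, which mirrors how unreproduced reactions point to areas of metabolism that the current rule set does not cover.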

1. Aziz, R.K., et al., The RAST Server: rapid annotations using subsystems technology. BMC Genomics, 2008. 9: p. 75.

2. Price, N.D., J.L. Reed, and B.O. Palsson, Genome-scale models of microbial cells: evaluating the consequences of constraints. Nat Rev Microbiol, 2004. 2(11): p. 886-897.

3. DeJongh, M., et al., Toward the automated generation of genome-scale metabolic networks in the SEED. BMC Bioinformatics, 2007. 8: p. 139.

4. Kanehisa, M. and S. Goto, KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research, 2000. 28(1): p. 27-30.

5. Hatzimanikatis, V., et al., Exploring the diversity of complex metabolic networks. Bioinformatics, 2004. 21(8): p. 1603-1609.