393g A Novel Mixed-Integer Linear Optimization Framework for the Identification of Post-Translationally Modified Proteins Using ETD/ECD Tandem Mass Spectrometry

Peter A. DiMaggio Jr.1, Richard Baliban2, Benjamin A. Garcia3, and Christodoulos A. Floudas1. (1) Department of Chemical Engineering, Princeton University, Dept of Chemical Engineering; A215 Engineering Quadrangle, Princeton, NJ 08544, (2) Chemical Engineering, Princeton University, Dept of Chemical Engineering; A215 Engineering Quadrangle, Princeton, NJ 08544, (3) Department of Molecular Biology, Princeton University, Dept of Chemical Engineering; A215 Engineering Quadrangle, Princeton, NJ 08544

Accurate identification of post-translational modifications (PTMs) can give insight into the dynamic proteome and illuminate the role and function of specific proteins in vivo. Current approaches for the identification of protein modifications utilize a “bottom-up” approach, where the proteins are enzymatically digested into smaller peptides that are subsequently ionized and fragmented using Collision Activated Dissociation (CAD) to derive their sequence information [1]. The identification of all the modifications present in a protein sequence hinges on the successful identification of the PTMs of its corresponding peptides. This protocol can be limited by (a) insufficient elution and detection of all the peptides needed to cover the entire sequence of the protein and (b) false identifications at the peptide level. Additional complications arise when using CAD to study labile modifications such as phosphorylation, glycosylation, or sulfonation. In these instances, the preferred reaction is often the cleavage of the PTM as opposed to the backbone of the peptide, resulting in a high-intensity peak corresponding to the parent mass less the cleaved modification. The advent of Electron Capture Dissociation (ECD) [2-3] and Electron Transfer Dissociation (ETD) [4-7] has enabled researchers to address the aforementioned issues associated with CAD in a complementary fashion.
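As a rough illustration of the neutral-loss peak described above, the following Python sketch computes where such a peak would be expected for a phosphopeptide fragmented by CAD. The function name `neutral_loss_mz` and the example precursor values are hypothetical and chosen only for illustration; the loss masses are standard monoisotopic values, and the sketch assumes the loss is a neutral species so that the charge state is unchanged.

```python
# Monoisotopic masses (Da) of common neutral losses from phosphopeptides.
H3PO4 = 97.976896  # phosphoric acid, typical loss from phospho-Ser/Thr
HPO3 = 79.966331   # metaphosphoric acid


def neutral_loss_mz(precursor_mz, charge, loss_mass=H3PO4):
    """Return the m/z expected after a neutral loss from the precursor.

    The lost species carries no charge, so the charge state is unchanged
    and the m/z drops by loss_mass / charge.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return precursor_mz - loss_mass / charge


# Hypothetical doubly charged phosphopeptide precursor at m/z 800.0.
print(round(neutral_loss_mz(800.0, 2), 4))
```

A dominant peak at this position, with few backbone fragments, is the pattern that makes labile PTMs difficult to localize by CAD.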

ECD and ETD both involve the reaction of an electron with a highly protonated cation to form an odd-electron peptide. This process induces large amounts of backbone cleavage to yield c- and z-ions that are analogous to the b- and y-ions produced from CAD. Unlike CAD cleavage, ECD and ETD cleavage is only weakly affected by the composition and number of amino acids in the peptide and provides more fragmentation coverage than CAD alone. Both ECD and ETD also preserve labile modifications, so these PTMs remain intact on the c- and z-ions produced during cleavage. The aforementioned benefits make ECD and ETD well-suited for the "top-down" analysis of post-translationally modified proteins. A further benefit of ECD/ETD analysis is the quantitative information obtained from the spectral data. A full protein sequence is eluted into the mass spectrometer, so it is expected that the resulting ratios of peak intensities will correspond quite closely to the concentration ratios of the ions. If two proteins with slightly different modifications elute at the same time, measurement of the peak intensity ratios can give a good approximation of the ratio of the two proteins present in the sample.
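The quantitative idea above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: each co-eluting form is assumed to have its own diagnostic fragment ions, the two forms are assumed to ionize and fragment with similar efficiency, and the function name `relative_abundance` and the intensity values are hypothetical.

```python
def relative_abundance(intensities_a, intensities_b):
    """Estimate the fractions of two co-eluting protein forms.

    Each argument lists the intensities of fragment ions unique to one
    form. The summed intensities are normalized to fractions of the total.
    """
    total_a = sum(intensities_a)
    total_b = sum(intensities_b)
    total = total_a + total_b
    if total <= 0:
        raise ValueError("no signal in either set of diagnostic ions")
    return total_a / total, total_b / total


# Hypothetical intensities of diagnostic ions for forms A and B.
frac_a, frac_b = relative_abundance([300.0, 100.0], [100.0, 100.0])
print(round(frac_a, 3), round(frac_b, 3))
```

Summing over several diagnostic ions, rather than relying on a single peak, is one simple way to reduce the influence of noise in any one measurement.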

Both ECD and ETD were used to extensively study the modifications on histone H3 [8]. The variety of modifications present on the N-terminus of the protein represents a "histone code" that must be fully characterized to understand the cellular processes in which H3 is involved. To fully appreciate the depth of potential modification combinations for histone H3, hydrophilic interaction chromatography (HILIC) was used in combination with gas-phase isolation of species with different parent masses. HILIC separates the different H3 proteins mainly by number of acetyl groups and secondarily by degree of methylation. Such a separation allows for better validation of the spectral data by comparing the potential modifications with the elution time. Subsequent analysis has revealed potential relationships between K14, K18, and K23 acetylations as well as interplay between K4 methylation and the previously mentioned acetylations.

In this work, we will present de novo methods based on mixed-integer linear programming (MILP) for the identification of post-translationally modified peptides using ETD and ECD tandem mass spectra, with a particular emphasis on the study of histone H3. Though a large number of de novo and database methods have been developed to analyze spectra of unmodified peptides [9-11], ETD/ECD spectra can illuminate certain in vivo features, such as internally cleaved sequences and spliced sequences, that cannot be determined once the original sample is digested with trypsin. To our knowledge, the only currently available software for the analysis of ETD spectra is the Open Mass Spectrometry Search Algorithm [12], which is a database method. We will present a de novo MILP model developed for the analysis of histone H3 that utilizes the total number of possible modifications, given the modified parent mass, to derive the most probable set of PTMs. We will also present a generalized de novo method for the identification of modified and unmodified proteins using mixed-integer linear optimization and ETD/ECD tandem mass spectrometry.
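To make the role of the modified parent mass concrete, the sketch below enumerates integer counts of modifications whose combined mass matches an observed mass shift. It is not the MILP model of this work: it replaces the optimization solver with exhaustive enumeration over a small integer search space, and the function name `candidate_ptm_sets`, the count bounds, and the tolerance are assumptions made for illustration. The modification masses are standard monoisotopic values.

```python
from itertools import product

# Monoisotopic mass shifts (Da) of three common histone modifications.
PTM_MASS = {"acetyl": 42.010565, "methyl": 14.015650, "phospho": 79.966331}


def candidate_ptm_sets(mass_shift, max_count, tol=0.01):
    """List PTM count combinations that explain an observed mass shift.

    mass_shift: modified parent mass minus unmodified mass (Da).
    max_count:  upper bound on the count of each modification.
    tol:        allowed absolute mass error (Da).
    Returns combinations sorted by increasing mass error.
    """
    names = list(PTM_MASS)
    hits = []
    # Integer decision variables: one count per modification type.
    for counts in product(*(range(max_count[n] + 1) for n in names)):
        total = sum(c * PTM_MASS[n] for c, n in zip(counts, names))
        err = abs(total - mass_shift)
        if err <= tol:
            hits.append((err, dict(zip(names, counts))))
    hits.sort(key=lambda h: h[0])
    return [combo for _, combo in hits]


# Hypothetical shift: one acetylation plus two methylations.
shift = 42.010565 + 2 * 14.015650
bounds = {"acetyl": 4, "methyl": 9, "phospho": 2}
print(candidate_ptm_sets(shift, bounds))
```

With a looser tolerance (for example `tol=0.05`), five methylations also fall within the window, because three methyl groups differ from one acetyl group by only about 0.036 Da. Ambiguities of this kind are why the parent mass alone constrains the set of modifications but fragment-ion evidence is needed to select among the candidates.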

[1] Kinter M and Sherman NE. Protein Sequencing and Identification Using Tandem Mass Spectrometry. New York: Wiley, 2000.

[2] Zubarev RA, Kelleher NL, McLafferty FW. Electron capture dissociation of multiply charged protein cations: a nonergodic process. J. Am. Chem. Soc. 120:3265-3266, 1998.

[3] Bakhtiar R and Guan Z. Electron capture dissociation mass spectrometry in characterization of peptides and proteins. Biotechnol. Lett. 28:1047-1059, 2006.

[4] Syka JEP, Coon JJ, Schroeder MJ, Shabanowitz J, Hunt DF. Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry. Proc. Nat. Acad. Sci. 101(26):9528-9533, 2004.

[5] Mikesh LM, Ueberheide B, Chi A, Coon JJ, Syka JEP, Shabanowitz J, Hunt DF. The utility of ETD mass spectrometry in proteomic analysis. Biochimica et Biophysica Acta. 1764:1811-1822, 2006.

[6] Udeshi NH, Shabanowitz J, Hunt DF, Rose KL. Analysis of proteins and peptides on a chromatographic timescale by electron-transfer dissociation MS. FEBS Journal. 274:6269-6276, 2007.

[7] Molina H, Horn DM, Tang N, Mathivanan S, Pandey A. Global proteomic profiling of phosphopeptides using electron transfer dissociation tandem mass spectrometry. Proc. Nat. Acad. Sci. 104(7):2199-2204, 2007.

[8] Garcia BA, Pesavento JJ, Mizzen CA, Kelleher NL. Pervasive combinatorial modification of histone H3 in human cells. Nat. Meth. 4(6):487-489, 2007.

[9] DiMaggio PA and Floudas CA. A mixed-integer optimization framework for de novo peptide identification. AIChE Journal. 53(1): 160-173, 2007.

[10] DiMaggio PA and Floudas CA. De novo peptide identification via tandem mass spectrometry and integer linear optimization. Anal. Chem. 79: 1433-1446, 2007.

[11] DiMaggio PA, Floudas CA, Lu B, Yates JR. A hybrid methodology for peptide identification using integer linear optimization, local database search, and QTOF or OrbiTrap tandem mass spectrometry. J. Proteome Res. 7:1584-1593, 2008.

[12] Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, Maynard DM, Yang X, Shi W, Bryant SH. Open Mass Spectrometry Search Algorithm. J. Proteome Res. 3: 958-964, 2004.