66d Coregulation Analysis of Highly Coexpressed and Non-Coexpressed Genes with a Novel Promoter Similarity Index

Nguyen Tung, BioMAPS Institute for Quantitative Biology, Rutgers University, Piscataway, NJ 08540, Richard Nowakowski, Department of Neuroscience and Cell Biology, University of Medicine and Dentistry, Piscataway, NJ 08540, and Ioannis P. Androulakis, Biomedical and Chemical & Biochemical Engineering, Rutgers - The State University of New Jersey, 599 Taylor Road, Piscataway, NJ 08854.

Understanding gene regulation is a critical step towards deciphering the complexities of transcription [1, 2]. Although microarray technology can reveal patterns of gene expression [3, 4], the regulation mechanism driving that expression remains unknown in most cases [5]. A reasonable hypothesis is that genes with similar promoters can be bound by similar transcription factors, which may then act as functional determinants of their regulation [2, 6-8]. Two critical questions therefore emerge: (i) what constitutes a set of coexpressed versus non-coexpressed genes, and (ii) how should we handle the fact that a gene may have multiple promoters [7, 9] corresponding to different transcription start sites?

The first question is important because the implicit assumption of regulation analysis is that genes sharing the same regulation mechanism ought to respond similarly to similar stimuli. The second question is equally critical. Because genes can be coexpressed under certain conditions and not under others, alternative regulation mechanisms must be available to each gene; otherwise the problem would be trivial. We therefore computationally examine possible combinations of alternative promoters in order to clarify the mechanisms that characterize regulation.

In this presentation, we first discuss a novel methodology based on a multi-clustering approach [10-13] which, instead of simply clustering the entire data set, identifies within the data a subset of highly coexpressed and non-coexpressed genes, i.e., those genes that are either coexpressed or non-coexpressed at a confidence level d. We then consider whether combinations of the alternative promoter sequences of each gene [14] in this subset can disclose alternative regulation mechanisms.

Under the assumption that the more similar two promoter sequences are, the more probable it is that the corresponding genes are coregulated [8, 15-17], we propose a novel index of promoter similarity that accounts for all classes of regulatory elements (e.g., TFs and miRNAs [1, 18]) to computationally represent the coregulation level between any two genes. Instead of first scanning the promoter regions of coexpressed genes for common patterns, such as conserved motifs [19, 20] or TFBSs [21, 22], and then calculating the coregulation level between two genes [15-17], we estimate the measure directly from the sequences [6, 8, 23, 24], regardless of whether the genes are coexpressed. Using this measure, the pool of promoter sequences is hierarchically clustered [24, 25] with a similarity threshold calculated from a background set of sequences, and the resulting clusters are mapped onto groups of coregulated genes. The clustering results are then interpreted in the context of expression under different experimental conditions to identify combinations of alternative promoters activated under a specific condition. The representative promoters of each cluster of coregulated genes can then be considered the specific promoters through which those genes take part in a given regulation mechanism.
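The abstract does not give formulas for the confidence level d or for the promoter similarity index, so the following is only a minimal Python sketch of the pipeline's overall shape, under loudly stated assumptions: (i) the coexpression confidence of a gene pair is taken to be the fraction of clustering runs that place both genes in the same cluster, a common consensus-clustering statistic; (ii) cosine similarity of k-mer counts is a hypothetical stand-in for the authors' promoter similarity index; (iii) the background threshold is an empirical upper quantile of similarities among background sequences; (iv) the hierarchical clustering is single linkage cut at that threshold; and (v) promoter IDs of the form `<gene>_<promoter>` are hypothetical.

```python
import itertools
import math
from collections import Counter


def coexpression_confidence(partitions):
    """For every gene pair, the fraction of clustering runs that put both
    genes in the same cluster. `partitions` is a list of dicts mapping
    gene -> cluster label, one dict per run (ASSUMPTION: runs may differ in
    algorithm, number of clusters, or resampled data)."""
    genes = sorted(partitions[0])
    return {
        (a, b): sum(p[a] == p[b] for p in partitions) / len(partitions)
        for a, b in itertools.combinations(genes, 2)
    }


def classify_pairs(conf, d):
    """Pairs co-clustered in at least a fraction d of runs are called
    coexpressed; pairs co-clustered in at most 1 - d are called
    non-coexpressed; the rest are left out. Assumes 0.5 < d <= 1."""
    coexpressed = {pair for pair, c in conf.items() if c >= d}
    non_coexpressed = {pair for pair, c in conf.items() if c <= 1 - d}
    return coexpressed, non_coexpressed


def kmer_counts(seq, k):
    """Overlapping k-mer counts of a promoter sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def promoter_similarity(s1, s2, k=4):
    """Cosine similarity of k-mer count vectors, in [0, 1].
    HYPOTHETICAL stand-in for the proposed promoter similarity index."""
    c1, c2 = kmer_counts(s1, k), kmer_counts(s2, k)
    dot = sum(n * c2[w] for w, n in c1.items())  # Counter returns 0 if absent
    norm = math.sqrt(sum(n * n for n in c1.values()) * sum(n * n for n in c2.values()))
    return dot / norm if norm else 0.0


def background_threshold(background, k=4, quantile=0.95):
    """Empirical upper quantile of pairwise similarities among background
    (presumed non-coregulated) promoter sequences."""
    sims = sorted(promoter_similarity(a, b, k)
                  for a, b in itertools.combinations(background, 2))
    return sims[min(len(sims) - 1, int(quantile * len(sims)))]


def cluster_promoters(promoters, threshold, k=4):
    """Single-linkage hierarchical clustering cut at `threshold`: promoters
    linked by any chain of pairs with similarity >= threshold share a
    cluster (connected components, computed with union-find)."""
    parent = {pid: pid for pid in promoters}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in itertools.combinations(promoters, 2):
        if promoter_similarity(promoters[a], promoters[b], k) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for pid in promoters:
        groups.setdefault(find(pid), []).append(pid)
    return sorted(sorted(g) for g in groups.values())


def clusters_to_gene_groups(clusters):
    """Map promoter clusters onto gene groups, assuming HYPOTHETICAL
    promoter IDs of the form '<gene>_<promoter>' (e.g. 'g1_P2')."""
    return [sorted({pid.rsplit("_", 1)[0] for pid in c}) for c in clusters]


# Usage: three toy clustering runs, then two alternative promoters of g1.
runs = [{"g1": 0, "g2": 0, "g3": 1},
        {"g1": 1, "g2": 1, "g3": 0},
        {"g1": 0, "g2": 0, "g3": 0}]
coexp, noncoexp = classify_pairs(coexpression_confidence(runs), d=0.6)
print(coexp, noncoexp)

promoters = {"g1_P1": "ACGTACGTACGT", "g1_P2": "TTTTGGGGCCCC",
             "g2_P1": "ACGTACGTACGA"}
clusters = cluster_promoters(promoters, threshold=0.5)
print(clusters_to_gene_groups(clusters))  # [['g1', 'g2'], ['g1']]
```

In the toy example, promoter P1 of g1 groups with g2 while P2 of g1 stands alone, which is the kind of alternative-promoter split the method looks for. Single linkage is used here only because cutting it at a fixed threshold reduces to finding connected components; average or complete linkage would give tighter clusters at the same threshold.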

In summary, our work aims to determine whether alternative promoters, and thus alternative regulation mechanisms, are activated under different conditions, and whether combining coexpression and coregulation analyses can reveal condition-specific regulation mechanisms.

References

1. Chen K, Rajewsky N: The evolution of gene regulation by transcription factors and microRNAs. Nat Rev Genet 2007, 8(2):93-103.

2. Nguyen DH, D'Haeseleer P: Deciphering principles of transcription regulation in eukaryotic genomes. Mol Syst Biol 2006, 2:2006.0012.

3. Belacel N, Wang Q, Cuperlovic-Culf M: Clustering methods for microarray gene expression data. OMICS 2006, 10(4):507-531.

4. Jiang D, Tang C, Zhang A: Cluster analysis for gene expression data: a survey. IEEE Trans Knowl Data Eng 2004, 16(11):1370-1386.

5. Michalak P: Coexpression, coregulation, and cofunctionality of neighboring genes in eukaryotic genomes. Genomics 2008, 91(3):243-248.

6. Allocco DJ, Kohane IS, Butte AJ: Quantifying the relationship between co-expression, co-regulation and gene function. BMC Bioinformatics 2004, 5:18.

7. ENCODE Project Consortium: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007, 447(7146):799-816.

8. Park PJ, Butte AJ, Kohane IS: Comparing expression profiles of genes with similar promoter regions. Bioinformatics 2002, 18(12):1576-1584.

9. Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, Ponjavic J, Semple CA, Taylor MS, et al: Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet 2006, 38(6):626-635.

10. Grotkjaer T, Winther O, Regenberg B, Nielsen J, Hansen LK: Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm. Bioinformatics 2006, 22(1):58-67.

11. Laderas T, McWeeney S: Consensus framework for exploring microarray data using multiple clustering methods. Omics 2007, 11(1):116-128.

12. Monti S, Tamayo P, Mesirov J, Golub T: Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 2003, 52:91-118.

13. Swift S, Tucker A, Vinciotti V, Martin N, Orengo C, Liu X, Kellam P: Consensus clustering and functional interpretation of gene-expression data. Genome Biol 2004, 5(11):R94.

14. Genomatix DB: http://www.genomatix.de/.

15. van Helden J: Metrics for comparing regulatory sequences on the basis of pattern counts. Bioinformatics 2004, 20(3):399-406.

16. van Helden J, Andre B, Collado-Vides J: Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol 1998, 281(5):827-842.

17. van Helden J, Rios AF, Collado-Vides J: Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res 2000, 28(8):1808-1818.

18. Hobert O: Gene regulation by transcription factors and microRNAs. Science 2008, 319(5871):1785-1786.

19. Chen X, Guo L, Fan Z, Jiang T: W-AlignACE: An improved Gibbs sampling algorithm based on more accurate position weight matrices learned from sequence and gene expression/ChIP-chip data. Bioinformatics 2008.

20. Elkon R, Linhart C, Sharan R, Shamir R, Shiloh Y: Genome-wide in silico identification of transcriptional regulators controlling the cell cycle in human cells. Genome Res 2003, 13(5):773-780.

21. Cartharius K, Frech K, Grote K, Klocke B, Haltmeier M, Klingenhoff A, Frisch M, Bayerlein M, Werner T: MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics 2005, 21(13):2933-2942.

22. Chekmenev DS, Haid C, Kel AE: P-Match: transcription factor binding site search by combining patterns and weight matrices. Nucleic Acids Res 2005, 33:W432-W437.

23. Blanco E, Messeguer X, Smith TF, Guigo R: Transcription factor map alignment of promoter regions. PLoS Comput Biol 2006, 2(5):e49.

24. Veerla S, Hoglund M: Analysis of promoter regions of co-expressed genes identified by microarray analysis. BMC Bioinformatics 2006, 7:384.

25. Kankainen M, Loytynoja A: MATLIGN: a motif clustering, comparison and matching tool. BMC Bioinformatics 2007, 8:189.