Machine learning methods for pattern-based disease marker discovery
Synopsis
We present a machine learning approach to discovering robust proteomic patterns for disease classification from high-performance mass spectrometry data. We have developed algorithms that combine pattern-based information (unidentified peptide peaks) with identity-based information (peptides sequenced by tandem mass spectrometry) to produce relative quantitation of peptides across multiple samples. These data are then analyzed with pattern recognition algorithms to identify distinctive patterns that discriminate among diseases of interest and between disease cases and controls.
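The synopsis does not spell out the quantitation or classification algorithms, so the sketch below is only an illustration under assumptions of our own. Each sample is taken to be a dict mapping a feature id (an unidentified peak or a sequenced peptide; ids such as `peak_a` and `pep_x` are hypothetical) to a raw intensity. Relative quantitation is modeled as the log2 ratio of each intensity to that feature's median across samples, and a simple nearest-centroid rule stands in for the pattern recognition step. All function names are hypothetical.

```python
import math
from statistics import median


def relative_quantitation(samples):
    """Hypothetical helper: log2 ratio of each feature's intensity to its
    median across samples. Only features observed in every sample are kept."""
    shared = set.intersection(*(set(s) for s in samples))
    medians = {f: median(s[f] for s in samples) for f in shared}
    return [{f: math.log2(s[f] / medians[f]) for f in shared} for s in samples]


def nearest_centroid_fit(profiles, labels):
    """Average the relative-quantitation profiles of each class."""
    centroids = {}
    for label in set(labels):
        members = [p for p, l in zip(profiles, labels) if l == label]
        centroids[label] = {
            f: sum(m[f] for m in members) / len(members) for f in members[0]
        }
    return centroids


def nearest_centroid_predict(centroids, profile):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((profile[f] - centroid[f]) ** 2 for f in centroid)

    return min(centroids, key=lambda label: squared_distance(centroids[label]))
```

A nearest-centroid rule is chosen here only because it is short and transparent; any classifier that accepts per-sample feature vectors could take its place.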
We have also developed rigorous methods to identify potential systematic biases and to statistically validate these disease-associated patterns by randomized testing.
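The synopsis does not say which systematic biases are examined. One common check, shown here only as an assumption-laden sketch, is whether disease status is confounded with a processing variable such as analysis batch: if a batch contains samples from only one class, a batch effect could masquerade as a disease pattern. The batch labels and function names below are hypothetical.

```python
from collections import Counter, defaultdict


def class_balance_by_batch(labels, batches):
    """Count samples of each class within each (hypothetical) processing batch."""
    table = defaultdict(Counter)
    for label, batch in zip(labels, batches):
        table[batch][label] += 1
    return dict(table)


def confounded_batches(labels, batches):
    """Return batches that do not contain every class. Patterns that track
    such a batch may reflect processing artifacts rather than disease."""
    all_classes = set(labels)
    table = class_balance_by_batch(labels, batches)
    return sorted(b for b, counts in table.items() if set(counts) != all_classes)
```

For example, with labels `["TB", "TB", "ctrl", "ctrl", "TB", "ctrl"]` and batches `[1, 1, 1, 2, 3, 3]`, batch 2 holds only controls and would be flagged for closer inspection.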
As a case study, we apply this methodology to rigorously characterized tuberculosis patient plasma samples. The analysis yields a rich set of patterns and accurate classification models that distinguish disease cases from controls. An extensive set of random permutation tests is used to assess statistical significance. The methodology is also being applied to the analysis of multiple diseases from multiple sites.
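A random permutation test of the kind described above can be sketched as follows. The details are assumptions for illustration: the statistic is the difference in mean intensity of a single feature between cases and controls, the p-value is one-sided with an add-one correction, and the function names are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(statistic, values, labels, n_permutations=1000, seed=0):
    """One-sided permutation p-value for statistic(values, labels).
    Shuffling the labels breaks any real association with the data; the
    add-one correction counts the observed labeling as one permutation."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(values, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)


def mean_difference(values, labels, case="case"):
    """Mean of the case group minus mean of all other samples."""
    cases = [v for v, l in zip(values, labels) if l == case]
    controls = [v for v, l in zip(values, labels) if l != case]
    return mean(cases) - mean(controls)
```

For a classification model, the same loop would instead recompute a performance measure such as cross-validated accuracy, rerunning the entire model-building procedure on each permuted labeling so that feature selection cannot leak the true labels.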
Bio
D. R. Mani is a Senior Computational Biologist in the Proteomics Group at the Broad Institute of MIT and Harvard. He holds a PhD in Computer Science from the University of Pennsylvania and has expertise in computational pattern recognition, machine learning, signal processing, statistical data analysis, and parallel computing.
For over a decade, he has applied these methods to data from domains ranging from telecommunications and customer relationship management to omics-scale data generated by a wide range of bioassays, including mass spectrometry-based proteomics and gene expression profiling.
He has designed massively parallel machine learning algorithms, implemented systems for mining customer data, created a platform for large-scale, pattern-based proteomic biomarker discovery from mass spectrometric data, developed algorithms for evaluation of data quality in targeted proteomic assays, and played a leading role in the analysis and visualization of quantitative proteomics data.
In his current position at the Broad Institute, Dr. Mani continues to apply bioinformatics and computational methods to proteomics and other data to address key problems spanning many aspects of proteomic biomarker discovery and verification.