Significant Pattern Mining
Data Mining, the search for new knowledge in the form of statistical dependencies and patterns in large data sets, is omnipresent in modern society, in science and technology as much as in industry and finance. One of its most important branches is Pattern Mining, that is, finding groups of co-occurring elements in a collection of sets. For instance, keywords that co-occur in many documents may form a pattern, as may groups of atoms that recur in molecules with a particular biological function. Data Mining research has produced a large body of literature on how to discover such patterns efficiently, even in very large datasets.
An open question, however, is how to decide whether a given pattern is not only frequent but also statistically significantly enriched in a particular dataset or class of objects. This question is essential to all application domains of pattern mining, in particular the life sciences, where patterns must be selected for further experimental investigation and validation. One of our research goals is to answer this open problem of significant pattern mining.
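To make the notion of "statistically significantly enriched" concrete, the following minimal sketch tests a single pattern with a one-sided Fisher's exact test and then applies a Bonferroni correction for the number of patterns tested. It is an illustration under stated assumptions, not the method of any particular publication listed on this page: the function names, the toy counts, and the choice of Bonferroni as the correction are all hypothetical choices made for this example.

```python
from math import comb


def fisher_enrichment_pvalue(a, x, n1, n):
    """One-sided Fisher's exact test for enrichment of a pattern in a class.

    a  -- objects of the class of interest that contain the pattern
    x  -- all objects that contain the pattern
    n1 -- all objects of the class of interest
    n  -- all objects
    Returns the probability of seeing at least `a` such objects by chance
    when the margins x, n1 and n are held fixed (hypergeometric tail).
    """
    upper = min(x, n1)
    tail = sum(comb(n1, k) * comb(n - n1, x - k) for k in range(a, upper + 1))
    return tail / comb(n, x)


def is_significant(p_value, n_tests, alpha=0.05):
    """Bonferroni correction: compare against alpha divided by the number of tests."""
    return p_value <= alpha / n_tests


# Toy example (made-up counts): 20 objects, 10 per class; the pattern occurs
# in 8 objects, 7 of which belong to the class of interest.
p = fisher_enrichment_pvalue(a=7, x=8, n1=10, n=20)
print(p, is_significant(p, n_tests=3), is_significant(p, n_tests=10))
```

The same raw p-value passes the corrected threshold when 3 patterns are tested but not when 10 are tested. Since the number of candidate patterns typically grows exponentially with the number of elements, a naive correction of this kind quickly loses statistical power.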
Thomas Gumbsch, Christian Bock, Michael Moor, Bastian Rieck, and Karsten Borgwardt
Enhancing statistical power in temporal biomarker discovery through representative shapelet mining
Anja C. Gumpinger, Bastian Rieck, Dominik G. Grimm, International Headache Genetics Consortium, and Karsten M. Borgwardt
Network-guided search for genetic heterogeneity between gene pairs (Bioinformatics 2020)
Felipe Llinares-Lopez, Laetitia Papaxanthos, Damian Roqueiro, Dean Bodenham and Karsten Borgwardt
CASMAP: Detection of statistically significant combinations of SNPs in association mapping (Bioinformatics 2018)
Christian Bock, Thomas Gumbsch, Michael Moor, Bastian Rieck, Damian Roqueiro and Karsten Borgwardt
Association Mapping in Biomedical Time Series via Statistically Significant Shapelet Mining (ISMB and Bioinformatics 2018)
Felipe Llinares-Lopez, Laetitia Papaxanthos, Dean Bodenham, Damian Roqueiro, COPDGene Investigators and Karsten Borgwardt
Genome-wide genetic heterogeneity discovery with categorical covariates (Bioinformatics 2017)
Laetitia Papaxanthos, Felipe Llinares-López, Dean Bodenham and Karsten Borgwardt
Finding significant combinations of features in the presence of categorical covariates (NIPS 2016)
Felipe Llinares-López, Mahito Sugiyama, Laetitia Papaxanthos and Karsten Borgwardt
Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing (SIGKDD 2015)
Mahito Sugiyama, Felipe Llinares López, Niklas Kasenburg and Karsten Borgwardt
Significant Subgraph Mining with Multiple Testing Correction (SIAM Data Mining 2015)
Felipe Llinares-López, Dominik G. Grimm, Dean A. Bodenham, Udo Gieraths, Mahito Sugiyama, Beth Rowan and Karsten Borgwardt
Genome-wide detection of intervals of genetic heterogeneity associated with complex traits (Bioinformatics/ISMB 2015)
Software
We have developed several algorithms and software packages for significant pattern mining:
CASMAP
While the majority of prior work in association mapping searches for univariate or additive associations between genotype and phenotype, combinatorial association mapping instead aims to discover statistically significant higher-order interactions of genetic markers. In recent years, our group has been at the forefront of the development of new techniques for combinatorial association mapping, which span multiple publications.
In order to make our work in this domain more accessible to practitioners, we have developed CASMAP, a new software package for combinatorial association mapping in genome-wide association studies. Available both in Python and R, CASMAP allows users to easily carry out region-based association studies and to search for higher-order epistatic interactions of binary markers while correcting for the effect of categorical covariates.
The algorithm is available in our GitHub repository here.
FastCMH
The Fast Cochran-Mantel-Haenszel (FastCMH) algorithm discovers genomic regions of contiguous SNPs that are associated with a phenotype of interest under a model of genetic heterogeneity. It can search any contiguous set of SNPs in the genome while still properly correcting for multiple testing and accounting for confounding factors.
The algorithm is available in our GitHub repository here. It is also included in the CASMAP software package.
FACS
The Fast Automatic Conditional Search (FACS) algorithm is a significant discriminative itemset mining method which conditions on categorical covariates and only scales as O(k log k), where k is the number of states of the categorical covariate. Based on the Cochran-Mantel-Haenszel Test, FACS demonstrates superior speed and statistical power on simulated and real-world datasets compared to the state of the art, opening the door to numerous applications in biomedicine.
The algorithm is available in our GitHub repository here. It is also included in the CASMAP software package.
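As a rough illustration of the test that FACS builds on, the sketch below computes the textbook Cochran-Mantel-Haenszel statistic (without continuity correction) for one pattern across the states of a categorical covariate. It covers only the test itself; the search and pruning strategy that gives FACS its O(k log k) scaling is not reproduced here. The function name, the table encoding, and the toy counts are assumptions made for this example.

```python
from math import erfc, sqrt


def cmh_test(tables):
    """Cochran-Mantel-Haenszel test over a list of 2x2 tables, one per covariate state.

    Each table is given as (a, x, n1, n):
      a  -- class-1 objects containing the pattern
      x  -- all objects containing the pattern
      n1 -- all class-1 objects
      n  -- all objects in this covariate state
    Returns (statistic, p_value); under the null hypothesis the statistic is
    approximately chi-squared distributed with one degree of freedom.
    """
    observed = expected = variance = 0.0
    for a, x, n1, n in tables:
        if n < 2:
            continue  # a stratum with fewer than two objects carries no information
        observed += a
        expected += x * n1 / n
        variance += n1 * (n - n1) * x * (n - x) / (n * n * (n - 1))
    if variance == 0.0:
        return 0.0, 1.0
    statistic = (observed - expected) ** 2 / variance
    # Survival function of chi-squared with 1 degree of freedom.
    return statistic, erfc(sqrt(statistic / 2.0))


# Toy data (made up): the covariate drives both the pattern and the class label.
strata = [(13, 16, 16, 20), (1, 4, 4, 20)]
pooled = [(14, 20, 20, 40)]  # the same objects with the covariate ignored

stat_pooled, p_pooled = cmh_test(pooled)
stat_cmh, p_cmh = cmh_test(strata)
print(p_pooled, p_cmh)
```

In this toy example the pooled table suggests an association, while conditioning on the covariate removes it. This is the kind of confounding that conditioning on categorical covariates is meant to guard against.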
Westfall-Young Light
Westfall-Young Light is a significant pattern mining algorithm that uses permutation testing to account for the presence of redundant patterns, leading to an increase in statistical power. It uses a novel approach to apply permutation testing in pattern mining, resulting in an algorithm that is drastically faster than prior work and also requires considerably less memory to run.
The algorithm is available in our GitHub repository here.
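The idea behind permutation-based correction can be sketched with a naive, brute-force version of the Westfall-Young procedure: class labels are shuffled repeatedly, the smallest p-value over all patterns is recorded for each shuffle, and each observed p-value is adjusted by how often that minimum is at least as small. This is not the Westfall-Young Light algorithm, which avoids exactly this kind of exhaustive recomputation; the function names, the use of Fisher's exact test, and the toy data are assumptions made for this example.

```python
import random
from math import comb


def fisher_pvalue(a, x, n1, n):
    """One-sided Fisher's exact p-value for a 2x2 table given by its margins."""
    upper = min(x, n1)
    tail = sum(comb(n1, k) * comb(n - n1, x - k) for k in range(a, upper + 1))
    return tail / comb(n, x)


def pattern_pvalues(patterns, labels):
    """p-value of every pattern; a pattern is a 0/1 occurrence list over the objects."""
    n, n1 = len(labels), sum(labels)
    pvalues = []
    for occurrence in patterns:
        x = sum(occurrence)
        a = sum(o & y for o, y in zip(occurrence, labels))
        pvalues.append(fisher_pvalue(a, x, n1, n))
    return pvalues


def westfall_young_adjust(patterns, labels, n_perm=1000, seed=0):
    """Single-step min-p adjusted p-values estimated from label permutations."""
    rng = random.Random(seed)
    observed = pattern_pvalues(patterns, labels)
    shuffled = list(labels)
    min_pvalues = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        min_pvalues.append(min(pattern_pvalues(patterns, shuffled)))
    # The +1 terms count the observed labelling as one of the permutations.
    return [(1 + sum(m <= p for m in min_pvalues)) / (1 + n_perm) for p in observed]


# Toy data (made up): 10 objects per class, one enriched pattern, an exact
# duplicate of it, and one pattern unrelated to the class label.
labels = [1] * 10 + [0] * 10
strong = [1] * 9 + [0] + [1] + [0] * 9
unrelated = ([1] * 5 + [0] * 5) * 2
adjusted = westfall_young_adjust([strong, list(strong), unrelated], labels)
print(adjusted)
```

Because the duplicated pattern always yields the same p-value as the original, it does not change the distribution of the minimum p-value. Redundant patterns therefore do not make the correction stricter, unlike a Bonferroni correction, which would count them as separate tests.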
Significant Subgraph Mining with Multiple Testing Correction
The algorithm is available in our GitHub repository here.
FAIS
The Fast Automatic Interval Search (FAIS) algorithm discovers contiguous sets of SNPs in a genome that are associated with a phenotype of interest under a model of genetic heterogeneity. It can search any contiguous set of SNPs in the genome and still properly correct for multiple testing, while retaining statistical power.
The algorithm is available in our GitHub repository here. It is also included in the CASMAP software package.
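The search problem can be illustrated with a brute-force sketch. It assumes that, under genetic heterogeneity, an interval is summarised by a binary meta-marker that is 1 whenever a sample carries the minor allele at any SNP in the interval. Every interval is then tested with Fisher's exact test and a Bonferroni threshold over all intervals. FAIS itself is designed to avoid this exhaustive enumeration and the power loss of such a naive correction; the encoding, the function names, and the toy genotypes below are assumptions made for this example.

```python
from math import comb


def fisher_pvalue(a, x, n1, n):
    """One-sided Fisher's exact p-value for a 2x2 table given by its margins."""
    upper = min(x, n1)
    tail = sum(comb(n1, k) * comb(n - n1, x - k) for k in range(a, upper + 1))
    return tail / comb(n, x)


def significant_intervals(genotypes, labels, alpha=0.05):
    """Test every contiguous SNP interval; return (start, end, p) for the hits.

    genotypes -- one list per sample with a 0/1 minor-allele indicator per SNP
    labels    -- 0/1 phenotype per sample
    """
    n, n1 = len(labels), sum(labels)
    n_snps = len(genotypes[0])
    n_intervals = n_snps * (n_snps + 1) // 2
    threshold = alpha / n_intervals  # Bonferroni over all intervals
    hits = []
    for start in range(n_snps):
        meta = [0] * n  # running OR of the SNPs in [start, end]
        for end in range(start, n_snps):
            meta = [m | sample[end] for m, sample in zip(meta, genotypes)]
            x = sum(meta)
            a = sum(m & y for m, y in zip(meta, labels))
            p = fisher_pvalue(a, x, n1, n)
            if p <= threshold:
                hits.append((start, end, p))
    return hits


# Toy data (made up): 10 cases followed by 10 controls and 4 SNPs. SNP 1 is
# carried by half of the cases and SNP 2 by the other half; SNPs 0 and 3 are noise.
labels = [1] * 10 + [0] * 10
carriers = [
    {0, 1, 10, 11, 12, 13, 14, 15},
    {0, 1, 2, 3, 4},
    {5, 6, 7, 8, 9},
    {8, 9, 14, 15, 16, 17, 18, 19},
]
genotypes = [[int(i in snp) for snp in carriers] for i in range(20)]
hits = significant_intervals(genotypes, labels)
print(hits)
```

Neither SNP 1 nor SNP 2 passes the corrected threshold on its own, but the interval spanning both does. This is the situation the genetic heterogeneity model is meant to capture.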
Presentations
In the following you can find the slides from several talks we have given on this topic.
- Karsten Borgwardt at the ISCVID Symposium in Lausanne (04.06.2019): Machine Learning for Personalized Medicine
- Karsten Borgwardt at the '15th Current Topics in Bioinformatics' symposium at MDC Berlin (20.05.2019): Machine Learning for Biomarker Discovery in Clinical Time Series (PDF, 9.2 MB)
- Karsten Borgwardt at the Siemens Healthineers Summit in Zürich (14.03.2019): Dr. Algorithmus - wie KI die Medizin verändert
- Karsten Borgwardt at Roche Basel (19.02.2019): Machine Learning for Personalized Medicine (PDF, 9.7 MB)
- Karsten Borgwardt at the IMM seminar at the University of Zürich (25.10.2018): Machine Learning for Personalized Medicine (PDF, 4.2 MB)
- Karsten Borgwardt at the Huawei-ETH workshop in Zürich (25.05.2018): Machine Learning for Biomarker Discovery (PDF, 3.9 MB)
- Karsten Borgwardt at Roche Basel (18.04.2018): Machine Learning for Biomarker Discovery (PDF, 3.9 MB)
- Karsten Borgwardt at the SFB/TRR 209 seminar at the University Hospital Tübingen (16.04.2018): Machine Learning for Biomarker Discovery: Combinatorial Association Mapping (PDF, 3.9 MB)
- Karsten Borgwardt at the seminar series 'Software Trends' at Hochschule Esslingen (13.04.2018): Die 'Daten-Medizin' (PDF, 3.6 MB)
- Karsten Borgwardt at the Fassberg Seminar Series at MPI Göttingen (13.03.2018): Data Mining in the Life Sciences: Combinatorial Association Mapping (PDF, 3.9 MB)
- Karsten Borgwardt at Google Research Zürich (27.02.2018): Machine Learning in Medicine: Combinatorial Association Mapping (PDF, 3.8 MB)
- Karsten Borgwardt at the DPPH Meeting in Lausanne (15.02.2018): Personalized Swiss Sepsis Study (PDF, 2.7 MB)
- Karsten Borgwardt at the SIB Virtual Computational Biology Seminar Series (20.9.2017): Significant Pattern Mining for Combinatorial Association Mapping
- Karsten Borgwardt at the Distinguished Speaker Series at the Center for Bioinformatics, Saarbrücken (10.5.2017): Combinatorial Association Mapping (PDF, 2.2 MB)
- Karsten Borgwardt at the IBT seminar at the Institute for Biomedical Engineering at ETH Zürich (25.4.2017): Network Mining in Biology and Medicine (PDF, 2.3 MB)
- Karsten Borgwardt at the Felix Klein Conference "Mathematical Methods in Big Data" at the Fraunhofer Institute for Industrial Mathematics ITWM in Kaiserslautern (30.09.2016): Machine Learning for Personalized Medicine (PDF, 8.7 MB) (from slide 46)
- Felipe Llinares López at Krupp symposium 2017 (21.10.2016): Significant Pattern Mining for Biomarker Discovery (PDF, 14.3 MB)
- Karsten Borgwardt at the ECCB workshop on "Complex Network Analysis for Precision Medicine" in The Hague (03.09.2016): Network Mining for Personalized Medicine (PDF, 2 MB)
- Karsten Borgwardt at the Computational Biology (BC2) seminar at the Biozentrum at the University of Basel (25.04.2016): Machine Learning for Personalized Medicine (PDF, 7.2 MB) (in particular slide 18ff)
- Karsten Borgwardt at the Computer Science Colloquium of the University of Basel (21.04.2016): Significant Pattern Mining (PDF, 6.1 MB)
- Karsten Borgwardt at TU Dortmund (12.11.2015): Significant Pattern Mining (PDF, 6.1 MB)
- Keynote lecture by Karsten Borgwardt at the meeting of the Competence Center for Personalized Medicine of ETH Zürich & the University of Zürich at Kartause Ittingen (02.11.2015): Machine Learning for Personalized Medicine (PDF, 7.2 MB) (in particular slide 18ff)
- Christian Bock (alumnus)
- Karsten Borgwardt
- Thomas Gumbsch
- Anja Gumpinger (alumna)
- Max Horn (alumnus)
- Felipe Llinares López (alumnus)
- Michael Moor (alumnus)
- Laetitia Papaxanthos (alumna)
- Bastian Rieck (alumnus)
- Damian Roqueiro (alumnus)
- Dean Bodenham (alumnus)
- Xiao He (alumnus)
- Mahito Sugiyama (alumnus and collaborator)