In silico Phenotyping via Co-training

Damian Roqueiro, Menno Witteveen, Verneri Anttila, Gisela Terwindt, Arn van den Maagdenberg, Karsten Borgwardt

In silico phenotyping via co-training for improved phenotype prediction from genotype

Summary

This work provides a proof of principle that co-training can be used to augment training datasets and thereby improve phenotype prediction from genotype.

Code

A beta version of the co-training pipeline can be accessed here (GZ, 13 KB). This code will soon be uploaded as a new project on sourceforge.net.

Sample results

For a partition of the dataset into set I (10%), set II (70%) and set III (20%), the results generated by our co-training pipeline can be found here (GZ, 137.9 MB). The results are organized into the following subdirectories:

  • cv_set: contains the 100 random permutations of the data into sets I, II and III. In each random fold, every patient is assigned a value in {1, 2, 3} that indicates the set in which the patient is placed.
  • pheno_imp: contains the imputed labels in set II for all random folds. The labels are soft (not binary) because the bagged predictor returns the probability that a sample belongs to class 1 (migraine with aura).
  • random_forest: contains the final results of the hg classifier when applied to set III. The file roc_auc.csv has the AUC scores for all 100 random folds. Additionally, the files mean_tpr.csv and mean_fpr.csv contain the averaged values used to plot the ROC curves.
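
The partitioning and soft-label ideas described in the list above can be illustrated with a minimal Python sketch. It is not taken from the pipeline: the function names assign_sets and soft_label, the patient identifiers, the seeding scheme and the use of a simple vote fraction as the bagged probability are all assumptions made for illustration, and the actual file formats in cv_set and pheno_imp may differ.

```python
import random
from collections import Counter


def assign_sets(patient_ids, fractions=(0.10, 0.70, 0.20), seed=0):
    """Randomly assign each patient to set 1, 2 or 3 (sets I, II, III).

    Hypothetical helper: the real pipeline may partition differently.
    """
    rng = random.Random(seed)          # one reproducible generator per fold
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)

    n = len(shuffled)
    n1 = round(fractions[0] * n)       # size of set I
    n2 = round(fractions[1] * n)       # size of set II; set III takes the rest

    assignment = {}
    for i, pid in enumerate(shuffled):
        if i < n1:
            assignment[pid] = 1
        elif i < n1 + n2:
            assignment[pid] = 2
        else:
            assignment[pid] = 3
    return assignment


def soft_label(votes):
    """Soft label as the fraction of base predictors voting for class 1.

    Assumption: each vote is 0 or 1; a real bagged predictor may instead
    average per-model probabilities.
    """
    return sum(votes) / len(votes)


# 100 random folds over 100 made-up patient identifiers.
patients = [f"patient_{i:03d}" for i in range(100)]
folds = [assign_sets(patients, seed=s) for s in range(100)]

# Set sizes in the first fold: 10 patients in set I, 70 in II, 20 in III.
counts = Counter(folds[0].values())
print(dict(sorted(counts.items())))

# A sample for which 3 of 4 base predictors vote "class 1" gets label 0.75.
print(soft_label([1, 1, 0, 1]))
```

Using a separate seed per fold keeps each of the 100 partitions reproducible on its own, which is one simple way to regenerate a single fold without rerunning the others.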

Reference

Damian Roqueiro, Menno Witteveen, Verneri Anttila, Gisela Terwindt, Arn van den Maagdenberg, Karsten Borgwardt.
In silico phenotyping via co-training for improved phenotype prediction from genotype.
ISMB 2015, Bioinformatics (2015) 31 (12): i303-i310. (Online)

@article{Roqueiro15062015,
author = {Roqueiro, Damian and Witteveen, Menno J. and Anttila, Verneri and Terwindt, Gisela M. and van den Maagdenberg, Arn M.J.M. and Borgwardt, Karsten},
title = {In silico phenotyping via co-training for improved phenotype prediction from genotype},
volume = {31},
number = {12},
pages = {i303-i310},
year = {2015},
doi = "10.1093/bioinformatics/btv254",
URL = {http://bioinformatics.oxfordjournals.org/content/31/12/i303.abstract},
eprint = {http://bioinformatics.oxfordjournals.org/content/31/12/i303.full.pdf+html},
journal = {Bioinformatics}
}

Contact for questions regarding usage of the pipeline or to report bugs.

 
 