Rapid Outlier Detection
Fast computation of distance-based outlierness scores via sampling
Summary
This is an efficient algorithm for outlier detection. It draws a random sample from the data once and scores the outlierness of each data point by the distance from that point to its nearest neighbor in the sample set. The algorithm has the following advantages:
- Scalable: its time complexity is linear in the number of data points,
- Effective: it is empirically shown to be, on average, the most effective of the existing distance-based outlier detection methods, and
- Easy to use: the only input is the number of samples, and a small sample size (default: 20) is shown to be a good choice.
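The scoring step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' C or R implementation: distances are assumed to be Euclidean, points are assumed to be equal-length tuples of floats, and the sample is assumed to be drawn uniformly without replacement using Python's `random` module. The function names `nn_distance_to_sample` and `sampling_outlier_scores` are hypothetical. Read literally, the description gives a point that is itself in the sample a score of 0; this sketch keeps that reading. It requires Python 3.8+ for `math.dist`.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def nn_distance_to_sample(data: Sequence[Point],
                          sample_idx: Sequence[int]) -> List[float]:
    """Distance from every point to its nearest neighbor in the sample.

    Euclidean distance is an assumption of this sketch.
    """
    sample = [data[i] for i in sample_idx]
    # Each point is compared only with the s sampled points, so the total
    # cost is O(n * s * d): linear in n for a fixed sample size s.
    return [min(math.dist(x, y) for y in sample) for x in data]


def sampling_outlier_scores(data: Sequence[Point],
                            num_samples: int = 20,
                            seed: int = 0) -> List[float]:
    """Outlierness score per point (hypothetical helper); larger = more outlying."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    if not data:
        return []
    rng = random.Random(seed)
    s = min(num_samples, len(data))
    # Sample once, uniformly at random without replacement (assumed scheme).
    sample_idx = rng.sample(range(len(data)), s)
    return nn_distance_to_sample(data, sample_idx)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.1), (5.0, 5.0)]
    scores = sampling_outlier_scores(data, num_samples=2, seed=42)
    # Rank points from most to least outlying.
    ranking = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    print(scores)
    print(ranking)
```

Because the sample is drawn only once, the cost per point depends on the sample size rather than on the size of the data set, which is where the linear scaling comes from.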
Code
C implementation: code.zip (ZIP, 421 KB)
R package: spoutlier.zip (ZIP, 6 KB)
Also available on GitHub
Further information and publication
Please see the following paper for details, and cite it in your published research.
Rapid Distance-Based Outlier Detection via Sampling
Mahito Sugiyama and Karsten Borgwardt
Advances in Neural Information Processing Systems 26 (NIPS 2013), 467-475
Online | ETH Research Collection | Project page including code