A Decomposition of the Outlier Detection Problem into a Set of Supervised Learning Problems
Outlier detection methods automatically identify instances that deviate from the majority of the data. In this paper, we propose a novel approach for unsupervised outlier detection, which reformulates the outlier detection problem in numerical data as a set of supervised regression learning problems. For each attribute, we learn a predictive model that predicts the values of that attribute from the values of all other attributes, and we compute the deviations between the predictions and the actual values. From those deviations, we derive both a weight for each attribute and a final outlier score using those weights. The weights help to separate the relevant attributes from the irrelevant ones, and thus make the approach well suited for discovering outliers that would otherwise be masked in high-dimensional data. An empirical evaluation shows that our approach outperforms existing algorithms, and is particularly robust on datasets with many irrelevant attributes. Furthermore, we show that if a symbolic machine learning method is used to solve the individual learning problems, the approach is also capable of generating concise explanations for the detected outliers.
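The per-attribute decomposition described above can be sketched as follows. This is a minimal illustration using plain least-squares regression as the supervised learner; the specific weighting scheme shown here (one minus the model's error relative to a mean-value baseline) and the score aggregation are assumptions for illustration, not necessarily the paper's exact formulas.

```python
import numpy as np

def attribute_wise_outlier_scores(X):
    """Score each instance by how poorly each of its attribute values is
    predicted from the remaining attributes.

    For every attribute j, a linear model is fit to predict column j from
    all other columns; attributes that the rest of the data predicts well
    receive a high weight, so irrelevant (unpredictable) attributes
    contribute little to the final score.
    """
    n, d = X.shape
    scores = np.zeros(n)
    total_weight = 0.0
    for j in range(d):
        y = X[:, j]
        # design matrix: all other attributes plus an intercept column
        A = np.hstack([np.delete(X, j, axis=1), np.ones((n, 1))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        err = y - A @ coef                      # per-instance deviations
        rmse = np.sqrt(np.mean(err ** 2))
        baseline = y.std() + 1e-12              # error of predicting the mean
        # illustrative weight: 0 if the model is no better than the baseline
        w = max(0.0, 1.0 - rmse / baseline)
        scores += w * (err / (rmse + 1e-12)) ** 2
        total_weight += w
    return scores / max(total_weight, 1e-12)

# toy data: two correlated attributes, one irrelevant noise attribute,
# and a single injected outlier that breaks the correlation
rng = np.random.default_rng(0)
a = rng.normal(size=100)
X = np.column_stack([a,
                     2 * a + rng.normal(scale=0.1, size=100),
                     rng.normal(size=100)])
X[0, 1] = 10.0  # instance 0 violates the x2 ≈ 2*x1 relationship
s = attribute_wise_outlier_scores(X)
print(int(np.argmax(s)))  # the injected outlier should score highest
```

Note how the noise attribute receives a near-zero weight, since it cannot be predicted from the others any better than by its mean; this is the mechanism by which irrelevant attributes are prevented from masking outliers.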