Robust feature selection and robust PCA for internet traffic anomaly detection
Pascoal, C.; Oliveira, M. Rosário; Valadas, Rui; Filzmoser, P.; Salvador, P.; Pacheco, António
Proceedings of IEEE INFOCOM, 25-30 March, Orlando, FL (2012), 1755–1763
http://dx.doi.org/10.1109/INFCOM.2012.6195548
Robust statistics is a branch of statistics comprising methods that cope adequately with the presence of outliers. In this paper, we propose an anomaly detection method that combines a feature selection algorithm with an outlier detection method, both of which make extensive use of robust statistics. Feature selection is based on a mutual information metric for which we have developed a robust estimator; it also includes a novel, automatic procedure for determining the number of relevant features. Outlier detection is based on robust Principal Component Analysis (PCA) which, unlike classical PCA, is not sensitive to outliers and removes the need for training on a reliably labeled dataset, a strong advantage from the operational point of view. To evaluate our method, we designed a network scenario capable of producing a perfect ground truth under real (but controlled) traffic conditions. Results show that our method significantly improves on its classical counterparts. Moreover, despite being largely overlooked in the context of anomaly detection, feature selection is found to be an important preprocessing step, allowing adaptation to different network conditions and yielding significant performance gains.
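The abstract's point that classical estimators are sensitive to outliers can be illustrated with a minimal univariate sketch. This is not the paper's robust PCA or its robust mutual information estimator. It only contrasts mean/standard-deviation scores with median/MAD scores on synthetic values invented for illustration; the function names, the data, and the threshold of 3 are all assumptions.

```python
import statistics

def classical_scores(values):
    """Distance from the mean in units of standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [abs(v - mean) / std for v in values]

def robust_scores(values):
    """Distance from the median in units of scaled MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    # 1.4826 makes the MAD consistent with the standard deviation
    # under a normal distribution.
    scale = 1.4826 * mad
    return [abs(v - med) / scale for v in values]

# Synthetic per-interval measurements (hypothetical): eight ordinary
# values followed by two anomalous ones.
traffic = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8, 10.1, 60.0, 65.0]
THRESHOLD = 3.0  # assumed cutoff, chosen only for this illustration

classical_flags = [s > THRESHOLD for s in classical_scores(traffic)]
robust_flags = [s > THRESHOLD for s in robust_scores(traffic)]

print("classical:", classical_flags)
print("robust:   ", robust_flags)
```

In this sketch the two anomalous values inflate the mean and standard deviation enough that the classical scores flag nothing, while the median and MAD are barely affected and the robust scores flag exactly those two values. The same masking effect is what motivates replacing classical estimators with robust ones in the multivariate setting.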