abstract
- © 2015, Springer Science+Business Media New York.The accuracy and reliability of an anomaly-based network intrusion detection system are dependent on the quality of data used to build a normal behavior profile. However, obtaining these datasets is not trivial due to privacy, obsolescence, and suitability issues. This paper presents an approach to traffic trace sanitization based on the identification of anomalous patterns in a three-dimensional entropy space of the flow traffic data captured from a campus network. Anomaly-free datasets are generated by filtering out attacks and traffic pieces that modify the typical position of centroids in the entropy space. Our analyses were performed on real life traffic traces and show that the sanitized datasets have homogeneity and consistency in terms of cluster centroids and probability distributions of the PCA-transformed entropy space.