Outlier detection through mixtures with an improper component
Abstract
References
Aitkin, M. and Tunnicliffe-Wilson, G. (1980), "Mixture Models, Outliers, and the EM Algorithm", Technometrics, 22, No. 3, 325-331.
Banfield, J.D. and Raftery, A.E. (1993) "Model-based Gaussian and non-Gaussian clustering", Biometrics, 49, 803-821.
Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern recognition, 30(7), 1145-1159.
Burge, P. and Shawe-Taylor, J. (1997). Detecting cellular fraud using adaptive prototypes. In Proc. of AI Approaches to Fraud Detection and Risk Management, pp. 9-13.
Chandola, V., Banerjee, A. and Kumar, V. (2009) "Anomaly detection: a survey". ACM Comput Surv, 41 No. 3, 15:1-15:58.
Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977), "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society, Series B, 39, 1-38.
Ernst, M., & Haesbroeck, G. (2017). Comparison of local outlier detection techniques in spatial multivariate data. Data mining and knowledge discovery, 31(2), 371-399.
Flury, B., Riedwyl, H. (1988). Multivariate Statistics. A Practical Approach. Chapman and Hall, London.
Fraley, C. and Raftery, A. E. (1998). How many clusters? Which clustering method? Answers via model-based cluster analysis. The Computer J. 41: 578-588.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986) Robust Statistics. The Approach Based on Influence Functions. John Wiley and Sons, New York.
Hennig, C. (2004). Breakdown point for maximum likelihood estimators of location-scale mixtures. Ann. Statist. 32: 1313-1340.
Huber, P. J. (1981) Robust Statistics. John Wiley and Sons, New York.
Karatzoglou, A., Smola, A., Hornik, K., Zeileis, A. (2004). kernlab - an S4 package for kernel methods in R. J. Stat Softw 11(9):1-20.
Kutsuna, T., & Yamamoto, A. (2017). Outlier detection using binary decision diagrams. Data mining and knowledge discovery, 31(2), 548-572.
Limas, M. C., Meré, J. B. O., de Pisón Ascacibar, F. J. M., & González, E. P. V. (2004). Outlier detection and data cleaning in multivariate non-normal samples: the PAELLA algorithm. Data Mining and Knowledge Discovery, 9(2), 171-187.
Longford, N.T. and D’Urso, P. (2011) "Mixture models with an improper component", J. Appl. Stat. 38, 2511-2521.
Longford, N. T. (2013). Searching for contaminants. Journal of Applied Statistics, 40(9), 2041-2055.
Liu, F.T., Ting, K.M., Zhou, Z.H. (2008). Isolation Forest, IEEE International Conference on Data Mining (ICDM 08). https://sourceforge.net/projects/iforest/
McLachlan, G.J and Peel, D. (2000), Finite Mixture Models. John Wiley and Sons, New York.
Monetti, A., Versini, G., Dalpiaz, G. and Raniero, F. (1996) "Sugar Adulterations Control in Concentrated Rectified Grape Musts by Finite Mixture Distribution Analysis of the myo- and scyllo-Inositol Content and D/H Methyl Ratio of Fermentative Ethanol", J. Agric. Food Chem., 44, 2194-2201.
R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Scrucca L., Fop M., Murphy T. B. and Raftery A. E. (2017) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R Journal 8(1), pp. 205-233.
Sing, T., Sander, O., Beerenwinkel, N., Lengauer, T. (2005). ROCR: visualizing classifier performance in R. Bioinformatics 21(20):3940–3941.
Torgo, L. (2010). Data Mining with R, learning with case studies. Chapman and Hall/CRC. URL: http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR
Yamanishi, K., Takeuchi, J. I., Williams, G., & Milne, P. (2004). On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms. Data Mining and Knowledge Discovery, 8(3), 275-300.