Clustering dichotomously scored items through functional k-means algorithm


In the educational field, it is common to analyze the probability of a correct response to a test item as a continuous function of the item parameters and the subject's ability. This relation is given by the item response function. Since test data can be expressed as curves, they can be analyzed through the functional data analysis approach. Indeed, several researchers suggest estimating the shape of the item response function with a non-parametric approach in order to capture unusual or unforeseen features of the curve. By contrast, item response theory models assume a specific parametric functional form for the item response function. In this paper, we propose an alternative method that combines the parametric specification of common item response theory with the functional data analysis approach. In particular, we aim to classify the items through the functional k-means algorithm. The key idea is to transform the function space of the items into a convex space, which guarantees desirable properties. Specifically, we prove that, by exploiting the convexity property, the functional centroids belong to the same function space as the item response functions. The applicability of our proposal in the educational field is demonstrated through a real data set concerning test data from the Italian Olympics of Statistics.
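
To make the idea of clustering items by their response curves concrete, the following is a minimal illustrative sketch and not the method proved in the paper. It assumes a two-parameter logistic item response function (the abstract does not name a specific model), hypothetical item parameters, a deterministic farthest-point initialization, and plain pointwise-mean centroids under an approximate L2 distance. It does not implement the convex-space transformation described above; note that a pointwise mean of logistic curves is generally not itself a logistic curve, which is the kind of issue that transformation is meant to address.

```python
import math

def irf_2pl(theta, a, b):
    """Two-parameter logistic item response function (assumed model):
    probability of a correct response at ability theta, with
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def l2_distance(f, g, step):
    """Approximate L2 distance between two curves sampled on a common grid."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(f, g)) * step)

def functional_kmeans(curves, k, step, n_iter=50):
    """Lloyd-style k-means on discretized curves with pointwise-mean centroids."""
    # Farthest-point initialization (an assumption, chosen to be deterministic).
    centroids = [list(curves[0])]
    while len(centroids) < k:
        far = max(curves,
                  key=lambda c: min(l2_distance(c, m, step) for m in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each curve goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: l2_distance(c, centroids[j], step))
            for c in curves
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the pointwise mean of its members.
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids

# Ability grid on [-4, 4] and hypothetical (a, b) pairs:
# three "easy" items followed by three "hard" items.
step = 0.1
grid = [-4.0 + step * i for i in range(81)]
items = [(1.0, -1.6), (1.2, -1.4), (0.9, -1.5),
         (1.1, 1.4), (1.0, 1.6), (1.3, 1.5)]
curves = [[irf_2pl(t, a, b) for t in grid] for a, b in items]

labels, centroids = functional_kmeans(curves, k=2, step=step)
print(labels)
```

With these made-up parameters the easy and hard items are well separated, so the two groups should be recovered regardless of the order in which the items are listed.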

DOI Code: 10.1285/i20705948v9n2p433


This work is licensed under a Creative Commons Attribution - NonCommercial - NoDerivs 3.0 Italy License.