### Confidence regions for simple correspondence analysis using the Cressie-Read family of divergence statistics

#### Abstract

When examining the association between symmetrically associated categorical variables, correspondence analysis provides a visual means of identifying the structure of this association. An important and sometimes overlooked feature that can help the analyst determine which categories make a statistically significant contribution to the association is the confidence region. When constructing these regions, correspondence analysis traditionally (but not always) considers Pearson’s chi-squared statistic as the core measure of association between the variables. This statistic is a special case of the Cressie-Read family of divergence statistics, as are the log-likelihood ratio statistic, the Freeman-Tukey statistic, and other such measures. Therefore, this paper considers the construction of confidence regions in correspondence analysis where this family of divergence statistics is used as the measure of association. Doing so provides a simple means of constructing confidence regions for each category of a contingency table and allows such regions to be constructed when log-ratio analysis (LRA) or the Hellinger distance decomposition (HDD) method is applied to the contingency table.
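
To make the "special case" relationship concrete, the following is a minimal sketch of the Cressie-Read power-divergence statistic for a two-way contingency table, assuming the usual independence model for the expected counts and strictly positive cell counts. The function names and the example table are illustrative assumptions, not taken from the paper, and the paper's own formulation for correspondence analysis may be written differently (for example, in terms of proportions).

```python
import math

def expected_counts(table):
    """Expected cell counts under independence: (row total * column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def cressie_read(table, lam):
    """Cressie-Read divergence statistic with power parameter `lam`.

    Assumes every observed count is strictly positive.
    lam = 1    gives Pearson's chi-squared statistic,
    lam -> 0   gives the log-likelihood ratio statistic,
    lam = -1/2 gives the Freeman-Tukey statistic.
    """
    expected = expected_counts(table)
    total = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            if abs(lam) < 1e-12:
                # Limiting case lam -> 0: 2 * sum O * log(O / E)
                total += 2.0 * o * math.log(o / e)
            elif abs(lam + 1.0) < 1e-12:
                # Limiting case lam -> -1: 2 * sum E * log(E / O)
                total += 2.0 * e * math.log(e / o)
            else:
                total += 2.0 / (lam * (lam + 1.0)) * o * ((o / e) ** lam - 1.0)
    return total

# Hypothetical 2 x 2 table, used only for illustration.
table = [[10, 20], [30, 40]]
expected = expected_counts(table)

# Direct formulas for two familiar members of the family.
pearson = sum((o - e) ** 2 / e
              for orow, erow in zip(table, expected)
              for o, e in zip(orow, erow))
freeman_tukey = 4.0 * sum((math.sqrt(o) - math.sqrt(e)) ** 2
                          for orow, erow in zip(table, expected)
                          for o, e in zip(orow, erow))

print(cressie_read(table, 1.0), pearson)          # the two values agree
print(cressie_read(table, -0.5), freeman_tukey)   # the two values agree
```

Varying `lam` moves between members of the family without changing the rest of the computation, which is what allows a single construction to cover several measures of association.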

