Regression and Random Confounding


An ordinary least squares estimate of a regression slope, regardless of its strength, can have its sign reversed by adjustment for a random confounding vector of data. Assuming a rotationally invariant distribution on the space of centered, random, confounding vectors of data makes it possible to calculate the probabilities of these reversals. Here these probabilities are shown to decrease exponentially as the sample size increases. This analytic result leads to an asymptotic comparison between regular sampling error and the error due to a misspecified model.
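The decay of the reversal probability with sample size can be illustrated numerically. The following is a minimal Monte Carlo sketch, not the analytic calculation of the paper. It assumes a single confounder w whose direction is uniform on the unit sphere of centered vectors (one rotationally invariant choice), and it uses the standard identity that the slope of y on x, adjusted for w, has the sign of r_xy - r_xw * r_yw. The function name, the coordinate construction, and the parameters n, r, draws, and seed are hypothetical choices made for illustration.

```python
import numpy as np


def reversal_probability(n, r, draws=50_000, seed=0):
    """Monte Carlo estimate of the chance that adjusting for one random
    confounder flips the sign of a slope whose sample correlation is r."""
    if n < 3 or not 0 < abs(r) < 1:
        raise ValueError("need n >= 3 and 0 < |r| < 1")
    rng = np.random.default_rng(seed)
    d = n - 1  # dimension of the space of centered vectors in R^n

    # Assumption: coordinates are chosen so that the centered, unit-length
    # x is e1 and y is r*e1 + sqrt(1 - r^2)*e2, giving corr(x, y) = r.
    g = rng.standard_normal((draws, d))
    w = g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform on the sphere

    # Correlations of the random confounder with x and with y.
    r_xw = w[:, 0]
    r_yw = r * w[:, 0] + np.sqrt(1.0 - r**2) * w[:, 1]

    # The adjusted slope has the sign of r - r_xw * r_yw.
    adjusted_sign = np.sign(r - r_xw * r_yw)
    return float(np.mean(adjusted_sign != np.sign(r)))


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, reversal_probability(n, r=0.3))
```

Running the script prints an estimated reversal probability for each sample size. Under the stated assumptions the estimates should shrink quickly as n grows, in line with the exponential decrease described above. For large n the event becomes too rare for this many draws to resolve.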

DOI Code: 10.1285/i20705948v8n3p346

Keywords: least-squares; high-dimensional geometry; gamma function; complementary error function; model uncertainty; omitted-variable bias


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy License.