Examination of Entropy balancing technique for estimating some standard measures of treatment effects: A simulation study


In observational studies, propensity score weighting methods are the conventional standard for estimating the effects of treatments on outcomes. We introduce entropy balancing, which, despite its attractive conceptual properties, has been underused in applied studies. Using an extensive series of Monte Carlo simulations, we evaluated the performance of entropy balancing in estimating differences in means, marginal odds ratios, rate ratios, and hazard ratios. Its performance was compared with that of inverse probability of treatment weighting (IPW) using the propensity score. We found that entropy balancing outperformed IPW in estimating differences in means, marginal odds ratios, and hazard ratios, whereas IPW performed better in estimating marginal rate ratios. Entropy balancing produced more biased estimates in many cases. However, the entropy balancing algorithm can control this bias by adjusting the pre-specified tolerance on covariate balance. We report the conditions under which each technique performs better, without claiming that either method is superior in every case. Entropy balancing merits more widespread adoption in applied studies.
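To make the idea concrete, the following is a minimal sketch of entropy balancing, not the implementation used in the study. It assumes the common formulation in which control units are reweighted so that their weighted covariate means match the treated-group means (first moments only), and it solves the dual problem with a damped Newton iteration. The function name `entropy_balance`, the parameter `tol`, and the simulated data are illustrative assumptions; `tol` plays the role of the pre-specified tolerance on covariate balance.

```python
import numpy as np

def entropy_balance(X_control, target_means, tol=1e-8, max_iter=100):
    """Sketch: weights on control units (summing to 1) whose weighted
    covariate means match target_means to within tol."""
    Xc = X_control - target_means            # center so the balance target is 0

    def dual(lam):                           # log-sum-exp dual objective
        eta = Xc @ lam
        m = eta.max()
        return m + np.log(np.exp(eta - m).sum())

    def weights(lam):                        # w_i proportional to exp(x_i' lam)
        eta = Xc @ lam
        w = np.exp(eta - eta.max())
        return w / w.sum()

    lam = np.zeros(Xc.shape[1])
    for _ in range(max_iter):
        w = weights(lam)
        grad = w @ Xc                        # remaining imbalance in the means
        if np.max(np.abs(grad)) < tol:
            break
        hess = (Xc * w[:, None]).T @ Xc - np.outer(grad, grad)
        direction = np.linalg.solve(hess, grad)
        step, f0 = 1.0, dual(lam)
        # Halve the Newton step until the dual objective stops increasing.
        while dual(lam - step * direction) > f0 + 1e-12 and step > 1e-8:
            step /= 2
        lam = lam - step * direction
    return weights(lam)

# Simulated example (assumed data-generating process, for illustration only).
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
p = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.3 * X[:, 1])))
t = rng.random(n) < p                        # treatment indicator

w = entropy_balance(X[~t], X[t].mean(axis=0))
imbalance = np.max(np.abs(w @ X[~t] - X[t].mean(axis=0)))
```

A larger `tol` lets the iteration stop with some residual imbalance, while a smaller one enforces near-exact balance on the chosen moments. By contrast, IPW weights come from a fitted propensity score model and balance the covariates only approximately.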

DOI Code: 10.1285/i20705948v12n2p491

Keywords: Entropy balancing; Monte Carlo simulation; Observational studies; Propensity score weighting; Treatment effect; odds ratios; hazard ratios; rate ratios



Creative Commons License
This work is licensed under a Creative Commons Attribution - NonCommercial - NoDerivatives 3.0 Italy License.