CODES 2000 User Forum -- Data Network Note #15

Effect of Helmet Use on Inpatient Charges - Phase II 

Applies to: CODES Data Network.
Last updated: Wednesday May 15, 2002.

SUMMARY

CODES Data Network participants reported the effect of helmet use on hospital inpatient charges for motorcycle riders injured in crashes and discharged alive.  They estimated the helmet use effect following two different methodologies in order to compare the results.  The first methodology was a traditional approach.  They found a set of high probability links between police reported crash records and hospital discharge records using a Fellegi and Sunter probabilistic record linkage model as implemented in CODES 2000 software from Strategic Matching.  Then they conducted a linear regression analysis on those high probability linked record pairs that had complete values for all regression variables using either SAS/STAT REG software from the SAS Institute or Excel Data Analysis software from Microsoft.  The major disadvantage of this first methodology is that the population analyzed may not represent the population of interest unless the percentage of missing links and the percentage of missing values are low, say less than 10%.  All Data Network participants reported rates of missing links or missing values that were much higher than 10%.
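As a rough illustration of how a Fellegi and Sunter model scores a candidate record pair, the Python sketch below adds a log-likelihood-ratio weight for each comparison field and converts the total into a match probability.  It is not the CODES 2000 implementation.  The field names, the m and u probabilities, and the prior are hypothetical, and the fields are assumed to agree or disagree independently of one another.

```python
import math

def match_probability(agreements, m_probs, u_probs, prior):
    """Posterior probability that a record pair is a true link.

    agreements: dict field -> True/False (did the field agree?)
    m_probs:    dict field -> P(field agrees | true link)
    u_probs:    dict field -> P(field agrees | non-link)
    prior:      P(true link) before any fields are compared
    """
    log_odds = math.log(prior / (1.0 - prior))
    for field, agrees in agreements.items():
        m, u = m_probs[field], u_probs[field]
        if agrees:
            log_odds += math.log(m / u)                   # agreement weight
        else:
            log_odds += math.log((1.0 - m) / (1.0 - u))   # disagreement weight
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical comparison fields and probabilities, for illustration only.
m = {"birth_date": 0.95, "sex": 0.98, "event_date": 0.90}
u = {"birth_date": 0.01, "sex": 0.50, "event_date": 0.05}
pair = {"birth_date": True, "sex": True, "event_date": False}

p = match_probability(pair, m, u, prior=0.001)
print(f"match probability: {p:.3f}")
print("high probability link:", p > 0.9)   # 0.9 is the cutoff used in Table 1
```

Under the first methodology only pairs above the cutoff are kept, which is why the proportion of estimated links actually recovered matters so much.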

In order to better describe the population of interest, the second methodology was a two-stage Bayesian multiple imputation approach similar to those described by Rubin, Schafer, and others.  Participants created multiple complete sets of links between crash records and discharge records using an extended Fellegi and Sunter model and Bayesian multiple imputation of missing links as implemented in CODES 2000.  Then, for each imputed set of links, they created multiple complete datasets using Markov Chain Monte Carlo multiple imputation for missing values as implemented in either SAS/STAT MI software or Schafer's NORM software.  Next, they conducted linear regression analysis for each complete dataset using either REG or Data Analysis.  Finally, they combined parameter estimates derived from each imputation into a single estimate for the population of interest using Rubin's algorithms as implemented in either SAS/STAT MIANALYZE or NORM.
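The final combining step can be sketched in Python as follows.  This is a minimal version of Rubin's rules for a single pool of completed datasets, not the MIANALYZE or NORM code.  It treats every completed dataset as one of m exchangeable imputations, which is an assumption: this note does not say how the two imputation stages were pooled.  The numbers in the usage lines are hypothetical.

```python
import math

def pool_rubin(estimates, std_errors):
    """Combine one coefficient across m completed datasets (Rubin's rules).

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least two imputations")
    q_bar = sum(estimates) / m                              # pooled estimate
    within = sum(se * se for se in std_errors) / m          # mean within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between              # total variance
    return q_bar, math.sqrt(total)

# Hypothetical helmet coefficients and standard errors from three imputations.
est, se = pool_rubin([-5000.0, -6000.0, -4000.0], [2000.0, 2000.0, 2000.0])
print(f"pooled helmet effect: {est:.0f}, pooled standard error: {se:.0f}")
```

The between-imputation term is what carries the extra uncertainty due to missing links and missing values into the final standard error.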

Here we tabulate and compare results reported by CODES Data Network participants.  In addition, we combine the multiple imputation estimates of the effect of helmet use in a meta-analysis and tabulate the results.  The meta-analysis follows the Bayesian approach presented by Gelman et al. for normally distributed effects.  Finally, we discuss apparent strengths and weaknesses of the imputation methodology.

SAS and SAS/STAT are trademarks of SAS Institute, Inc.

RESULTS

  1. Traditional High-Probability Linkage and Complete Case Regression

Table 1 - High Probability Linkage Results

State  Crash Records  Inpatient Records  Est. Total Links  Act. Links > 0.9  Act. % of Est.
DE            81,928             10,419             4,000             1,646             41%
KY           307,746             39,278             4,500             2,537             56%
MD           224,111            626,955             8,100             3,749             46%
ME            99,712             10,878             1,812             1,006             56%
MN           254,024             39,005             5,000             2,731             55%
NE            83,525             24,907             1,340               940             70%
NH            56,689            117,394             1,200               527             44%
NV           106,803             12,844                               1,034
OK           180,253            355,918             2,500             1,737             69%
PA         2,533,317            114,979           112,500            66,515             59%
SC           270,463             32,530             6,500
SD            71,080              6,409               700               476             68%
UT           428,165             77,570             7,186             4,380             61%
WI           357,058            621,236             5,400             3,900             72%
 
Table 2 - Complete Case Regression Results
State  MC Links > 0.9  Complete Data  Charges Intercept  Helmet Effect  Std. Error
DE                 66             44             26,705         -2,952      11,223
KY                110             63             26,296         -8,464      11,184
MD                229            195             17,083           -236       4,859
ME                 52             52             23,568         -8,016       7,353
MN                164            135             26,204         -4,096       5,676
NE                 38             21              6,540         18,632      20,345
NH                 56             19             21,323         -8,746       9,107
NV                 51             47             51,926        -30,878
OK                108             98             29,733         -1,406       1,213
PA              4,014          2,664             36,629         -5,128       5,649
SC                 11             10              6,808         -3,300       3,919
SD                 65             65             21,284         12,242       8,921
UT                338            189             17,953         -6,394       3,277
WI                446            316             25,953         -7,472       4,383

...

  2. Multiple Imputation Results

Table 3 - Complete Linkage Imputation Results
State  Crash Records  Inpatient Records  Est. Total Links  Act. Links Imp 1  Act. % of Est.  Avg. Prob. Imp 1
DE            81,928             10,419             2,000             1,828             91%              0.91
KY           307,746             39,278             4,500             4,978            111%              0.66
MD           224,111             15,419             8,200             7,670             94%              0.71
ME            99,712             10,878             1,812             1,654             91%              0.78
MN           216,990             39,005             7,100             7,478            105%              0.75
NE            83,527             24,907             1,340             1,278             95%              0.84
NH            56,689             12,877             1,200             1,296            108%              0.70
OK           180,253            355,918             4,163             4,027             97%              0.88
PA         2,533,317            114,979            95,000            87,190             92%              0.73
SC           270,298             32,606             4,100             3,201             78%              0.96
UT           428,165             77,570             7,186             7,612            106%              0.68
 
Table 4 - Value Imputation and Regression Results
State  MC Cases (Imp. 1)  Charges Intercept  Age Effect  Sex Effect  Helmet Effect  Helmet SE
DE                    66             49,302        -484      -4,989         -8,142      9,931
KY                   130             16,225         172         373         -5,274      8,329
MD                   224              9,927          79       3,544            124      4,858
ME                    51             22,049                                 -7,307      7,187
MN                   207             23,505         -21      -4,394          2,089      4,293
NE                    35             37,536        -585     -18,589         24,532     18,786
NH                    66              4,977         163       4,162         -2,482      5,418
OK                   122              9,271          -3      -2,490            173      1,600
PA                 2,821             22,922          -6       7,936           -762      6,844
SC                   256             33,031        -134      -1,767         -2,346      6,347
UT                   408             10,128         161       2,847         -5,153      2,583

...

  3. Meta-Analysis Results

Following Gelman, Carlin, Stern, and Rubin (1995), Section 5.4, our meta-analysis constructs:

A simple hierarchical model based on the normal distribution, in which observed data are normally distributed with a different mean for each [state], with known observation variance, and a normal population distribution for the [state] means.  This model is sometimes termed the one-way normal random-effects model with known data variance and is widely applicable, being an important special case of the hierarchical normal linear model...

For this model, computation of the posterior distribution of [helmet effects] is most conveniently performed via simulation [using 10,000 random draws], following the factorization [given for the joint posterior distribution of model parameters].  The first step, simulating [the population standard deviation] tau, is easily performed numerically using the [given] inverse cdf method on a grid of [100] uniformly spaced values of tau, with [the given posterior distribution of tau].  The second and third steps, simulating [the population mean] mu and then [the vector of state helmet effects] theta, can both be done easily by sampling from [given] normal distributions.

For analytical details, see Data Network Note #12 - Methodology for Meta-Analysis of Helmet Use Effects
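The three simulation steps quoted above can be sketched in Python as follows, using the helmet effects and standard errors listed in Table 4 as the observed data.  This is an illustrative reading of the Gelman et al. factorization, not the code behind Tables 5 and 6.  The uniform prior on tau, the upper grid limit of 10,000, the use of grid midpoints, and the random seed are all assumptions, so the output should not be expected to reproduce the tabulated quantiles exactly.

```python
import math
import random

# Reported helmet effects and standard errors, in the state order of Table 4.
STATES = ["DE", "KY", "MD", "ME", "MN", "NE", "NH", "OK", "PA", "SC", "UT"]
Y = [-8142, -5274, 124, -7307, 2089, 24532, -2482, 173, -762, -2346, -5153]
SE = [9931, 8329, 4858, 7187, 4293, 18786, 5418, 1600, 6844, 6347, 2583]

def mu_given_tau(tau, y, se):
    """Mean and variance of the population mean mu, given tau and the data."""
    weights = [1.0 / (s * s + tau * tau) for s in se]
    v_mu = 1.0 / sum(weights)
    mu_hat = v_mu * sum(w * yi for w, yi in zip(weights, y))
    return mu_hat, v_mu

def log_posterior_tau(tau, y, se):
    """Log marginal posterior density of tau (uniform prior, up to a constant)."""
    mu_hat, v_mu = mu_given_tau(tau, y, se)
    total = 0.5 * math.log(v_mu)
    for yi, s in zip(y, se):
        var = s * s + tau * tau
        total += -0.5 * math.log(var) - (yi - mu_hat) ** 2 / (2.0 * var)
    return total

def simulate(y, se, n_draws=10000, grid_size=100, tau_max=10000.0, seed=0):
    """Return a list of (tau, mu, theta) posterior draws."""
    rng = random.Random(seed)
    step = tau_max / grid_size
    grid = [step * (k + 0.5) for k in range(grid_size)]   # midpoints keep tau > 0
    log_p = [log_posterior_tau(t, y, se) for t in grid]
    top = max(log_p)
    weights = [math.exp(lp - top) for lp in log_p]
    draws = []
    # Step 1: rng.choices samples tau from the grid via its cumulative weights.
    for tau in rng.choices(grid, weights=weights, k=n_draws):
        # Step 2: mu given tau.
        mu_hat, v_mu = mu_given_tau(tau, y, se)
        mu = rng.gauss(mu_hat, math.sqrt(v_mu))
        # Step 3: each state effect theta_j given mu and tau.
        theta = []
        for yi, s in zip(y, se):
            precision = 1.0 / (s * s) + 1.0 / (tau * tau)
            mean = (yi / (s * s) + mu / (tau * tau)) / precision
            theta.append(rng.gauss(mean, math.sqrt(1.0 / precision)))
        draws.append((tau, mu, theta))
    return draws

def quantile(values, q):
    """Simple empirical quantile of a list of numbers."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]

draws = simulate(Y, SE)
for q in (0.05, 0.25, 0.50, 0.75, 0.95):
    mu_q = quantile([mu for _, mu, _ in draws], q)
    tau_q = quantile([tau for tau, _, _ in draws], q)
    print(f"{q:.2f}  mean {mu_q:8.0f}  std dev {tau_q:8.0f}")
```

Quantiles for an individual state can be read the same way from the theta component of each draw, indexed by the state's position in STATES.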

Table 5 - Estimated Quantiles for the Normal Population Distribution of Helmet Use Effects
Param 0.05 0.25 0.50 0.75 0.95
Mean -4,409 -2,613 -1,563 -515 1,014
StdDev 200 969 1,904 3,198 5,574

Here, "Population" means all reporting Data Network states.

Figure 1 - Histogram of Simulated Values for the Mean of the Population Helmet Use Effect

Figure 2 - Histogram of Simulated Values for the Standard Deviation of the Population Helmet Use Effect

Table 6 - Estimated Quantiles for State Helmet Use Effects
State 0.05 0.25 0.50 0.75 0.95
DE -7,332 -3,436 -1,789 -259 2,238
KY -6,961 -3,351 -1,717 -242 2,403
MD -5,293 -2,714 -1,305 112 2,883
ME -7,484 -3,578 -1,928 -528 1,840
MN -4,374 -2,240 -925 572 3,635
NE -5,812 -2,755 -1,220 427 4,416
NH -6,180 -3,209 -1,674 -284 2,195
OK -2,885 -1,564 -643 342 1,836
PA -5,900 -2,971 -1,440 8 2,888
SC -6,326 -3,093 -1,566 -151 2,556
UT -6,813 -4,258 -2,649 -1,349 268
 

Figure 3 - Histogram of Simulated Values for DE Helmet Use Effect

Figure 4 - Histogram of Simulated Values for KY Helmet Use Effect

Figure 5 - Histogram of Simulated Values for MD Helmet Use Effect

Figure 6 - Histogram of Simulated Values for ME Helmet Use Effect

Figure 7 - Histogram of Simulated Values for MN Helmet Use Effect

Figure 8 - Histogram of Simulated Values for NE Helmet Use Effect

Figure 9 - Histogram of Simulated Values for NH Helmet Use Effect

Figure 10 - Histogram of Simulated Values for OK Helmet Use Effect

Figure 11 - Histogram of Simulated Values for PA Helmet Use Effect

Figure 12 - Histogram of Simulated Values for SC Helmet Use Effect

Figure 13 - Histogram of Simulated Values for UT Helmet Use Effect

Following Gelman et al., Section 8.5, we checked the fit of the statistical model estimated by our meta-analysis by simulating 10,000 repetitions of the reported data.
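A fit check of this kind can be sketched in Python as shown below.  For each posterior draw of the state effects, the sketch simulates one replicated set of state reports, computes the minimum, maximum, and mean, and records how often each simulated statistic falls at or below the actual reported value.  The function name and its inputs are hypothetical, and the choice to reuse posterior draws of each state's effect (rather than redrawing effects from the population distribution) is an assumption, since this note does not say which was done.

```python
import random

def predictive_positions(theta_draws, std_errors, observed, seed=0):
    """Position of each observed test statistic within its simulated distribution.

    theta_draws: list of posterior draws, each a list of state effects
    std_errors:  reported standard error for each state
    observed:    reported helmet effect for each state
    Returns a dict: statistic name -> fraction of replications <= observed value.
    """
    rng = random.Random(seed)
    stats = {
        "min": min,
        "max": max,
        "mean": lambda values: sum(values) / len(values),
    }
    counts = {name: 0 for name in stats}
    for theta in theta_draws:
        # One replicated report per state, centered on the drawn state effect.
        replicated = [rng.gauss(t, s) for t, s in zip(theta, std_errors)]
        for name, stat in stats.items():
            if stat(replicated) <= stat(observed):
                counts[name] += 1
    n = len(theta_draws)
    return {name: count / n for name, count in counts.items()}

# Tiny hypothetical input, only to show the calling convention.
example_draws = [[-1500.0, -900.0, -2600.0] for _ in range(1000)]
positions = predictive_positions(example_draws, [8000.0, 1600.0, 2600.0],
                                 [-5274.0, 173.0, -5153.0])
print(positions)
```

Positions close to 0 or 1 would suggest that the model has trouble reproducing that feature of the reported data.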

Figure 14 - Histogram of Simulated Values for Minimum Reported Helmet Use Effect

Figure 15 - Histogram of Simulated Values for Maximum Reported Helmet Use Effect

Figure 16 - Histogram of Simulated Values for Mean Reported Helmet Use Effect

DISCUSSION

  1. For most states, estimates of helmet use effects obtained by analyzing high-probability complete cases were not the same as estimates obtained by multiple imputation of missing links and missing values.  This suggests that high-probability complete cases may not be representative of the total study populations.
Table 7 - Comparison of Estimated State Helmet Use Effects
State  Complete Cases  Complete Effect  Imputed Cases  Imputed Effect
DE                 44           -2,952             66          -8,142
KY                 63           -8,464            130          -5,274
MD                195             -236            224             124
ME                 52           -8,016             51          -7,307
MN                135           -4,096            207           2,089
NE                 21           18,632             35          24,532
NH                 19           -8,746             66          -2,482
OK                 98           -1,406            122             173
PA              2,664           -5,128          2,821            -762
SC                 10           -3,300            256          -2,346
UT                189           -6,394            408          -5,513
 
  2. Meta-analysis results suggest that helmet use is protective, but the effect is not significant at a 90% confidence level.  That is, helmet users incur lower inpatient charges, on average, but there is wide variation from case to case and state to state.  The 50%-tile estimate for the population-wide helmet use effect is -$1,563.  The estimated symmetric 90% confidence interval for the population-wide helmet use effect is -$4,409 to $1,104.

The 50%-tile estimate for the state-to-state standard deviation in mean helmet use effect is $1,904.  Consequently, it is likely that the state-to-state variation in true mean helmet use effect is less than the apparent variation based on a single report from each state (-$8,142 to $24,532).  The range of 50%-tile estimates for the state effects is -$2,649 to -$643.  No state shows statistically significant helmet protection with 90% confidence intervals completely below zero.

Based on 10,000 simulated replications of state reports, the statistical model estimated by our meta-analysis fits the data fairly well.  One test statistic, the mean reported state helmet use effect, had an actual value near p=0.4 in the simulated distribution.  Two other test statistics, minimum and maximum reported state helmet use effects, had actual values near p=0.2 and p=0.9 in the simulated distributions, respectively.

  3. Most states found significantly more Crash to Inpatient links by using the imputation methodology and exceeded 90% of their estimated total links.  See Data Network Note #11 - Multiple Imputation and One-to-One Links (Revised).  Linked datasets must be nearly complete (over 90%) for accurate analysis of study populations.
Table 8 - Comparison of Estimated Versus Actual Link Counts
State  Crash Records  Estimated Total Links  % of Crash  Actual Imputed Links  % of Est.
DE            81,928                  2,000         2.4                 1,828         91
KY           307,746                  4,500         1.5                 4,978        111
MD           224,111                  8,200         3.7                 7,670         94
ME            99,712                  1,812         1.8                 1,654         91
MN           216,990                  7,100         3.3                 7,478        105
NE            83,525                  1,340         1.6                 1,278         95
NH            56,689                  1,200         2.1                 1,296        108
OK           180,253                  4,163         2.3                 4,027         97
PA         2,533,317                 95,000         3.8                87,190         92
SC           270,298                  4,100         1.5                 3,201         78
UT           428,165                  7,186         1.7                 7,612        106

Eight states reported estimated total links as a percent of crash records in a fairly narrow range between 1.5% and 2.4%.  Maryland, Minnesota, and Pennsylvania were outliers in the range 3.3% to 3.8%.

  4. As expected, reported hospital inpatient charges are highly skewed for all states.  Consequently, a logarithmic transformation would be appropriate in the Phase II regression analysis.
Table 9 - Hospital Inpatient Charges for Motorcycle Riders
State  Min. Charges  Mean Charges  Max. Charges
DE              775        21,029       203,472
KY               43        21,728       297,296
MD              794        15,599       155,344
ME            1,016        20,256       111,976
MN            1,724        23,149       262,605
NE            1,733        23,917       140,209
NH              532        14,748        64,089
OK              527        23,454       442,378
PA               16        32,198     1,150,294
SC            1,740        26,589       439,356
SD            1,489        24,214       161,989
UT              432        20,429       201,680
WI              807        22,237       284,821

Mean charges for 9 states fall in a fairly narrow range from $20,256 to $26,589.  Kentucky, Maryland, New Hampshire, and Pennsylvania are outliers.
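To illustrate what the logarithmic transformation suggested above would involve, the Python sketch below regresses log charges on helmet use alone.  With a single 0/1 predictor, the least-squares slope equals the difference in mean log charges between helmeted and unhelmeted riders, and exponentiating it gives a ratio of geometric mean charges.  The function name and the data are hypothetical, and the age and sex terms used in the Phase II regressions are omitted for brevity.

```python
import math

def log_charge_helmet_effect(charges, helmet):
    """Slope of log(charges) on a 0/1 helmet indicator, and the implied ratio.

    Returns (slope, relative_change), where relative_change is
    exp(slope) - 1, the proportional difference in geometric mean charges.
    """
    with_helmet = [math.log(c) for c, h in zip(charges, helmet) if h == 1]
    without_helmet = [math.log(c) for c, h in zip(charges, helmet) if h == 0]
    if not with_helmet or not without_helmet:
        raise ValueError("need at least one helmeted and one unhelmeted case")
    slope = (sum(with_helmet) / len(with_helmet)
             - sum(without_helmet) / len(without_helmet))
    return slope, math.exp(slope) - 1.0

# Hypothetical charges in dollars; one large value mimics the skew in Table 9.
charges = [4200.0, 9800.0, 15500.0, 7600.0, 21000.0, 310000.0]
helmet = [1, 1, 1, 0, 0, 0]
slope, change = log_charge_helmet_effect(charges, helmet)
print(f"slope on log scale: {slope:.3f}, relative change: {change:.1%}")
```

On the log scale a single very large charge has much less influence on the estimate than it does in a regression on raw dollars.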

  5. Missing data values contributed to uncertainty about the true effect of helmet use on inpatient charges.  States reported various levels of missing helmet use data ranging from 0% to 59%.
Table 10 - Missing Helmet Use Data in Linked Datasets
State  Motorcycle Cases (Imp. 1)  Helmet Use Missing  % Missing
DE                            66                  20        30%
KY                           130                  14        11%
MD                           224                  30        13%
ME                            51                   0         0%
MN                           207                  22        11%
NE                            35                  14        40%
NH                            66                  39        59%
OK                           122                  12        10%
PA                         2,821               1,037        37%
SC                           256                   9         4%
UT                           408                 181        44%

Maine had complete helmet use reporting.  Only NH and UT data had over 40% missing helmet use values.  Consequently, multiple imputation and simulation algorithms are likely to perform well for most state analyses.  Schafer notes on page 137 that if "rates of missing information are moderate, say 40% or less, we may expect the simulations to proceed without much difficulty."
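The percentages in Table 10 and the 40% screen can be checked with a few lines of Python.  The counts are copied from Table 10.  The function name is hypothetical, and comparing the raw share of missing helmet values with Schafer's 40% guideline follows the reasoning of this note rather than any formal rule.

```python
# (state, motorcycle cases in imputation 1, cases missing helmet use), from Table 10.
TABLE_10 = [
    ("DE", 66, 20), ("KY", 130, 14), ("MD", 224, 30), ("ME", 51, 0),
    ("MN", 207, 22), ("NE", 35, 14), ("NH", 66, 39), ("OK", 122, 12),
    ("PA", 2821, 1037), ("SC", 256, 9), ("UT", 408, 181),
]

def states_over(rows, threshold_pct=40):
    """States whose share of missing helmet values exceeds threshold_pct."""
    # Integer arithmetic avoids rounding issues exactly at the threshold.
    return [state for state, cases, missing in rows
            if missing * 100 > threshold_pct * cases]

for state, cases, missing in TABLE_10:
    print(f"{state}: {100 * missing / cases:.0f}% missing")

flagged = states_over(TABLE_10)
print("over 40% missing:", flagged)
```

Nebraska sits exactly at 40% (14 of 35 cases), so it is not flagged by a strict "over 40%" test.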

  6. Most states were able to use CODES 2000 to construct an adequate linkage probability model for the Phase II analysis.  Arizona, Nevada, and Wisconsin reported difficulties completing their linkage imputation processes with CODES 2000.  Arizona's imputed one-to-one links included several times as many matched pairs as their estimated total.  Most of the pairs were tabulated with high probabilities and assigned to the same set, even after installing the latest version of CODES 2000.  In addition, doing linkage imputations in AZ sometimes caused their PC to crash.

South Carolina successfully implemented their linkage imputation as a one-match process with CODES 2000.  They initially tried the same two-match process used with earlier linkage software:  First, link only crash records with names to all hospital records, and second, link only crash records without names to unlinked hospital records.  However, combining and analyzing these two separate linkage imputations added substantial complexity.

Figure 21 - Link Specifications Report for DE

Figure 22 - Link Specifications Report for KY

Figure 23 - Link Specifications Report for MD

Figure 24 - Link Specifications Report for ME

Figure 25 - Link Specifications Report for MN

Figure 26 - Link Specifications Report for NE

Figure 27 - Link Specifications Report for NH

Figure 28 - Link Specifications Report for OK

Figure 29 - Link Specifications Report for PA

Figure 30 - Link Specifications Report for UT

  7. Maine and Rhode Island reported that using the SAS MI procedure for value imputation produced errors when there were no missing values.  South Dakota reported that examples of required text file formats would be useful in the instructions for Schafer's NORM procedure.  No other states reported value imputation issues with either SAS MI or NORM.

Rhode Island and South Carolina reported errors when using DDE and ODBC procedures for directly exchanging data between Microsoft Access and SAS.  They had to resort to creating ancillary files in order to transfer data between these systems.

 
 
© Copyright 2000 - 2008 Strategic Matching, Inc. All rights reserved. Microsoft, Windows, and Access are trademarks of Microsoft Corporation. Last modified: Monday January 28, 2008.