CODES 2000 User Forum -- Data Network Note #15
Effect of Helmet Use on Inpatient Charges - Phase II

Applies to: CODES Data Network.
Last updated: Wednesday May 15, 2002.

SUMMARY
CODES Data Network participants reported the effect
of helmet use on hospital inpatient charges for motorcycle riders injured in
crashes and discharged alive. They estimated the helmet use effect
following two different methodologies in order to compare the results. The
first methodology was a traditional approach. They found a set of high
probability links between police reported crash records and hospital discharge
records using a Fellegi and Sunter probabilistic record linkage model as
implemented in CODES 2000 software from Strategic Matching. Then they
conducted a linear regression analysis on those high probability linked record
pairs that had complete values for all regression variables using either
SAS/STAT REG software from the SAS Institute or Excel Data Analysis software
from Microsoft. The major disadvantage of this first methodology is that
the population analyzed may not represent the population of interest unless the
percentage of missing links and the percentage of missing values are low,
say less than 10%. All Data Network participants reported rates of missing
links or missing values that were much higher than 10%.
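As a rough illustration of the Fellegi and Sunter idea behind the first methodology, the sketch below computes a match weight for one candidate record pair and converts it to a match probability. The field names, the m- and u-probabilities, and the prior are hypothetical values chosen for illustration only; they are not CODES 2000 settings or participant data, and the code does not reproduce the CODES 2000 implementation.

```python
import math

# Hypothetical m- and u-probabilities for three comparison fields.
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELDS = {
    "date_of_birth": (0.95, 0.003),
    "sex": (0.98, 0.50),
    "crash_date": (0.90, 0.01),
}

def match_weight(agreement):
    """Sum log2 likelihood ratios over fields (True means the field agrees)."""
    total = 0.0
    for field, agrees in agreement.items():
        m, u = FIELDS[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def match_probability(weight, prior):
    """Convert a match weight to a posterior match probability via Bayes' rule."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2.0 ** weight
    return posterior_odds / (1 + posterior_odds)

pair = {"date_of_birth": True, "sex": True, "crash_date": True}
w = match_weight(pair)
p = match_probability(w, prior=0.001)  # assumed prior share of true matches
print(f"weight = {w:.2f}, probability = {p:.3f}")
```

Under this sketch, a pair would count as a high-probability link when its probability exceeds a chosen cutoff such as the 0.9 used in Table 1.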
In order to better describe the population of interest, the second
methodology was a two-stage Bayesian multiple imputation approach similar to
those described by Rubin, Schafer, and others. Participants created
multiple complete sets of links between crash records and discharge records
using an extended Fellegi and Sunter model and Bayesian multiple imputation of
missing links as implemented in CODES 2000. Then, for each imputed set of
links, they created multiple complete datasets using Markov Chain Monte Carlo
multiple imputation for missing values as implemented in either SAS/STAT MI
software or Schafer's NORM software. Next, they conducted linear regression analysis
for each complete dataset using either REG or Data Analysis. Finally, they
combined parameter estimates derived from each imputation into a single estimate
for the population of interest using Rubin's algorithms as implemented in either
SAS/STAT MIANALYZE or NORM.
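The final combining step can be sketched with Rubin's standard single-stage rules, shown below for one regression parameter. This is a minimal illustration, not the MIANALYZE or NORM code; the function name and the five estimates are hypothetical, and the two-stage design described above may call for a nested variant of these formulas.

```python
import math

def combine_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                               # pooled point estimate
    u_bar = sum(variances) / m                               # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    t = u_bar + (1 + 1 / m) * b                              # total variance
    return q_bar, math.sqrt(t)

# Hypothetical helmet-effect estimates and standard errors from m = 5 imputations
effects = [-5200.0, -4800.0, -5600.0, -5100.0, -4900.0]
std_errors = [2500.0, 2600.0, 2550.0, 2450.0, 2520.0]
pooled, pooled_se = combine_rubin(effects, [se ** 2 for se in std_errors])
print(f"pooled effect = {pooled:.0f}, pooled SE = {pooled_se:.0f}")
```

The pooled standard error is larger than the average per-imputation standard error whenever the estimates disagree across imputations, which is how the uncertainty due to missing data enters the final estimate.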
Here we tabulate and compare results reported by CODES Data Network
participants. In addition, we combine the multiple imputation estimates of the effect of
helmet use in a meta-analysis and tabulate the
results. The meta-analysis follows the Bayesian approach presented by
Gelman et al. for normally distributed effects. Finally, we discuss apparent
strengths and weaknesses of the imputation methodology.
SAS and SAS/STAT are trademarks of SAS Institute, Inc.

RESULTS
- Traditional High-Probability Linkage and Complete Case Regression
Table 1 - High Probability Linkage Results
| State | Crash Records | Inpatient Records | Est. Total Links | Act. Links > 0.9 | Act. % Est. |
| DE | 81,928 | 10,419 | 4,000 | 1,646 | 41% |
| KY | 307,746 | 39,278 | 4,500 | 2,537 | 56% |
| MD | 224,111 | 626,955 | 8,100 | 3,749 | 46% |
| ME | 99,712 | 10,878 | 1,812 | 1,006 | 56% |
| MN | 254,024 | 39,005 | 5,000 | 2,731 | 55% |
| NE | 83,525 | 24,907 | 1,340 | 940 | 70% |
| NH | 56,689 | 117,394 | 1,200 | 527 | 44% |
| NV | 106,803 | 12,844 | | 1,034 | |
| OK | 180,253 | 355,918 | 2,500 | 1,737 | 69% |
| PA | 2,533,317 | 114,979 | 112,500 | 66,515 | 59% |
| SC | 270,463 | 32,530 | 6,500 | | |
| SD | 71,080 | 6,409 | 700 | 476 | 68% |
| UT | 428,165 | 77,570 | 7,186 | 4,380 | 61% |
| WI | 357,058 | 621,236 | 5,400 | 3,900 | 72% |
Table 2 - Complete Case Regression Results
| State | MC Links > 0.9 | Complete Data | Charges Intercept | Helmet Effect | Std. Error |
| DE | 66 | 44 | 26,705 | -2,952 | 11,223 |
| KY | 110 | 63 | 26,296 | -8,464 | 11,184 |
| MD | 229 | 195 | 17,083 | -236 | 4,859 |
| ME | 52 | 52 | 23,568 | -8,016 | 7,353 |
| MN | 164 | 135 | 26,204 | -4,096 | 5,676 |
| NE | 38 | 21 | 6,540 | 18,632 | 20,345 |
| NH | 56 | 19 | 21,323 | -8,746 | 9,107 |
| NV | 51 | 47 | 51,926 | -30,878 | |
| OK | 108 | 98 | 29,733 | -1,406 | 1,213 |
| PA | 4,014 | 2,664 | 36,629 | -5,128 | 5,649 |
| SC | 11 | 10 | 6,808 | -3,300 | 3,919 |
| SD | 65 | 65 | 21,284 | 12,242 | 8,921 |
| UT | 338 | 189 | 17,953 | -6,394 | 3,277 |
| WI | 446 | 316 | 25,953 | -7,472 | 4,383 |
...
- Multiple Imputation Results
Table 3 - Complete Linkage Imputation Results
| State | Crash Records | Inpatient Records | Est. Total Links | Act. Links Imp 1 | Act. % Est. | Avg. Prob. Imp 1 |
| DE | 81,928 | 10,419 | 2,000 | 1,828 | 91% | 0.91 |
| KY | 307,746 | 39,278 | 4,500 | 4,978 | 111% | 0.66 |
| MD | 224,111 | 15,419 | 8,200 | 7,670 | 94% | 0.71 |
| ME | 99,712 | 10,878 | 1,812 | 1,654 | 91% | 0.78 |
| MN | 216,990 | 39,005 | 7,100 | 7,478 | 105% | 0.75 |
| NE | 83,527 | 24,907 | 1,340 | 1,278 | 95% | 0.84 |
| NH | 56,689 | 12,877 | 1,200 | 1,296 | 108% | 0.70 |
| OK | 180,253 | 355,918 | 4,163 | 4,027 | 97% | 0.88 |
| PA | 2,533,317 | 114,979 | 95,000 | 87,190 | 92% | 0.73 |
| SC | 270,298 | 32,606 | 4,100 | 3,201 | 78% | 0.96 |
| UT | 428,165 | 77,570 | 7,186 | 7,612 | 106% | 0.68 |
Table 4 - Value Imputation and Regression Results
| State | MC Cases (Imp. 1) | Charges Intercept | Age Effect | Sex Effect | Helmet Effect | Helmet SE |
| DE | 66 | 49,302 | -484 | -4,989 | -8,142 | 9,931 |
| KY | 130 | 16,225 | 172 | 373 | -5,274 | 8,329 |
| MD | 224 | 9,927 | 79 | 3,544 | 124 | 4,858 |
| ME | 51 | 22,049 | | | -7,307 | 7,187 |
| MN | 207 | 23,505 | -21 | -4,394 | 2,089 | 4,293 |
| NE | 35 | 37,536 | -585 | -18,589 | 24,532 | 18,786 |
| NH | 66 | 4,977 | 163 | 4,162 | -2,482 | 5,418 |
| OK | 122 | 9,271 | -3 | -2,490 | 173 | 1,600 |
| PA | 2,821 | 22,922 | -6 | 7,936 | -762 | 6,844 |
| SC | 256 | 33,031 | -134 | -1,767 | -2,346 | 6,347 |
| UT | 408 | 10,128 | 161 | 2,847 | -5,153 | 2,583 |
...
- Meta-Analysis Results
Following Gelman, Carlin, Stern, and Rubin (1995), Section 5.4, our meta-analysis constructs:
A simple hierarchical model based on the normal distribution, in which
observed data are normally distributed with a different mean for each [state],
with known observation variance, and a normal population distribution for the
[state] means. This model is sometimes termed the one-way normal
random-effects model with known data variance and is widely applicable, being
an important special case of the hierarchical normal linear model...
For this model, computation of the posterior distribution of [helmet
effects] is most conveniently performed via simulation [using 10,000 random
draws], following the factorization [given for the joint posterior
distribution of model parameters]. The first step, simulating [the
population standard deviation] tau, is easily performed numerically using the
[given] inverse cdf method on a grid of [100] uniformly spaced values of tau,
with [the given posterior distribution of tau]. The second and third
steps, simulating [the population mean] mu and then [the vector of state
helmet effects] theta, can both be done easily by sampling from [given] normal
distributions.
For analytical details, see Data Network Note #12 - Methodology for Meta-Analysis of Helmet Use Effects.
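The three simulation steps quoted above can be sketched as follows, using the helmet effects and standard errors from Table 4 as the observed data. This is a minimal sketch, not the code used for this note: the uniform prior on tau, the grid upper bound tau_max, the use of grid midpoints, and the random seed are all assumptions, so the output should not be expected to reproduce the published quantiles exactly.

```python
import numpy as np

# Helmet effects and standard errors as reported in Table 4 (DE through UT)
y = np.array([-8142, -5274, 124, -7307, 2089, 24532,
              -2482, 173, -762, -2346, -5153], dtype=float)
se = np.array([9931, 8329, 4858, 7187, 4293, 18786,
               5418, 1600, 6844, 6347, 2583], dtype=float)

def simulate_meta(y, se, n_draws=10_000, n_grid=100, tau_max=20_000.0, seed=0):
    """Draw (tau, mu, theta) from the one-way normal random-effects posterior."""
    rng = np.random.default_rng(seed)
    var = se ** 2
    # Uniformly spaced grid of tau values (midpoints, so tau > 0); tau_max is assumed
    tau_grid = (np.arange(n_grid) + 0.5) * tau_max / n_grid
    total_var = var[None, :] + tau_grid[:, None] ** 2     # shape (n_grid, n_states)
    v_mu = 1.0 / np.sum(1.0 / total_var, axis=1)
    mu_hat = v_mu * np.sum(y / total_var, axis=1)
    # Log marginal posterior of tau, assuming a uniform prior on tau
    log_post = (0.5 * np.log(v_mu)
                - 0.5 * np.sum(np.log(total_var), axis=1)
                - 0.5 * np.sum((y - mu_hat[:, None]) ** 2 / total_var, axis=1))
    prob = np.exp(log_post - log_post.max())
    prob /= prob.sum()
    # Step 1: inverse-cdf sampling of tau on the grid
    idx = np.searchsorted(np.cumsum(prob), rng.random(n_draws))
    idx = np.minimum(idx, n_grid - 1)
    tau = tau_grid[idx]
    # Step 2: population mean mu given tau
    mu = rng.normal(mu_hat[idx], np.sqrt(v_mu[idx]))
    # Step 3: state effects theta given mu and tau
    prec = 1.0 / var[None, :] + 1.0 / tau[:, None] ** 2
    theta_hat = (y / var + mu[:, None] / tau[:, None] ** 2) / prec
    theta = rng.normal(theta_hat, np.sqrt(1.0 / prec))
    return tau, mu, theta

tau, mu, theta = simulate_meta(y, se)
quantiles = [0.05, 0.25, 0.50, 0.75, 0.95]
print("mu quantiles: ", np.round(np.quantile(mu, quantiles)))
print("tau quantiles:", np.round(np.quantile(tau, quantiles)))
```

Quantiles of the mu and tau draws correspond in form to the population summaries tabulated below, and column-wise quantiles of theta correspond to the state summaries.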
Table 5 - Estimated Quantiles for the Normal Population Distribution of Helmet Use Effects
| Param | 0.05 | 0.25 | 0.50 | 0.75 | 0.95 |
| Mean | -4,409 | -2,613 | -1,563 | -515 | 1,014 |
| StdDev | 200 | 969 | 1,904 | 3,198 | 5,574 |
Here, "Population" means all reporting Data Network states.
Figure 1 - Histogram of Simulated Values for the Mean of the Population Helmet Use Effect
Figure 2 - Histogram of Simulated Values for the Standard Deviation of the Population Helmet Use Effect
Table 6 - Estimated Quantiles for State Helmet Use Effects
| State | 0.05 | 0.25 | 0.50 | 0.75 | 0.95 |
| DE | -7,332 | -3,436 | -1,789 | -259 | 2,238 |
| KY | -6,961 | -3,351 | -1,717 | -242 | 2,403 |
| MD | -5,293 | -2,714 | -1,305 | 112 | 2,883 |
| ME | -7,484 | -3,578 | -1,928 | -528 | 1,840 |
| MN | -4,374 | -2,240 | -925 | 572 | 3,635 |
| NE | -5,812 | -2,755 | -1,220 | 427 | 4,416 |
| NH | -6,180 | -3,209 | -1,674 | -284 | 2,195 |
| OK | -2,885 | -1,564 | -643 | 342 | 1,836 |
| PA | -5,900 | -2,971 | -1,440 | 8 | 2,888 |
| SC | -6,326 | -3,093 | -1,566 | -151 | 2,556 |
| UT | -6,813 | -4,258 | -2,649 | -1,349 | 268 |
Figure 3 - Histogram of Simulated Values for DE Helmet Use Effect
Figure 4 - Histogram of Simulated Values for KY Helmet Use Effect
Figure 5 - Histogram of Simulated Values for MD Helmet Use Effect
Figure 6 - Histogram of Simulated Values for ME Helmet Use Effect
Figure 7 - Histogram of Simulated Values for MN Helmet Use Effect
Figure 8 - Histogram of Simulated Values for NE Helmet Use Effect
Figure 9 - Histogram of Simulated Values for NH Helmet Use Effect
Figure 10 - Histogram of Simulated Values for OK Helmet Use Effect
Figure 11 - Histogram of Simulated Values for PA Helmet Use Effect
Figure 12 - Histogram of Simulated Values for SC Helmet Use Effect
Figure 13 - Histogram of Simulated Values for UT Helmet Use Effect
Following Gelman et al., Section 8.5, we checked
the fit of the statistical model estimated by our meta-analysis by simulating
10,000 repetitions of the reported data.
Figure 14 - Histogram of Simulated Values for Minimum Reported Helmet Use Effect
Figure 15 - Histogram of Simulated Values for Maximum Reported Helmet Use Effect
Figure 16 - Histogram of Simulated Values for Mean Reported Helmet Use Effect
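The model check described above can be sketched as a posterior predictive simulation: for each posterior draw of the state effects, simulate one replicated report per state, then locate the actual minimum, maximum, and mean within the simulated distributions. The function name and the three-state inputs below are hypothetical stand-ins for illustration; in practice the posterior draws from the meta-analysis and the reported standard errors would be supplied instead.

```python
import numpy as np

def posterior_predictive_pvalues(theta_draws, se, y_obs, seed=0):
    """Share of replicated datasets whose min, max, and mean are at or below the observed values."""
    rng = np.random.default_rng(seed)
    # One replicated report per state for each posterior draw of theta
    y_rep = rng.normal(theta_draws, se)              # shape (n_draws, n_states)
    stats = {"min": np.min, "max": np.max, "mean": np.mean}
    return {name: float(np.mean(fn(y_rep, axis=1) <= fn(y_obs)))
            for name, fn in stats.items()}

# Stand-in inputs for illustration only (three hypothetical states)
y_obs = np.array([-3000.0, 500.0, -1500.0])
se = np.array([2000.0, 1500.0, 2500.0])
rng = np.random.default_rng(1)
theta_draws = rng.normal(-1300.0, 800.0, size=(10_000, 3))  # hypothetical posterior draws
print(posterior_predictive_pvalues(theta_draws, se, y_obs))
```

Values far from 0 and 1 indicate that the actual statistic is typical of what the fitted model would generate.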
DISCUSSION
- For most states, estimates of helmet use effects obtained by analyzing
high-probability complete cases differed from estimates obtained by
multiple imputation of missing links and missing values. This suggests
that high-probability complete cases may not be representative of the total
study populations.
Table 7 - Comparison of Estimated State Helmet Use Effects
| State | Complete Cases | Complete Effect | Imputed Cases | Imputed Effect |
| DE | 44 | -2,952 | 66 | -8,142 |
| KY | 63 | -8,464 | 130 | -5,274 |
| MD | 195 | -236 | 224 | 124 |
| ME | 52 | -8,016 | 51 | -7,307 |
| MN | 135 | -4,096 | 207 | 2,089 |
| NE | 21 | 18,632 | 35 | 24,532 |
| NH | 19 | -8,746 | 66 | -2,482 |
| OK | 98 | -1,406 | 122 | 173 |
| PA | 2,664 | -5,128 | 2,821 | -762 |
| SC | 10 | -3,300 | 256 | -2,346 |
| UT | 189 | -6,394 | 408 | -5,153 |
- Meta-analysis results suggest that helmet use is protective, but
the effect is not significant at a 90% confidence level. That is, helmet users incur lower inpatient charges, on
average, but there is wide variation from case to case and state to
state. The median (50th percentile) estimate of the population-wide helmet use effect
is -$1,563. The estimated symmetric 90% confidence interval for the
population-wide helmet use effect is -$4,409 to $1,014 (Table 5).
The median estimate of the state-to-state standard deviation in mean
helmet use effect is $1,904. Consequently, it is likely that the
state-to-state variation in true mean helmet use effect is less than the
apparent variation based on a single report from each state (-$8,142 to
$24,532). The median estimates of the state effects range from
-$2,649 to -$643. No state shows
statistically significant helmet protection, that is, a 90% confidence interval completely below
zero.
Based on 10,000 simulated replications of state reports, the statistical model estimated by our meta-analysis
fits the data fairly well. For one test statistic, the mean reported state
helmet use effect, the actual value fell near p=0.4 in the
simulated distribution. For two other test statistics, the minimum and maximum reported state
helmet use effects, the actual values fell near p=0.2 and p=0.9 in the
simulated distributions, respectively.
- Most states found substantially more Crash to Inpatient links by using the
imputation methodology and exceeded 90% of their estimated total links. See Data Network Note #11 - Multiple Imputation and
One-to-One Links (Revised). Linked datasets must be nearly complete (over 90%) for accurate analysis of
study populations.
Table 8 - Comparison of Estimated Versus Actual Link Counts
| State | Crash Records | Estimated Total Links | % of Crash | Actual Imputed Links | % of Est. |
| DE | 81,928 | 2,000 | 2.4 | 1,828 | 91 |
| KY | 307,746 | 4,500 | 1.5 | 4,978 | 111 |
| MD | 224,111 | 8,200 | 3.7 | 7,670 | 94 |
| ME | 99,712 | 1,812 | 1.8 | 1,654 | 91 |
| MN | 216,990 | 7,100 | 3.3 | 7,478 | 105 |
| NE | 83,525 | 1,340 | 1.6 | 1,278 | 95 |
| NH | 56,689 | 1,200 | 2.1 | 1,296 | 108 |
| OK | 180,253 | 4,163 | 2.3 | 4,027 | 97 |
| PA | 2,533,317 | 95,000 | 3.8 | 87,190 | 92 |
| SC | 270,298 | 4,100 | 1.5 | 3,201 | 78 |
| UT | 428,165 | 7,186 | 1.7 | 7,612 | 106 |
Eight states reported estimated total links as a percent of crash records
in a fairly narrow range between 1.5% and 2.4%. Maryland, Minnesota, and Pennsylvania were
outliers in the range 3.3% to 3.8%.
- As expected, reported hospital inpatient charges are highly skewed for all
states. Consequently, a logarithmic transformation would be appropriate in
the Phase II regression analysis.
Table 9 - Hospital Inpatient Charges for Motorcycle Riders
| State | Min. Charges | Mean Charges | Max. Charges |
| DE | 775 | 21,029 | 203,472 |
| KY | 43 | 21,728 | 297,296 |
| MD | 794 | 15,599 | 155,344 |
| ME | 1,016 | 20,256 | 111,976 |
| MN | 1,724 | 23,149 | 262,605 |
| NE | 1,733 | 23,917 | 140,209 |
| NH | 532 | 14,748 | 64,089 |
| OK | 527 | 23,454 | 442,378 |
| PA | 16 | 32,198 | 1,150,294 |
| SC | 1,740 | 26,589 | 439,356 |
| SD | 1,489 | 24,214 | 161,989 |
| UT | 432 | 20,429 | 201,680 |
| WI | 807 | 22,237 | 284,821 |
Mean charges for 10 states fall in a fairly narrow range from $20,256 to
$26,589. Maryland, New Hampshire, and Pennsylvania are outliers.
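To illustrate why a logarithmic transformation suits skewed charges, the sketch below generates synthetic right-skewed charges, compares skewness before and after taking logs, and fits an ordinary least squares regression on the log scale. All data, the assumed effect size, and the variable names are hypothetical; this is not participant data and not the regression specification used in Phase II.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000
helmet = rng.integers(0, 2, size=n)              # 1 = helmeted (synthetic indicator)
true_effect = -0.20                              # assumed effect on log charges
log_charges = 9.5 + true_effect * helmet + rng.normal(0.0, 1.0, size=n)
charges = np.exp(log_charges)                    # synthetic right-skewed charges

def skewness(x):
    """Sample skewness (third standardized moment)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

# Ordinary least squares on the log scale: log(charges) = b0 + b1 * helmet
X = np.column_stack([np.ones(n), helmet])
coef, *_ = np.linalg.lstsq(X, np.log(charges), rcond=None)
pct_change = np.exp(coef[1]) - 1.0               # multiplicative change in typical charges

print(f"skewness raw = {skewness(charges):.2f}, log = {skewness(np.log(charges)):.2f}")
print(f"helmet coefficient = {coef[1]:.3f}, implied change = {pct_change:.1%}")
```

On the log scale the helmet coefficient describes a proportional change in charges (strictly, in their geometric mean) rather than a dollar difference, so it is less sensitive to a few very large bills such as the maxima in Table 9.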
- Missing data values contributed to uncertainty about the true
effect of helmet use on inpatient charges. States reported various
levels of missing helmet use data ranging from 0% to 59%.
Table 10 - Missing Helmet Use Data in Linked Datasets
| State | Motorcycle Cases (Imp. 1) | Helmet Use Missing | % Missing |
| DE | 66 | 20 | 30% |
| KY | 130 | 14 | 11% |
| MD | 224 | 30 | 13% |
| ME | 51 | 0 | 0% |
| MN | 207 | 22 | 11% |
| NE | 35 | 14 | 40% |
| NH | 66 | 39 | 59% |
| OK | 122 | 12 | 10% |
| PA | 2,821 | 1,037 | 37% |
| SC | 256 | 9 | 4% |
| UT | 408 | 181 | 44% |
Maine had complete helmet use reporting. Only
NH and UT data had over 40% missing helmet use values. Consequently, multiple
imputation and simulation algorithms are likely to perform well for most
state analyses. Schafer notes on page 137 that if "rates of
missing information are moderate, say 40% or less, we may expect the
simulations to proceed without much difficulty."
- Most states were able to use CODES 2000 to construct an adequate
linkage probability model for the Phase II analysis. Arizona, Nevada, and
Wisconsin reported difficulties completing
their linkage imputation processes with CODES 2000. Arizona's imputed one-to-one links
included several times as many matched pairs as their estimated total.
Most of the pairs were tabulated with high probabilities and assigned to the
same set, even after Arizona installed the latest version of CODES 2000.
In addition, running linkage imputations sometimes caused Arizona's PC to crash.
South Carolina successfully implemented their linkage imputation as a
one-match process with CODES 2000. They initially tried the same
two-match process used with earlier linkage software: first, link only crash records with names to all
hospital records; second, link only crash records without names
to unlinked hospital records. However, combining and analyzing these two separate
linkage imputations added substantial complexity.
Figure 21 - Link Specifications Report for DE
Figure 22 - Link Specifications Report for KY
Figure 23 - Link Specifications Report for MD
Figure 24 - Link Specifications Report for ME
Figure 25 - Link Specifications Report for MN
Figure 26 - Link Specifications Report for NE
Figure 27 - Link Specifications Report for NH
Figure 28 - Link Specifications Report for OK
Figure 29 - Link Specifications Report for PA
Figure 30 - Link Specifications Report for UT
- Maine and Rhode Island reported that using the SAS MI procedure
for value imputation produced errors when there were no missing
values. South Dakota reported that examples of required text file
formats would be useful in the instructions for Schafer's NORM
procedure. No other states reported value imputation issues with either
SAS MI or NORM.
Rhode Island and South Carolina reported errors when using DDE and ODBC
procedures for directly exchanging data between Microsoft Access and
SAS. They had to resort to creating ancillary files in order to transfer
data between these systems.