CODES 2000 User Forum -- Data Network Note #10
Effect of Helmet Use on Inpatient Charges - Phase I

Applies to: CODES Data Network.
Last updated: Wednesday February 27, 2002.

SUMMARY
In Phase I of this study, CODES Data Network participants reported the effect
of helmet use on hospital inpatient charges for motorcycle riders injured in
crashes and discharged alive. They estimated the helmet use effect
following two different methodologies in order to compare the results. The
first methodology was a traditional approach. They found a set of high
probability links between police reported crash records and hospital discharge
records using a Fellegi and Sunter probabilistic record linkage model as
implemented in CODES 2000 software from Strategic Matching. Then they
conducted a linear regression analysis on those high probability linked record
pairs that had complete values for all regression variables using either
SAS/STAT REG software from the SAS Institute or Excel Data Analysis software
from Microsoft. The major disadvantage of this first methodology is that
the population analyzed may not represent the population of interest unless the
percentage of missing links and the percentage of missing values are very low,
say less than 5%. All Data Network participants reported rates of missing
links and missing values that were much higher than this level.
In order to better describe the population of interest, the second
methodology was a two-stage Bayesian multiple imputation approach similar to
those described by Rubin, Schafer, and others. Participants created
multiple complete sets of links between crash records and discharge records
using an extended Fellegi and Sunter model and Bayesian multiple imputation of
missing links as implemented in CODES 2000. Then, for each imputed set of
links, they created multiple complete datasets using Markov Chain Monte Carlo
multiple imputation for missing values as implemented in either SAS/STAT MI
software or Schafer's NORM software. Next, they conducted linear regression analysis
for each complete dataset using either REG or Data Analysis. Finally, they
combined parameter estimates derived from each imputation into a single estimate
for the population of interest using Rubin's algorithms as implemented in either
SAS/STAT MIANALYZE or NORM.
Here, we tabulate and compare results reported by CODES Data Network
participants. In addition, we combine the multiple imputation estimates of the effect of
helmet use in a meta-analysis and tabulate the
results. The meta-analysis follows the Bayesian approach presented by
Rubin for normally distributed effects. Finally, we discuss apparent
strengths and weaknesses of the imputation methodology, and suggest changes to
improve the methodology for Phase II of the study.
SAS and SAS/STAT are trademarks of SAS Institute, Inc.
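
The final combining step described above can be illustrated with a minimal sketch of Rubin's combining rules for a single parameter. This is not the MIANALYZE or NORM implementation. The function name `pool_rubin` and the five input values are hypothetical, and the sketch pools one flat set of imputations, which is an assumption about how the two imputation stages were combined.

```python
import math

def pool_rubin(estimates, std_errors):
    """Combine m per-imputation estimates of one parameter with Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = sum(estimates) / m                       # pooled point estimate
    within = sum(se ** 2 for se in std_errors) / m   # mean within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between           # total variance
    return q_bar, math.sqrt(total)

# Hypothetical helmet effects and standard errors from five imputed datasets.
effects = [-3100.0, -2800.0, -3500.0, -2600.0, -3000.0]
errors = [2900.0, 2850.0, 3000.0, 2800.0, 2950.0]
estimate, std_error = pool_rubin(effects, errors)
print(f"pooled effect = {estimate:.0f}, pooled std. error = {std_error:.0f}")
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation term adds the uncertainty due to missing links and missing values.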

RESULTS
- Traditional High-Probability Linkage and Complete Case Regression
Table 1 - Reported High Probability Linkage Results
| State | Crash Recs. | Inpatient Recs. | Est. Total Links | Act. Links > 0.9 |
| Demo | 56,689 | 117,394 | 1,200 | 527 |
| DE | 81,928 | 10,419 | 4,000 | 1,646 |
| KY | 307,773 | 39,344 | 2,800 | 1,485 |
| MD | 224,111 | 626,955 | 8,100 | 3,749 |
| ME | 99,712 | 10,878 | 1,812 | 1,006 |
| MN | 254,024 | 39,005 | 5,000 | 2,731 |
| NE | 83,525 | 24,907 | 1,340 | 940 |
| NV | 106,803 | 12,844 | | 1,034 |
| OK | 180,253 | 355,918 | 2,500 | 1,737 |
| PA | 2,533,317 | 114,979 | 112,500 | 66,515 |
| SC | 270,463 | 32,530 | 6,500 | |
| SD | 71,080 | 6,409 | 700 | 476 |
| UT | 428,165 | 77,570 | 7,186 | 4,380 |
| WI | 357,058 | 621,236 | 5,400 | 3,900 |
Table 2 - Reported Complete Case Regression Results
| State | MC Links > 0.9 | Complete Cases | Charges Intercept | Helmet Effect | Std. Error |
| Demo | 56 | 19 | 21,323 | -8,746 | 9,107 |
| DE | 66 | 44 | 26,705 | -2,952 | 11,223 |
| KY | 86 | 53 | 17,163 | -6,247 | 8,175 |
| MD | 229 | 195 | 17,083 | -236 | 4,859 |
| ME | 52 | 52 | 23,568 | -8,016 | 7,353 |
| MN | 164 | 135 | 26,204 | -4,096 | 5,676 |
| NE | 38 | 21 | 6,540 | 18,632 | 20,345 |
| NV | 51 | 47 | 51,926 | -30,878 | |
| OK | 108 | 98 | 29,733 | -1,406 | 1,213 |
| PA | 4,014 | 2,664 | 36,629 | -5,128 | 5,649 |
| SC | 11 | 10 | 6,808 | -3,300 | 3,919 |
| SD | 65 | 65 | 21,284 | 12,242 | 8,921 |
| UT | 338 | 189 | 17,953 | -6,394 | 3,277 |
| WI | 446 | 316 | 25,953 | -7,472 | 4,383 |
See results for Demo, DE, KY, MD, ME, MN, NE, OK, PA, SC, SD, UT, and WI in the
CODES Data Network discussion group.
- Multiple Imputation Results
Table 3 - Reported Linkage Imputation Results
| State | Crash Recs. | Inpatient Recs. | Est. Total Links | Total Links Imp 1 | Avg. Prob. Imp 1 |
| Demo | 56,689 | 117,394 | 1,200 | 1,060 | 0.70 |
| DE | 81,928 | 10,419 | 2,000 | 1,752 | 0.85 |
| KY | 307,773 | 39,344 | 2,800 | 2,470 | 0.70 |
| MD | 224,111 | 15,419 | 8,200 | 6,525 | 0.75 |
| ME | 99,712 | 10,878 | 1,812 | 1,654 | 0.78 |
| MN | 254,024 | 39,005 | 5,000 | 3,045 | 0.62 |
| NE | 83,525 | 24,907 | 1,340 | 1,477 | 0.75 |
| OK | 180,253 | 355,918 | 2,500 | 2,172 | 0.89 |
| PA | 2,533,317 | 114,979 | 112,500 | 67,564 | 0.85 |
| RI | 43,898 | 126,411 | 900 | 710 | 0.52 |
| SC | 270,463 | 32,530 | 6,500 | 3,820 | 0.88 |
| SD | 80,079 | 6,409 | 2,000 | 912 | 0.70 |
| UT | 428,165 | 77,570 | 7,186 | 7,537 | 0.68 |
| WI | 357,058 | 621,236 | 5,400 | 2,635 | 0.72 |
Table 4 - Reported Value Imputation and Regression Results
| State | Imputed MC Links (Imp. 1) | Charges Intercept | Helmet Effect | Std. Error |
| Demo | 66 | 14,101 | -1,288 | 4,096 |
| DE | 63 | 24,523 | -5,754 | 8,986 |
| KY | 101 | 16,437 | -3,059 | 5,088 |
| MD | 226 | 14,774 | 860 | 4,474 |
| ME | 51 | 22,049 | -7,307 | 7,187 |
| MN | 108 | 26,482 | -10,375 | 8,406 |
| NE | 33 | 8,682 | 16,380 | 20,977 |
| OK | 122 | 29,308 | -1,735 | 1,200 |
| PA | 3,308 | 35,051 | -2,498 | 16,077 |
| RI | 16 | 18,807 | 9,455 | 17,537 |
| SD | 81 | 23,671 | 6,684 | 10,118 |
| UT | 401 | 17,051 | -3,616 | 2,864 |
| WI | 378 | 24,104 | -7,269 | 4,195 |
See results for Demo 1 & 2, DE 1 & 2, KY 1 & 2, MD 1 & 2, ME 1 & 2, MN 1 & 2,
NE 1 & 2, OK 1 & 2, PA 1 & 2, SD 1 & 2, UT 1 & 2, and WI 1 & 2 in the CODES
Data Network discussion group.
- Meta-Analysis Results
Following Gelman, Carlin, Stern, and Rubin (1995), Section 5.4, our meta-analysis constructs:
A simple hierarchical model based on the normal distribution, in which
observed data are normally distributed with a different mean for each [state],
with known observation variance, and a normal population distribution for the
[state] means. This model is sometimes termed the one-way normal
random-effects model with known data variance and is widely applicable, being
an important special case of the hierarchical normal linear model...
For this model, computation of the posterior distribution of [helmet
effects] is most conveniently performed via simulation [using 10,000 random
draws], following the factorization [given for the joint posterior
distribution of model parameters]. The first step, simulating [the
population standard deviation] tau, is easily performed numerically using the
[given] inverse cdf method on a grid of [100] uniformly spaced values of tau,
with [the given posterior distribution of tau]. The second and third
steps, simulating [the population mean] mu and then [the vector of state
helmet effects] theta, can both be done easily by sampling from [given] normal
distributions.
For analytical details, see Data Network Note #12 -
Methodology for Meta-Analysis of Helmet Use Effects
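
The three simulation steps quoted above (tau, then mu, then theta) can be sketched as follows. This is a minimal illustration, not the code used for this note. It assumes a flat prior on tau, a grid of 100 values up to an arbitrary upper bound of 10,000, draws of tau taken at grid points without interpolation, and a fixed random seed. The inputs are the 13 imputed state estimates from Table 4, so NV and SC are omitted and the output should not be expected to reproduce Tables 5 and 6. All function and variable names are hypothetical.

```python
import math
import random
from bisect import bisect_left
from itertools import accumulate

# Helmet effects and standard errors as reported in Table 4 (13 states).
STATES = ["Demo", "DE", "KY", "MD", "ME", "MN", "NE",
          "OK", "PA", "RI", "SD", "UT", "WI"]
Y = [-1288, -5754, -3059, 860, -7307, -10375, 16380,
     -1735, -2498, 9455, 6684, -3616, -7269]
SE = [4096, 8986, 5088, 4474, 7187, 8406, 20977,
      1200, 16077, 17537, 10118, 2864, 4195]

def tau_posterior(y, se, grid):
    """Posterior of tau on a grid (flat prior), with mu_hat and V_mu per grid point."""
    log_p, mu_hat, v_mu = [], [], []
    for tau in grid:
        var = [s * s + tau * tau for s in se]          # sigma_j^2 + tau^2
        v = 1.0 / sum(1.0 / vj for vj in var)          # V_mu
        m = v * sum(yj / vj for yj, vj in zip(y, var)) # precision-weighted mean
        lp = 0.5 * math.log(v) - 0.5 * sum(
            math.log(vj) + (yj - m) ** 2 / vj for yj, vj in zip(y, var))
        log_p.append(lp)
        mu_hat.append(m)
        v_mu.append(v)
    top = max(log_p)
    weights = [math.exp(lp - top) for lp in log_p]     # rescale to avoid underflow
    total = sum(weights)
    return [w / total for w in weights], mu_hat, v_mu

def simulate(y, se, n_draws=10_000, n_grid=100, tau_max=10_000.0, seed=1):
    """Draw (tau, mu, theta) from the joint posterior in the order tau, mu, theta."""
    rng = random.Random(seed)
    grid = [tau_max * (k + 1) / n_grid for k in range(n_grid)]  # excludes tau = 0
    probs, mu_hat, v_mu = tau_posterior(y, se, grid)
    cdf = list(accumulate(probs))
    draws = []
    for _ in range(n_draws):
        # Step 1: inverse-cdf draw of tau on the grid.
        k = min(bisect_left(cdf, rng.random()), n_grid - 1)
        tau = grid[k]
        # Step 2: mu given tau is normal.
        mu = rng.gauss(mu_hat[k], math.sqrt(v_mu[k]))
        # Step 3: each state effect given mu and tau is normal.
        theta = []
        for yj, sj in zip(y, se):
            prec = 1.0 / sj ** 2 + 1.0 / tau ** 2
            mean = (yj / sj ** 2 + mu / tau ** 2) / prec
            theta.append(rng.gauss(mean, math.sqrt(1.0 / prec)))
        draws.append((tau, mu, theta))
    return draws

def quantile(values, q):
    """Empirical quantile by sorting (no interpolation)."""
    ordered = sorted(values)
    return ordered[min(int(q * len(ordered)), len(ordered) - 1)]

if __name__ == "__main__":
    draws = simulate(Y, SE)
    for q in (0.05, 0.25, 0.50, 0.75, 0.95):
        mean_q = quantile([d[1] for d in draws], q)
        sd_q = quantile([d[0] for d in draws], q)
        print(f"{q:.2f}  mean {mean_q:10.0f}  std dev {sd_q:10.0f}")
```

Because states with large standard errors are pulled strongly toward the population mean, the simulated state effects vary far less than the reported regression estimates.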
Table 5 - Estimated Quantiles for the Normal Population Distribution of Helmet Use Effects
| Param | 0.05 | 0.25 | 0.50 | 0.75 | 0.95 |
| Mean | -4,735 | -3,332 | -2,514 | -1,725 | -554 |
| StdDev | 29 | 501 | 1,168 | 2,125 | 4,114 |
Here, "Population" means all reporting Data Network states.
Figure 1 - Histogram of Simulated Values for the Mean of the Population Helmet Use Effect
Figure 2 - Histogram of Simulated Values for the Standard Deviation of the Population Helmet Use Effect
Table 6 - Estimated Quantiles for State Helmet Use Effects
| State | 0.05 | 0.25 | 0.50 | 0.75 | 0.95 |
| Demo | -5,329 | -3,424 | -2,369 | -1,371 | 548 |
| DE | -6,269 | -3,666 | -2,537 | -1,491 | 413 |
| KY | -5,921 | -3,593 | -2,527 | -1,498 | 378 |
| MD | -5,046 | -3,213 | -2,205 | -1,108 | 1,244 |
| ME | -6,636 | -3,812 | -2,637 | -1,599 | 151 |
| MN | -6,976 | -3,886 | -2,677 | -1,619 | 275 |
| NE | -5,927 | -3,495 | -2,385 | -1,302 | 1,082 |
| NV* | -6,509 | -3,656 | -2,503 | -1,427 | 794 |
| OK | -3,807 | -2,837 | -2,152 | -1,428 | -391 |
| PA | -6,301 | -3,621 | -2,463 | -1,395 | 857 |
| RI | -5,993 | -3,532 | -2,420 | -1,353 | 1,163 |
| SC* | -5,822 | -3,621 | -2,558 | -1,559 | 164 |
| SD | -5,569 | -3,381 | -2,301 | -1,205 | 1,496 |
| UT | -5,593 | -3,676 | -2,638 | -1,694 | -270 |
| WI | -7,111 | -4,075 | -2,851 | -1,867 | -455 |
*Estimate based on complete case analysis, not imputation
Figure 3 - Histogram of Simulated Values for Demo Helmet Use Effect
Figure 4 - Histogram of Simulated Values for DE Helmet Use Effect
Figure 5 - Histogram of Simulated Values for KY Helmet Use Effect
Figure 6 - Histogram of Simulated Values for MD Helmet Use Effect
Figure 7 - Histogram of Simulated Values for ME Helmet Use Effect
Figure 8 - Histogram of Simulated Values for MN Helmet Use Effect
Figure 9 - Histogram of Simulated Values for NE Helmet Use Effect
Figure 10 - Histogram of Simulated Values for NV Helmet Use Effect
Figure 11 - Histogram of Simulated Values for OK Helmet Use Effect
Figure 12 - Histogram of Simulated Values for PA Helmet Use Effect
Figure 13 - Histogram of Simulated Values for RI Helmet Use Effect
Figure 14 - Histogram of Simulated Values for SC Helmet Use Effect
Figure 15 - Histogram of Simulated Values for SD Helmet Use Effect
Figure 16 - Histogram of Simulated Values for UT Helmet Use Effect
Figure 17 - Histogram of Simulated Values for WI Helmet Use Effect
Following Gelman, Carlin, Stern, and Rubin (1995), Section 8.5, we checked
the fit of the statistical model estimated by our meta-analysis by simulating
10,000 repetitions of the reported data.
Figure 18 - Histogram of Simulated Values for Minimum Reported Helmet Use Effect
Figure 19 - Histogram of Simulated Values for Maximum Reported Helmet Use Effect
Figure 20 - Histogram of Simulated Values for Mean Reported Helmet Use Effect
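
The model check above can be sketched as a posterior predictive simulation. For each posterior draw of the state effects, one replicated set of state reports is generated and its minimum, maximum, and mean are compared with the values actually reported. This is a minimal illustration under assumptions: replications reuse the same states and their reported standard errors, and the function name, seed, and toy inputs are hypothetical.

```python
import random

def predictive_p_values(theta_draws, se, y_obs, seed=1):
    """Posterior predictive check for the min, max, and mean reported effect.

    theta_draws: list of posterior draws, each a list of state effects.
    Returns, per statistic, the share of replications at or above the observed value.
    """
    rng = random.Random(seed)
    stats = {"min": min, "max": max, "mean": lambda v: sum(v) / len(v)}
    observed = {name: f(y_obs) for name, f in stats.items()}
    counts = dict.fromkeys(stats, 0)
    for theta in theta_draws:
        # One replicated report per state: y_rep_j ~ Normal(theta_j, se_j^2).
        y_rep = [rng.gauss(t, s) for t, s in zip(theta, se)]
        for name, f in stats.items():
            if f(y_rep) >= observed[name]:
                counts[name] += 1
    return {name: counts[name] / len(theta_draws) for name in stats}

# Toy usage with made-up posterior draws for two states.
toy_draws = [[-2500.0, -2000.0]] * 1000
print(predictive_p_values(toy_draws, se=[3000.0, 4000.0], y_obs=[-3000.0, -1500.0]))
```

Shares near 0.5 indicate that the reported statistic sits in the middle of its simulated distribution. Shares near 0 or 1 would indicate a poor fit.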
DISCUSSION
- For most states, estimates of helmet use effects obtained by analyzing
high-probability complete cases were not the same as estimates obtained by
multiple imputation of missing links and missing values. This suggests
that high-probability complete cases are not representative of the total
study populations.
Table 7 - Comparison of Estimated State Helmet Use Effects
| State | Complete Cases | Complete Effect | Imputed Cases | Imputed Effect |
| Demo | 19 | -8,746 | 66 | -1,288 |
| DE | 44 | -2,952 | 63 | -5,754 |
| KY | 53 | -6,247 | 101 | -3,059 |
| MD | 195 | -236 | 226 | 860 |
| ME | 52 | -8,016 | 51 | -7,307 |
| MN | 135 | -4,096 | 108 | -10,375 |
| NE | 21 | 18,632 | 33 | 16,380 |
| OK | 98 | -1,406 | 122 | -1,735 |
| PA | 2,664 | -5,128 | 3,308 | -2,498 |
| SD | 65 | 12,242 | 81 | 6,684 |
| UT | 189 | -6,394 | 401 | -3,616 |
| WI | 316 | -7,472 | 378 | -7,269 |
- Meta-analysis results suggest that helmet use is protective at the 90%
level. That is, helmet users incur lower inpatient charges, on
average, although there is wide variation from case to case and state to
state. The 50%-tile estimate for the population-wide helmet use effect
is -$2,514. The symmetric 90% confidence interval for the
population-wide helmet use effect, -$4,735 to -$554, lies entirely below zero.
The 50%-tile estimate for the state-to-state standard deviation in mean
helmet use effect is $1,168. Consequently, it is likely that the
state-to-state variation in true mean helmet use effect is much less than the
apparent variation based on one specific reporting period (-$10,375 to
$16,380). The range of 50%-tile estimates for the state effects is only
-$2,152 to -$2,851. However, only Oklahoma, Utah, and Wisconsin show
statistically significant helmet protection with 90% confidence intervals completely below
zero.
Based on 10,000 simulated replications of state reports, the statistical model estimated by our meta-analysis fits the data.
For all three test statistics (minimum, maximum, and mean reported state
helmet use effect), actual reported values fall near the p=0.5 values of the
simulated distributions.
- Kentucky reported finding fewer imputed links than
high-probability links in preliminary tests. This was caused by CODES 2000 tabulating the
lowest weight when the same record pair was found in multiple passes.
The problem was corrected when CODES 2000 was changed to tabulate the
highest weight. The new software version was distributed to all Data Network
states.
See Data Network Note # 11 - Multiple Imputation and
One-to-One Links (Revised)
- Most states found significantly more links by using the
imputation methodology. However, only Nebraska and Utah were able to impute all of their
estimated total links. Among all other states, only Maine was able to
impute over 90% of their estimated links, and some states were below 50%.
Linked datasets must be nearly complete (over 90%) for accurate analysis of
study populations.
Table 8 - Comparison of Estimated Versus Actual Link Counts
| State | Crash Records | Estimated Total Links | % of Crash | Actual Imputed Links | % of Est. |
| Demo | 56,689 | 1,200 | 2.1 | 1,060 | 88 |
| DE | 81,928 | 2,000 | 2.4 | 1,752 | 88 |
| KY | 307,773 | 2,800 | 0.9 | 2,470 | 88 |
| MD | 224,111 | 8,200 | 3.7 | 6,525 | 80 |
| ME | 99,712 | 1,812 | 1.8 | 1,654 | 91 |
| MN | 254,024 | 5,000 | 2.0 | 3,045 | 61 |
| NE | 83,525 | 1,340 | 1.6 | 1,477 | 110 |
| OK | 180,253 | 2,500 | 1.4 | 2,172 | 87 |
| PA | 2,533,317 | 112,500 | 4.4 | 67,564 | 60 |
| RI | 43,898 | 900 | 2.1 | 710 | 79 |
| SC | 270,463 | 6,500 | 2.4 | 3,820 | 59 |
| SD | 80,079 | 2,000 | 2.5 | 912 | 46 |
| UT | 428,165 | 7,186 | 1.7 | 7,537 | 105 |
| WI | 357,058 | 5,400 | 1.5 | 2,635 | 49 |
Eleven states reported estimated total links as a percent of crash records
in a fairly narrow range between 1.4% and 2.5%. Kentucky, Maryland, and Pennsylvania were
outliers.
One reason for the shortfalls may be incomplete linkage strategies. Not all productive
match passes have been explored. Also, not all shared information has been coded for linkage.
Finding appropriate changes to improve the data preparation and linkage strategies used in Phase I
so that they produce more complete linked datasets is an open issue.
Figure 21 - Link Specifications Report for DE
Figure 22 - Link Specifications Report for KY
Figure 23 - Link Specifications Report for MD
Figure 24 - Link Specifications Report for ME
Figure 25 - Link Specifications Report for MN
Figure 26 - Link Specifications Report for OK
Figure 27 - Link Specifications Report for PA
Figure 28 - Link Specifications Report for UT
Another reason for the shortfalls may be a known weakness with the initial CODES 2000 linkage imputation algorithms
presented in November. Arizona, Maryland, and Utah found that
sometimes hundreds or thousands of
matched pairs were assigned to the same set because of very low probability
links. Many of these pairs were dropped by the Imputation Wizard when one-to-one matches were selected
from the sets. Most states did not find this problem. CODES 2000
was changed to avoid the problem by assigning set numbers after linkage
imputation rather than before. The new software version was distributed to those
states reporting high-count sets, but other states may have similar but less
severe problems.
See Data Network Note # 11 - Multiple Imputation and
One-to-One Links (Revised)
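
The "% of Est." column in Table 8 and the 90% completeness threshold can be checked with a short calculation. The sketch below copies the estimated and imputed link counts from Table 8. The names `LINKS`, `coverage`, and `below_target` are hypothetical.

```python
# (estimated total links, actual imputed links in imputation 1), from Table 8.
LINKS = {
    "Demo": (1200, 1060), "DE": (2000, 1752), "KY": (2800, 2470),
    "MD": (8200, 6525), "ME": (1812, 1654), "MN": (5000, 3045),
    "NE": (1340, 1477), "OK": (2500, 2172), "PA": (112500, 67564),
    "RI": (900, 710), "SC": (6500, 3820), "SD": (2000, 912),
    "UT": (7186, 7537), "WI": (5400, 2635),
}

def coverage(estimated, imputed):
    """Imputed links as a percentage of estimated total links."""
    return 100.0 * imputed / estimated

def below_target(links, target=90.0):
    """States whose imputed links fall short of the target coverage."""
    return sorted(state for state, (est, imp) in links.items()
                  if coverage(est, imp) < target)

for state, (est, imp) in LINKS.items():
    print(f"{state:5s} {coverage(est, imp):6.1f}%")
print("Below 90%:", ", ".join(below_target(LINKS)))
```

Coverage above 100%, as for Nebraska and Utah, means that more links were imputed than the state estimated in advance.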
- Only Delaware and Maine reported adjusting their linkage
probability models to account for field dependencies or comparison
tolerances. This suggests that the models used in
Phase I by most states could be improved to produce more accurate
probability estimates. It also suggests that the current mechanisms in CODES 2000
for making such adjustments should be simplified or automated to encourage
broader use.
Finding appropriate changes to improve the linkage probability models used in Phase I
so that they produce more accurate estimates is an open issue.
- Utah reported sensitivity to the random number sequences
used to produce
multiple imputations for the test data. A sensitivity analysis suggested
that 10 linkage imputations and 10 value imputations would produce more stable
results for these data. However, the appropriate number of imputations for each state's data must be
determined individually through a similar sensitivity
analysis.
See Data Network Note # 8 - Variations in Multiple
Imputation Results
- Kentucky reported sensitivity to unusual inpatient charges ($0, outliers).
The appropriate way to handle such charges is an open issue.
See KY
Regression Excluding Charges = $0
As expected, reported hospital inpatient charges are highly skewed for all
states. Consequently, a logarithmic transformation would be appropriate in
the Phase II regression analysis.
Table 9 - Hospital Inpatient Charges for Motorcycle Riders
| State | Min. Charges | Mean Charges | Max. Charges |
| Demo | 532 | 14,748 | 64,089 |
| DE | 775 | 21,029 | 203,472 |
| KY | 43 | 9,125 | 165,391 |
| MD | 794 | 15,599 | 155,344 |
| ME | 1,016 | 20,256 | 111,976 |
| MN | 1,724 | 23,149 | 262,605 |
| NE | 1,733 | 23,917 | 140,209 |
| OK | 527 | 23,454 | 442,378 |
| PA | 16 | 32,198 | 1,150,294 |
| SC | 1,740 | 26,589 | 439,356 |
| SD | 1,489 | 24,214 | 161,989 |
| UT | 432 | 20,429 | 201,680 |
| WI | 807 | 22,237 | 284,821 |
Mean charges for 9 states fall in a fairly narrow range from $20,256 to
$26,589. Kentucky, Maryland, Pennsylvania, and the Demo data are outliers.
- Maryland reported sensitivity to the definition of helmet use. Utah
reported concerns about the definition of helmet use given available
information. Rhode Island reported no missing values for helmet use because
their reporting system defaults to "No." The appropriate way
to define helmet use is an open issue.
See MD Alternate
Helmet Use Definition
See UT Helmet
Use Definition
- Missing data values contributed to uncertainty about the true
effect of helmet use on inpatient charges. States reported various
levels of missing helmet use data ranging from 0% to 55%.
Table 10 - Missing Helmet Use Data in Linked Datasets
| State | Imputed MC Links (Imp. 1) | Helmet Use Missing | % Missing |
| Demo | 71 | 39 | 55 |
| DE | 62 | 18 | 29 |
| KY | 97 | 13 | 13 |
| MD | 226 | 30 | 13 |
| ME | 51 | 0 | 0 |
| MN | 108 | 22 | 20 |
| NE | 33 | 14 | 42 |
| OK | 122 | 12 | 10 |
| PA | 941 | 350 | 37 |
| SC | 266 | 9 | 3 |
| SD | 81 | 1 | 1 |
| UT | 387 | 181 | 47 |
| WI | 338 | 117 | 31 |
Maine, South Carolina, and South Dakota had nearly complete helmet use reporting. Only
the Demo data had over 50% missing helmet use values. Consequently, multiple
imputation and simulation algorithms are likely to perform well for most
state analyses. Schafer notes on page 137 that if "rates of
missing information are moderate, say 40% or less, we may expect the
simulations to proceed without much difficulty."
- Most states were able to use CODES 2000 to construct an adequate
linkage probability model for the Phase I analysis. Arizona and South Carolina reported difficulties completing
their linkage imputation processes. Arizona's imputed one-to-one links
included several times as many matched pairs as their estimated total.
Most of the pairs were tabulated with high probabilities and assigned to the
same set, even after installing the software fixes mentioned earlier.
In addition, doing linkage imputations sometimes caused the PC to crash.
South Carolina reported doing linkage imputation as a two-step
process. First, they linked only crash records with names to all
hospital records. Second, they linked only crash records without names
to unlinked hospital records. Combining and analyzing these separate
linkage imputations added more complexity to the process.
- Maine and Rhode Island reported that using the SAS MI procedure
for value imputation produced errors when there were no missing
values. South Dakota reported that examples of required text file
formats would be useful in the instructions for Schafer's NORM
procedure. No other states reported value imputation issues with either
SAS MI or NORM.
Rhode Island and South Carolina reported errors when using DDE and ODBC
procedures for directly exchanging data between Microsoft Access and
SAS. They had to resort to creating ancillary files in order to transfer
data between these systems.
RECOMMENDATIONS
- Upgrade to CODES 2000 Version 2.2.350 from a new distribution CD.
This will provide all states with the
latest linkage imputation algorithms and other enhancements that have been
developed to address reported problems.
- Revise link join specifications to obtain at least 90% coverage of estimated
total links. This should reduce sensitivity to any remaining missing
links.
- Revise tables and match specifications to add new match fields so that all
available information is used for matching. For example: injury date, crash
type, vehicle type, driver flag, injured flag, fatality flag, etc.
This should improve the accuracy of linkage probability models as well as increase
the number of links found.
- Revise match specifications to reflect important field reliabilities,
field dependencies, and comparison tolerances.
This should improve the accuracy of linkage probability models.
- Revise imputation methodology to do 10 linkage imputations and 10 missing
value imputations. This should reduce sensitivity to the random number
sequences used for imputation. Conduct sensitivity analysis of
random number sequences in selected states.
- Treat $0 charges as missing values. Otherwise, accept all reported
charges. For high outlier charges, identify specific procedures that
contribute most to the total charges. Conduct sensitivity analysis of
outlier charges in selected states.
- Revise regression models to use rider age, rider sex, and logarithm of
inpatient charges. The regression model used for Phase I was
intentionally simplistic. In addition, transformed charges should have
closer to normal distributions.
- Revise imputation models to use rider sex. Imputation models should
be at least as complex as regression models.
- Revise the meta-analysis to combine regressions on logarithm of
inpatient charges. Transformed charges should have closer to normal
distributions.
- Conduct sensitivity analyses of helmet use definitions in selected states.
We cannot correct poor helmet use information, but at least we can describe
how it affects analysis results.
- Revise any non-standard approach to linkage or imputation that is not
consistent with the recommended approaches.
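
The recommendations on $0 charges and log-transformed charges can be illustrated with a minimal sketch. It is deliberately simpler than the recommended Phase II model: it regresses log charges on helmet use alone and omits rider age and sex, and it drops $0 charges rather than imputing them as missing values. The records and function names are hypothetical.

```python
import math

def prepare(records):
    """Treat $0 (or absent) charges as missing and log-transform the rest.

    records: iterable of (helmet_used, charges), with helmet_used equal to 0 or 1.
    """
    return [(h, math.log(c)) for h, c in records if c is not None and c > 0]

def fit_simple_ols(pairs):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical records: (helmet_used, inpatient charges in dollars).
records = [(1, 8000.0), (1, 12000.0), (0, 15000.0),
           (0, 0.0), (0, 40000.0), (1, 5000.0)]
intercept, slope = fit_simple_ols(prepare(records))
# On the log scale the helmet effect is multiplicative: exp(slope) is the ratio
# of geometric mean charges for helmeted riders to that for unhelmeted riders.
print(f"helmet effect on log charges = {slope:.3f}, ratio = {math.exp(slope):.2f}")
```

With a log-transformed outcome, the helmet effect is reported as a percentage change in charges rather than a dollar amount, so the Phase II results would not be directly comparable with the dollar effects in Tables 2 and 4.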