CODES 2000 User Forum -- Data Network Note #13
Finding Most of Your Estimated Total Links

Applies to: CODES Data Network.
Last updated: Monday April 01, 2002.

SUMMARY
In Phase II of the Helmet Use Study, we want to ignore any missing links
between Crash and Inpatient records. Proving that missing links
are truly ignorable is a challenging statistical task that may not be feasible
for most states. Instead, we will assume that if each state finds at least
90% of its estimated total possible links, then any remaining missing links
are unlikely to bias study results significantly.
In Phase I of the study, a few states met the 90% minimum coverage
criterion. However, most states did not. The first task in Phase II
for states below 90% is to revise their data and linkage specifications in order
to find at least 90% of estimated total possible links. Achieving at least
90% coverage is mandatory in order for a state's results to be included in the
final analysis. Here, we present a methodology for finding most of your
estimated total links.
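
The coverage test described above is simple arithmetic: links found divided by estimated total possible links, compared against 0.90. The following is a minimal sketch in Python, not part of CODES 2000. The function names and the link counts are hypothetical and are used only for illustration.

```python
def coverage(links_found: float, estimated_total_links: float) -> float:
    """Fraction of the estimated total possible links that were found."""
    if estimated_total_links <= 0:
        raise ValueError("estimated_total_links must be positive")
    return links_found / estimated_total_links


def meets_minimum(links_found: float, estimated_total_links: float,
                  minimum: float = 0.90) -> bool:
    """True if coverage reaches the minimum (90% in this note)."""
    return coverage(links_found, estimated_total_links) >= minimum


# Hypothetical figures, not taken from any state's Phase I results.
print(meets_minimum(1720, 2000))  # coverage 0.86
print(meets_minimum(1850, 2000))  # coverage 0.925
```

Note that revising the estimated total (the denominator) changes coverage as much as finding more links does, which is why the estimate itself must be realistic and supportable.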

METHODOLOGY
- Read this Tech Note and make a plan for your Phase II linkage.
Based on suggestions here and your Phase I experience, decide whether you
expect to create new Crash or Hospital tables with new or modified fields,
and whether you expect to add new match passes, modify existing match passes,
or just repeat the linkage imputation process using the new algorithms.
- Install CODES 2000 Release 2.2.373.
In a few states, the original linkage and imputation methodology resulted
in thousands of matched record pairs being assigned to the same set.
Consequently, most of these matches were not included in the final one-to-one
linkage, substantially reducing the apparent coverage. Other states may
have experienced a less severe version of this situation that limited their
coverage. The new software release provides a better methodology that
should avoid the problem by assigning record pairs to sets after imputation,
rather than before.
See Tech Note # 14 -- Installation Notes for CODES 2000 Release 2.2.373
See Tech Note # 16 -- Imputing Complete, One-to-One Dual Linkages
Even states that achieved over 90% coverage should install the new software
release so that all states will be using the same linkage imputation
methodology.
- Create a new CODES 2000 project and import your old project into it.
See Tech Note # 15 -- Importing an Existing Project into a New Project
You can import specifications only, specifications plus Crash or Hospital
data, or specifications plus data plus Crash to Hospital match results.
Based on your plan from the first step above, you may want to minimize unnecessary
repetition of Phase I work. For example, if you are not going to modify
your Crash or Hospital tables, then you can just import them from your old
project.
- Make sure that your estimated number of total possible links is realistic
and supportable.
Review Table 8 in the Phase I report to see where your state's estimate
fell relative to those of other network states. Based on your Phase I
experience, decide whether your estimated number of total possible links is
realistic and supportable. Revise the figure if necessary.
See Data Network Note # 10 -- Effect of Helmet Use on Inpatient Charges - Phase I
- Make sure that you are using all available information for linkage
purposes.
You may have found all possible links, but their probabilities may be
inaccurately low. For example, 1,000 matched pairs at 0.1 probability contribute only
about 100 pairs to each imputed match, while the same 1,000 matched pairs at
0.4 probability contribute about 400 pairs. Increasing a typical match
weight by only 2 points increases the odds for a true match by a factor of 4.
Calculated match probabilities may be unrealistically low if you omit available
information from your match specifications. This is particularly true if
you only have age and sex as personal identifiers for many crash occupants.
You can raise the probabilities of true matched pairs by revising your
import, standardization, and match specifications to use all available
information for matching. For example, consider injury date, crash type,
vehicle type, driver flag, injured flag, and fatality flag.
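
The arithmetic in this step can be sketched as follows. This is an illustration only, assuming that match weights are base-2 logarithms of odds ratios (which is what "2 points gives a factor of 4" implies). The function names are hypothetical and are not CODES 2000 functions.

```python
def expected_true_pairs(n_pairs: int, probability: float) -> float:
    """Expected number of pairs contributed to one linkage imputation."""
    return n_pairs * probability


def shift_probability(probability: float, weight_change: float) -> float:
    """Match probability after the match weight changes by weight_change points.

    Assumes each weight point doubles the odds (base-2 log scale).
    """
    odds = probability / (1.0 - probability)
    new_odds = odds * 2.0 ** weight_change
    return new_odds / (1.0 + new_odds)


print(expected_true_pairs(1000, 0.1))        # about 100 pairs
print(expected_true_pairs(1000, 0.4))        # about 400 pairs
print(round(shift_probability(0.1, 2.0), 3)) # odds of 1/9 become 4/9
```

Under this assumption, a pair at 0.1 probability that gains 2 weight points from an added match field moves to roughly 0.31, so the same 1,000 pairs would contribute about three times as many links.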
- Make sure that your field error probabilities accurately describe low
reliability fields.
A field with 0.01 error probability in both tables has a disagree weight
near -5.6. A field with 0.05 error probability in both tables has a
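disagree weight near -3.3. Raising the error probability from 0.01 to 0.05
thus shrinks the disagreement penalty by about 2 points, which increases the
odds for a true match by a factor of about 4 whenever the field disagrees.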
You can raise the probabilities of true matched pairs by increasing error
probabilities for low reliability fields to more realistic levels. This
is especially true if you are still using the default 0.01 error
probabilities. Revised error probabilities should be based on analysis
of your Phase I linkage results. Pick a match pass in which the field of
interest was not a join field. Examine the Match Pairs Pass X table and
determine the fraction of high-probability matches in which the field
disagrees. Use half of this value as your estimated error probability
for the field in each table.
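
The figures in this step can be approximately reproduced with the sketch below. It assumes a Fellegi-Sunter style disagree weight of log2((1 - m) / (1 - u)), where m is the chance that a true match agrees on the field (no error in either table) and u is the chance that a non-match agrees by coincidence. That formula, the default u of 0, and the function names are assumptions made for illustration; CODES 2000 may compute its weights differently.

```python
import math


def disagree_weight(error_a: float, error_b: float, u: float = 0.0) -> float:
    """Approximate disagree weight for a field, in base-2 weight points.

    error_a, error_b: field error probabilities in the two tables.
    u: assumed chance agreement among non-matches (0 ignores this term).
    """
    m = (1.0 - error_a) * (1.0 - error_b)  # both tables record the field correctly
    return math.log2((1.0 - m) / (1.0 - u))


def estimated_error_probability(n_high_prob_matches: int,
                                n_disagreements: int) -> float:
    """Half the observed disagreement fraction, as suggested in this step."""
    if n_high_prob_matches <= 0:
        raise ValueError("need at least one high-probability match")
    return (n_disagreements / n_high_prob_matches) / 2.0


print(round(disagree_weight(0.01, 0.01), 2))  # default error probabilities
print(round(disagree_weight(0.05, 0.05), 2))  # more realistic low-reliability field

# Hypothetical counts from a Match Pairs Pass X table:
# 2,000 high-probability matches, 160 of which disagree on the field.
print(estimated_error_probability(2000, 160))
```

Halving the disagreement fraction follows from the same assumption: a true match disagrees when either table is in error, so the disagreement fraction is roughly twice the per-table error probability when errors are small.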
- Make sure that you take your missing data patterns into account in your
join specifications.
Your join specifications may have skipped some reasonable candidate record
pairs because of missing data. Add one or more new match passes to cover any
gaps left by existing passes.
Consider removing or replacing a join field if the field has a high
occurrence of missing values. Note that removing a join field
might increase the number of candidate pairs to an unacceptable level, so
always count the candidate pairs after making such a change.
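
The trade-off in this step can be checked outside CODES 2000 with a small count. The sketch below assumes that a join requires exact agreement on every join field and that a record with a missing join value produces no candidate pairs in that pass. The field names and records are hypothetical.

```python
from collections import Counter


def join_key(record: dict, join_fields: list):
    """Tuple of join values, or None if any join field is missing."""
    values = tuple(record.get(field) for field in join_fields)
    if any(value is None or value == "" for value in values):
        return None
    return values


def key_counts(records: list, join_fields: list):
    """Count records per join key and count records skipped for missing data."""
    counts, skipped = Counter(), 0
    for record in records:
        key = join_key(record, join_fields)
        if key is None:
            skipped += 1
        else:
            counts[key] += 1
    return counts, skipped


def count_candidate_pairs(crash: list, hospital: list, join_fields: list) -> int:
    """Number of Crash-Hospital pairs that agree on all join fields."""
    crash_counts, _ = key_counts(crash, join_fields)
    hospital_counts, _ = key_counts(hospital, join_fields)
    return sum(n * hospital_counts[key] for key, n in crash_counts.items())


# Hypothetical records; age is missing for one crash occupant.
crash = [
    {"date": "2001-05-01", "sex": "M", "age": 34},
    {"date": "2001-05-01", "sex": "M", "age": None},
    {"date": "2001-05-02", "sex": "F", "age": 27},
]
hospital = [
    {"date": "2001-05-01", "sex": "M", "age": 34},
    {"date": "2001-05-01", "sex": "M", "age": 35},
    {"date": "2001-05-02", "sex": "F", "age": 27},
]

for fields in (["date", "sex", "age"], ["date", "sex"]):
    pairs = count_candidate_pairs(crash, hospital, fields)
    skipped = key_counts(crash, fields)[1]
    print(fields, "candidate pairs:", pairs, "crash records skipped:", skipped)
```

Dropping age from the join recovers the crash record with missing age but raises the candidate pair count, which is the effect to watch for on full-size tables.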
- Make sure that your linkage probability model is accurate.
Confirm that you have selected your match fields or adjusted your match weights to account for
significant field dependencies.
Confirm that you have adjusted your match weights to account for
comparisons with tolerances.
- Test whether 10 linkage imputations are needed to capture random
variations in your imputation results.
Create 10 linkage imputations. Compute average inpatient charges by
helmet use for the first 5 imputations, the second 5 imputations, and all 10
imputations. If all three sets of imputations give essentially the same
result, then 5 imputations are sufficient. Otherwise, use 10
imputations.
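
The comparison in this step can be sketched as follows. The sketch assumes you have already computed, for each imputation, the average inpatient charge for each helmet use group, and that results are combined by averaging those per-imputation averages. The 2% tolerance standing in for "essentially the same", the group labels, and the function names are all hypothetical choices; pick a tolerance that suits your analysis.

```python
from statistics import mean


def combine(per_imputation_means: list) -> dict:
    """Average the per-imputation group means across a set of imputations."""
    groups = per_imputation_means[0].keys()
    return {g: mean(m[g] for m in per_imputation_means) for g in groups}


def five_are_sufficient(per_imputation_means: list, rel_tol: float = 0.02) -> bool:
    """Compare first 5, second 5, and all 10 imputations group by group."""
    if len(per_imputation_means) != 10:
        raise ValueError("expected results from exactly 10 imputations")
    first = combine(per_imputation_means[:5])
    second = combine(per_imputation_means[5:])
    all_ten = combine(per_imputation_means)
    return all(
        abs(subset[g] - all_ten[g]) <= rel_tol * abs(all_ten[g])
        for subset in (first, second)
        for g in all_ten
    )


# Hypothetical average charges (dollars) from 10 imputations.
results = [
    {"helmet": 10000 + 10 * i, "no helmet": 15000 - 10 * i}
    for i in range(10)
]
print(five_are_sufficient(results))
```

If the function returns False for your data, keep all 10 imputations, as this step instructs.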