CODES 2000 User Forum -- Data Network Note #11
Multiple Imputation and One-to-One Links (Revised)

Applies to: CODES 2000 Version 2.2.338.
Last updated: Tuesday February 19, 2002.

SUMMARY
For Phase I of the helmet use study, the first step for multiple imputation of linked record pairs
was to run several match passes in CODES 2000 at a very low cutoff probability, say
0.001, and then merge the results. This produced a Match
Pairs In Sets table that approximated the
posterior distribution of match probabilities for all record pairs given
observed agreements, disagreements, and missing values. A supplementary Linkage Imputation Wizard was
introduced to impute multiple complete
sets of linked pairs by simulating random draws from this posterior distribution. The
Wizard also reduced many-to-many matches to one-to-one matches so that one Crash record
was linked to at most one Inpatient record, and vice versa.
Some Data Network states discovered that this methodology produced far fewer
imputed links than expected. In a few cases, the imputation process
produced even fewer links than the original high-probability match. The
reasons behind these unexpected results have been identified, and new versions
of CODES 2000 have been developed that address the problems. This Note
describes the current methodology for linkage imputation.

ANALYSIS
- CODES 2000 should tabulate maximum weights.
CODES 2000 often finds the same matched pair of records in more than one
match pass. If match specifications for the passes are not identical,
then a common matched pair can be assigned different weights in different
passes. This might happen if one pass matches county location while
another pass matches town location, or if one pass matches crash date
to admit date while another pass matches the day after the crash date.
When record A is linked to record B in two different passes with two
different weights, the pre-imputation version of CODES 2000 tabulated the
lower weight when the passes were merged. When all accepted matches had
probabilities above 0.9, this made little difference. For imputation, we
now accept much lower probabilities. So, if A to B has probability 0.9 in pass
1 and probability 0.5 in pass 2, the pre-imputation version of CODES 2000
tabulated the weight for 0.5 when imputing but tabulated the weight for 0.9
in a traditional linkage. Tabulating the lower weight resulted in fewer imputed
links when drawing from the posterior distribution. To remove this
inconsistency, CODES 2000 was changed in version 2.1.317 to tabulate the
highest weight. This new version was distributed to all Data Network
states on December 3, 2001.
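The merge rule described above can be sketched in a few lines of Python. This is only an illustration, not the CODES 2000 code: the function name, the record labels, and the weight values are all made up for the example, and each pass is assumed to be a simple mapping from a matched pair to its weight.

```python
from typing import Dict, Iterable, Tuple

# (crash record, inpatient record); the labels are illustrative only
Pair = Tuple[str, str]


def merge_passes_max_weight(passes: Iterable[Dict[Pair, float]]) -> Dict[Pair, float]:
    """Merge several match passes, keeping the highest weight seen for each pair."""
    merged: Dict[Pair, float] = {}
    for pass_result in passes:
        for pair, weight in pass_result.items():
            # Keep the larger weight when the same pair appears in more than one pass.
            if pair not in merged or weight > merged[pair]:
                merged[pair] = weight
    return merged


# Hypothetical weights: pair A-B is found in both passes with different weights.
pass1 = {("A", "B"): 25.0, ("A", "C"): 4.0}
pass2 = {("A", "B"): 12.0, ("D", "E"): 9.5}
merged = merge_passes_max_weight([pass1, pass2])
print(merged[("A", "B")])
```

Under the earlier behavior the comparison would have kept the smaller weight instead, which is what pulled shared pairs toward lower probabilities.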
Utah and Kentucky tested the new version of CODES 2000. Numbers from Mike
Singleton illustrate the possible effects. Before the change, he performed the
old-style match with a 0.9 cutoff, and got 1,500 matches with a weight of 25
or greater. Then he matched with the same specs at 0.01, and performed the
multiple imputation. He estimated there should be a total of 3,000 matches.
The imputation routine found about 2,500 matches. He tabulated the match
weights for the LinkedPairs1 table, and found only 644 matches with a weight
of 25 or greater. Also, the median match probability after imputation was only
0.35.
After the change, the average number of matches found by imputation was
2,829. The increase from 1,500 to 2,829 was 1,329, which is very close to the
number of hospital cases E-coded as motor vehicle injuries that failed to link
at 0.9. The median match probability after imputation was around 0.85, and
most of the high-probability matches survived.
Note that a few high probability matches may be dropped when you do any of
the imputations -- a 0.9 probability match has a 0.9 probability of being
selected and a 0.1 probability of being dropped in each imputation. So,
a 0.9 probability match will be selected for most imputations while a 0.1
probability match will be selected only rarely.
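The selection behavior just described can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the Wizard's code: each pair is drawn independently with its match probability, the one-to-one reduction is ignored, and the function name, record labels, and seed are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# (crash record, inpatient record); the labels are illustrative only
Pair = Tuple[str, str]


def impute_links(match_probs: Dict[Pair, float], n_imputations: int, seed: int = 0) -> List[List[Pair]]:
    """Draw n_imputations sets of links; each pair is kept with its match probability."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    imputations: List[List[Pair]] = []
    for _ in range(n_imputations):
        kept = [pair for pair, p in match_probs.items() if rng.random() < p]
        imputations.append(kept)
    return imputations


probs = {("A", "B"): 0.9, ("C", "D"): 0.1}
draws = impute_links(probs, n_imputations=5000, seed=1)

# Share of imputations in which each pair was selected.
share_high = sum(("A", "B") in d for d in draws) / len(draws)
share_low = sum(("C", "D") in d for d in draws) / len(draws)
print(round(share_high, 2), round(share_low, 2))
```

Over many imputations the first share should settle near 0.9 and the second near 0.1, which is why a few high-probability matches are missing from any single imputation.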
- CODES 2000 should assign set numbers after imputation.
The pre-imputation version of CODES 2000 assigned each matched pair to a set of pairs as part of the
merge process. For example, if Crash record A was linked to both
Inpatient record B and Inpatient record C, then matched pairs A-B and A-C were
both assigned to the same set. The purpose of assigning set numbers was
to allow identification and review of many-to-many matches in which one Crash record
was linked to more than one Inpatient record or more than one Crash record was
linked to the same Inpatient record. Set numbers were designed to
allow selection of one-to-one matches from many-to-many matches because any many-to-many match must be
reduced to one or more one-to-one matches before you can analyze the linked
record pairs.
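One way to picture set assignment is as grouping pairs that share a record, directly or through a chain of other pairs. The sketch below does this with a union-find structure. It is an assumption about how such grouping could be done, not a description of the CODES 2000 implementation; the function name and record labels are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (crash record, inpatient record); the labels are illustrative only
Pair = Tuple[str, str]
Node = Tuple[str, str]  # (file name, record label), so the two files never collide


def assign_set_numbers(pairs: Iterable[Pair]) -> Dict[Pair, int]:
    """Give the same set number to all pairs connected through shared records."""
    pair_list = list(pairs)
    parent: Dict[Node, Node] = {}

    def find(node: Node) -> Node:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # shorten the path as we walk up
            node = parent[node]
        return node

    # Join the crash record and the inpatient record of every pair.
    for crash_id, inpatient_id in pair_list:
        root_a = find(("crash", crash_id))
        root_b = find(("inpatient", inpatient_id))
        if root_a != root_b:
            parent[root_a] = root_b

    # Number the sets 1, 2, ... in order of first appearance.
    numbers: Dict[Node, int] = {}
    result: Dict[Pair, int] = {}
    for crash_id, inpatient_id in pair_list:
        root = find(("crash", crash_id))
        result[(crash_id, inpatient_id)] = numbers.setdefault(root, len(numbers) + 1)
    return result


sets = assign_set_numbers([("A", "B"), ("A", "C"), ("D", "E")])
print(sets)
```

Here A-B and A-C share Crash record A and land in the same set, while D-E gets a set of its own.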
The process of creating a one-to-one match from a many-to-many match
consists of selecting one or more matched pairs from each set. For
example, suppose a set contains three matched pairs A-C, B-C, and B-D.
First, we might select A-C. Second, we eliminate B-C because C is already
linked. Third, we select the remaining pair, B-D. The Linkage
Imputation Wizard incorporated an existing algorithm designed for use with
high cutoff probability matches that looked for up to 10 unique one-to-one
pairs in each set. For some states, we discovered that when merging
match passes with very low cutoff probability matches, some sets contained
hundreds, or even thousands, of unique one-to-one matches. Some records
can link to many other records at very low probability, so adding thousands
of very low probability links to the Match Pairs In Sets table can create
very long chains of linked pairs. For such large sets, many of the unique
matched pairs were lost when creating the one-to-one matches.
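The A-C, B-C, B-D example can be written out as a short greedy selection. This is a sketch, not the Wizard's algorithm: ordering the pairs by weight, the weight values, the function name, and the optional max_pairs cap (included only to show how a per-set limit could drop pairs) are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

# (crash record, inpatient record); the labels are illustrative only
Pair = Tuple[str, str]


def select_one_to_one(weights: Dict[Pair, float], max_pairs: Optional[int] = None) -> List[Pair]:
    """Pick pairs from one set so that no record is linked more than once."""
    used_crash = set()
    used_inpatient = set()
    selected: List[Pair] = []
    # Assumption: consider the highest-weight pairs first.
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    for (crash_id, inpatient_id), _weight in ranked:
        if max_pairs is not None and len(selected) >= max_pairs:
            break  # hypothetical per-set cap reached; remaining pairs are lost
        if crash_id in used_crash or inpatient_id in used_inpatient:
            continue  # one of the two records is already linked
        selected.append((crash_id, inpatient_id))
        used_crash.add(crash_id)
        used_inpatient.add(inpatient_id)
    return selected


# Made-up weights for the three pairs in the example set.
example_set = {("A", "C"): 20.0, ("B", "C"): 15.0, ("B", "D"): 10.0}
chosen = select_one_to_one(example_set)
capped = select_one_to_one(example_set, max_pairs=1)
print(chosen, capped)
```

With no cap, A-C and B-D are kept and B-C is eliminated. With a cap of one pair, B-D is lost even though it conflicts with nothing that was selected, which is the kind of loss a very large set would suffer under a fixed limit.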
CODES 2000 has been changed in version 2.2.336 to assign set numbers
separately for each imputation rather than once for all pairs in the Match
Pairs In Sets table. Each imputation contains only a few very low probability
links. Consequently, the potential for very large sets is substantially
reduced. In addition, the Linkage Imputation Wizard has been changed to
look for up to 50 unique one-to-one pairs. In version 2.2.336, the
Imputation Wizard has been incorporated into the standard CODES 2000 Perform
Match Wizard for ease of use. It appears when you click on the Merge button.
Version 2.2.336 will be distributed to Data Network states prior to starting
Phase II of the helmet study.
Maryland and others tested the new version of CODES 2000. Numbers from Shiu
Ho illustrate the possible effects. Before the change, she performed the
old-style match with a 0.9 probability cutoff, and found 3,749 matches. Then
she matched with the same specs at a 0.001 probability cutoff, and performed
the multiple imputation. She estimated there should be a total of 8,200
matches. The imputation routine found only about 4,350 matches (53%).
She tried several minor variations in the match specifications but could do
no better than 4,900 matches. In fact, some rejected trials with very
loose join specifications produced as few as 1,300 imputed links. After
some investigation, we found that 3,742 matched pairs had been assigned to set
number 229, and that most of these pairs were dropped when the Linkage
Imputation Wizard created one-to-one matches. The new Imputation Wizard
was designed to correct this problem. After installing the new Wizard,
over 6,500 matches (about 80%) were found by imputation using the same match
specifications as used earlier.
Because of the potential impact on imputation results, all Data Network
states were notified about this problem on January 17, 2002, and asked to
count the number of matched pairs in each set using a specified SQL command.
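The SQL command sent to the states is not reproduced in this Note. As a rough stand-in, the same kind of check can be sketched in Python, assuming the set number of every matched pair has already been read into a list. The function name and the sample set numbers are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def largest_sets(set_numbers: Iterable[int], top: int = 5) -> List[Tuple[int, int]]:
    """Return (set number, pair count) for the sets containing the most pairs."""
    return Counter(set_numbers).most_common(top)


# Made-up set numbers, one entry per matched pair.
pair_set_numbers = [1, 1, 2, 229, 229, 229, 229, 3]
biggest = largest_sets(pair_set_numbers, top=2)
print(biggest)
```

A set whose pair count is far larger than the rest is a sign that long chains of low-probability links have formed.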
- New CODES 2000 Linkage Imputation Wizard.
When you click on the usual Merge button, you see the usual confirmation
message:

When the Merge is complete, you see the usual information message:

After you acknowledge the message, you see the new Linkage Imputation
Wizard:

Enter the number of imputations that you want the Wizard to create.
Imputations will be tabulated and set numbers assigned in tables named
ImputedPairsInSets1, ImputedPairsInSets2, etc. The Imputation Wizard will
tabulate imputed one-to-one matches in tables named LinkedPairs1,
LinkedPairs2, etc. If you enter 0, the Wizard will not create any imputation
tables. Instead, it will tabulate one-to-one links from the entire
MatchPairsInSets table in a table named LinkedPairs0. In this case, the
Wizard picks the highest weight pairs from each set.