Use flags as match fields. In general, the calculations for the linkage models assume that any value reported for a match field in Table A could also have been reported for the corresponding match field in Table B, albeit with a different probability. Clearly, this is not true for fields used as flags to identify certain types of cases, such as Crash=Y on all Crash records versus Crash=Y or N on EMS records. Flags can be used for case selection, for matching, or for both. For example, you could select for linkage only those EMS or Hospital records with Injury=Y. Because no field is perfect, using a flag for case selection will decrease false positives but increase false negatives compared with using the same flag for matching. You should test both approaches to determine which works best with your data. The linkage algorithms detect when a flag is used for linkage and handle such fields as special cases so that the calculated match probabilities are correct. The following examples show how this works. For simplicity, we assume the flags have no missing or incorrect values.
Example 1.
1,000 Crash records for vehicle occupants
1,000 EMS records for injured patients including 100 records for patients injured in one of the crashes
Assume no missing or incorrect values, so that Crash=Y on all Crash records and Crash=Y on the 100 EMS records for crash victims; Crash=N on all other EMS records. (Results are similar for flags like Injured=Y or N on Crash records versus Injured=Y on EMS records.)
If we use the flag for matching:
Matched Pairs = 100
Unmatched Pairs = (1000 X 1000) – 100 = 999900
Prior Odds = 100 / 999900 = 1 / 9999 = 0.00010001
Every record pair that is a true link has Crash=Y on both records, so m probability for agreement = 1.0.
We can calculate the total number of record pairs with Crash=Y on both records:
1000 Crash records X 100 EMS records = 100000 pairs. Of these, 100 are true Matched pairs so the balance of 99900 must be true Unmatched pairs.
So, u probability for agreement = 99900 / 999900 = 0.09990999 and m/u for agreement = 1.0 / 0.09990999 = 10.009009. (This is the likelihood ratio. The corresponding match weight = log base 2 (10.009009) = 3.32.)
Posterior odds given agreement = Prior Odds X Likelihood Ratio = 0.00010001 X 10.009009 = 0.001001001
If we use the flag for case selection (select only EMS records with Crash=Y for linkage):
Matched Pairs = 100
Unmatched Pairs = (1000 X 100) – 100 = 99900
Prior Odds = 100 / 99900 = 1 / 999 = 0.001001001, same as posterior odds when used for matching.
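The arithmetic in Example 1 can be reproduced with a short Python sketch. This is only an illustration of the calculation described above, not the linkage software's own code; the function and parameter names (flag_as_match_field, case_selection_prior_odds, n_a, n_b, n_b_flagged, n_true_links) are hypothetical, and the sketch assumes, as the example does, a flag with no missing or incorrect values.

```python
import math


def flag_as_match_field(n_a, n_b, n_b_flagged, n_true_links):
    """Odds when a perfect flag is used as a match field.

    Assumptions (taken from the example): the flag is Y on every Table A
    record, Y on n_b_flagged Table B records, and has no missing or
    incorrect values, so every true link agrees on the flag (m = 1.0).
    """
    unmatched = n_a * n_b - n_true_links
    prior_odds = n_true_links / unmatched
    m = 1.0
    # Pairs that agree on the flag but are not true links, as a share
    # of all unmatched pairs.
    u = (n_a * n_b_flagged - n_true_links) / unmatched
    likelihood_ratio = m / u
    return {
        "prior_odds": prior_odds,
        "u": u,
        "likelihood_ratio": likelihood_ratio,
        "match_weight": math.log2(likelihood_ratio),
        "posterior_odds": prior_odds * likelihood_ratio,
    }


def case_selection_prior_odds(n_a, n_b_flagged, n_true_links):
    """Prior odds when only flagged Table B records are selected for linkage."""
    return n_true_links / (n_a * n_b_flagged - n_true_links)


# Example 1: 1,000 Crash records, 1,000 EMS records, 100 flagged, 100 true links.
example_1 = flag_as_match_field(1000, 1000, 100, 100)
selection_1 = case_selection_prior_odds(1000, 100, 100)
for name, value in example_1.items():
    print(f"{name}: {value:.9f}")
print(f"case selection prior odds: {selection_1:.9f}")
```

Under these assumptions the posterior odds from matching and the prior odds from case selection both come out to 1/999, matching the figures worked by hand above.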
Example 2.
Suppose there are 500 true links:
1,000 Crash records for vehicle occupants
1,000 EMS records for injured patients including 500 records for patients injured in one of the crashes
Assume no missing or incorrect values, so that Crash=Y on all Crash records and Crash=Y on the 500 EMS records for crash victims; Crash=N on all other EMS records.
If we use the flag for matching:
Matched Pairs = 500
Unmatched Pairs = (1000 X 1000) – 500 = 999500
Prior Odds = 500 / 999500 = 1 / 1999 = 0.00050025
Every record pair that is a true link has Crash=Y on both records, so m probability for agreement = 1.0.
We can calculate the total number of record pairs with Crash=Y on both records:
1000 Crash records X 500 EMS records = 500000 pairs. Of these, 500 are true Matched pairs so the balance of 499500 must be true Unmatched pairs.
So, u probability for agreement = 499500 / 999500 = 0.49975
and m/u for agreement = 1.0 / 0.49975 = 2.001001 (The corresponding match weight = log base 2 (2.001001) = 1.001).
Posterior odds given agreement = Prior Odds X Likelihood Ratio = 0.00050025 X 2.001001 = 0.001001001
If we use the flag for case selection:
Matched Pairs = 500
Unmatched Pairs = (1000 X 500) – 500 = 499500
Prior Odds = 500 / 499500 = 1 / 999 = 0.001001001, same as posterior odds if used for matching.
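Both examples arrive at the same odds of 1/999. A brief sketch with exact fractions suggests why: with a perfect flag, the posterior odds from matching reduce to true links divided by the remaining flag-agreeing pairs, which is the same quantity as the prior odds after case selection. The helper names below are hypothetical, and the value of 900 true links is an extra illustrative case that does not appear in the examples.

```python
from fractions import Fraction


def posterior_odds_matching(n_a, n_b, n_flagged, links):
    """Posterior odds given flag agreement, assuming a perfect flag (m = 1)."""
    unmatched = n_a * n_b - links
    prior = Fraction(links, unmatched)
    # u = unmatched pairs that agree on the flag / all unmatched pairs
    u = Fraction(n_a * n_flagged - links, unmatched)
    return prior * (Fraction(1) / u)


def prior_odds_selection(n_a, n_flagged, links):
    """Prior odds when only flagged Table B records are kept for linkage."""
    return Fraction(links, n_a * n_flagged - links)


# 100 and 500 true links are Examples 1 and 2; 900 is an illustrative extra.
for links in (100, 500, 900):
    post = posterior_odds_matching(1000, 1000, links, links)
    prior = prior_odds_selection(1000, links, links)
    print(links, post, prior, post == prior)
```

In each case the flagged EMS records are exactly the true links and each one pairs with 1,000 Crash records, only one of which is its true match, so the odds are 1 to 999 regardless of how many true links there are.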