CODES 2000 User Forum -- Data Network Note #13

Finding Most of Your Estimated Total Links

Applies to: CODES Data Network.
Last updated: Monday April 01, 2002.

SUMMARY

In Phase II of the Helmet Use Study, we want to ignore any missing links between Crash and Inpatient records.  Proving that missing links are truly ignorable is a challenging statistical task that may not be feasible for most states.  Instead, we will assume that if each state can find at least 90% of its estimated total possible links, then any remaining missing links are unlikely to bias study results significantly.

In Phase I of the study, a few states met the 90% minimum coverage criterion, but most did not.  The first task in Phase II for states below 90% is to revise their data and linkage specifications in order to find at least 90% of estimated total possible links.  Achieving at least 90% coverage is mandatory for a state's results to be included in the final analysis.  Here, we present a methodology for finding most of your estimated total links.

METHODOLOGY

  1. Read this Tech Note and make a plan for your Phase II linkage.

Based on suggestions here and your Phase I experience, decide whether you expect to create new Crash or Hospital tables with new or modified fields.

Based on suggestions here and your Phase I experience, decide whether you expect to add new match passes, modify existing match passes, or just repeat the linkage imputation process using the new algorithms.

  2. Install CODES 2000 Release 2.2.373.

In a few states, the original linkage and imputation methodology resulted in thousands of matched record pairs being assigned to the same set.  Consequently, most of these matches were not included in the final one-to-one linkage, substantially reducing the apparent coverage.  Other states may have experienced a less severe version of this situation that limited their coverage.  The new software release provides a better methodology that should avoid the problem by assigning record pairs to sets after imputation, rather than before.

See Tech Note # 14 -- Installation Notes for CODES 2000 Release 2.2.373

See Tech Note # 16 -- Imputing Complete, One-to-One Dual Linkages

Even states that achieved over 90% coverage should install the new software release so that all states use the same linkage imputation methodology.

  3. Create a new CODES 2000 project and import your old project into it.

See Tech Note # 15 -- Importing an Existing Project into a New Project

You can import specifications only, specifications plus Crash or Hospital data, or specifications plus data plus Crash to Hospital match results.  Based on your plan from Step 1, you may want to minimize unnecessary repetition of Phase I work.  For example, if you are not going to modify your Crash or Hospital tables, then you can just import them from your old project. 

  4. Make sure that your estimated number of total possible links is realistic and supportable.

Review Table 8 in the Phase I report to see where your state's estimate fell relative to those of other network states.  Based on your Phase I experience, decide whether your estimated number of total possible links is realistic and supportable.  Revise the figure if necessary.

See Data Network Note # 10 -- Effect of Helmet Use on Inpatient Charges - Phase I

  5. Make sure that you are using all available information for linkage purposes.

You may have found all possible links, but their probabilities may be inaccurately low.  For example, 1,000 matched pairs at 0.1 probability contribute only about 100 pairs to each imputed match, while the same 1,000 matched pairs at 0.4 probability contribute about 400 pairs.  Because match weights are on a base-2 logarithmic scale, increasing a typical match weight by only 2 points increases the odds for a true match by a factor of 4.
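The arithmetic above can be sketched in a few lines of Python.  This is only an illustration, not part of CODES 2000: the function names are hypothetical, and it assumes that the expected number of pairs contributed to one imputation is the sum of the pair probabilities and that match weights are base-2 logarithms of odds.

```python
def expected_pairs(probabilities):
    """Expected number of pairs contributed to one imputation.

    Assumption: each pair is included with its own match probability,
    so the expected count is the sum of the probabilities.
    """
    return sum(probabilities)


def odds_multiplier(weight_change):
    """Factor applied to the match odds when the weight changes.

    Assumption: weights are base-2 log odds, so +2 points means 2**2 = 4.
    """
    return 2.0 ** weight_change


def shifted_probability(probability, weight_change):
    """Convert a probability to odds, apply the weight change, convert back."""
    odds = probability / (1.0 - probability)
    new_odds = odds * odds_multiplier(weight_change)
    return new_odds / (1.0 + new_odds)


print(round(expected_pairs([0.1] * 1000)))      # about 100 pairs
print(round(expected_pairs([0.4] * 1000)))      # about 400 pairs
print(odds_multiplier(2))                       # 4.0
print(round(shifted_probability(0.1, 2), 3))    # 0.308
```

Under these assumptions, a 2 point increase moves a pair at 0.1 probability to roughly 0.31, which shows how a modest weight change can noticeably raise the number of pairs contributed to each imputation.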

Calculated match probabilities may be unrealistically low if you omit available information from your match specifications.  This is particularly true if you have only age and sex as personal identifiers for many crash occupants.

You can raise the probabilities of true matched pairs by revising your import, standardization, and match specifications to use all available information for matching.  For example, consider injury date, crash type, vehicle type, driver flag, injured flag, and fatality flag.

  6. Make sure that your field error probabilities accurately describe low reliability fields.

A field with 0.01 error probability in both tables has a disagree weight near -5.6.  A field with 0.05 error probability in both tables has a disagree weight near -3.3.  For pairs in which the field disagrees, this reduction of roughly 2 points in the size of the disagree penalty increases the odds for a true match by a factor of at least 4.
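The two disagree weights quoted above can be reproduced with a short calculation.  The sketch below is an assumption about the form of the weight, not the documented CODES 2000 formula: it takes the disagree weight to be the base-2 logarithm of the probability that a true match disagrees on the field, approximates that probability as the sum of the two tables' error probabilities, and neglects chance agreement among non-matches.  The function name is hypothetical.

```python
import math


def disagree_weight(error_a, error_b):
    """Approximate disagree weight for one field.

    Assumptions (not taken from the CODES 2000 documentation):
      - a true match disagrees when either table is in error, with
        probability approximated by error_a + error_b;
      - chance agreement among non-matches is small enough to ignore;
      - weights are base-2 logarithms.
    """
    return math.log2(error_a + error_b)


default_weight = disagree_weight(0.01, 0.01)
revised_weight = disagree_weight(0.05, 0.05)

print(round(default_weight, 1))                   # -5.6
print(round(revised_weight, 1))                   # -3.3
print(round(2 ** (revised_weight - default_weight), 1))  # 5.0
```

Under this approximation the two weights differ by about 2.3 points, so the odds for a disagreeing pair rise by a factor of about 5, consistent with the rounder "2 points, factor of 4" rule of thumb.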

You can raise the probabilities of true matched pairs by increasing error probabilities for low reliability fields to more realistic levels.  This is especially true if you are still using the default 0.01 error probabilities.  Revised error probabilities should be based on analysis of your Phase I linkage results.  Pick a match pass in which the field of interest was not a join field.  Examine the Match Pairs Pass X table and determine the fraction of high-probability matches in which the field disagrees.  Use half of this value as your estimated error probability for the field in each table.
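The estimation procedure described above can be sketched as follows.  The record layout, the field name, and the 0.9 cutoff for a "high-probability" match are all hypothetical choices made for illustration; in practice the rows would come from your Match Pairs Pass X table, and you would choose your own cutoff.

```python
def estimate_field_error(match_pairs, field, min_probability=0.9):
    """Estimate a per-table error probability for one match field.

    match_pairs: list of dicts with a "probability" value and an "agrees"
    dict mapping field name -> True, False, or None (not comparable).
    This layout is hypothetical, chosen only for the example.
    """
    compared = 0
    disagreements = 0
    for pair in match_pairs:
        if pair["probability"] < min_probability:
            continue  # keep only high-probability matches
        agrees = pair["agrees"].get(field)
        if agrees is None:
            continue  # missing value: no usable comparison
        compared += 1
        if not agrees:
            disagreements += 1
    if compared == 0:
        raise ValueError("no high-probability pairs with a usable comparison")
    # Half of the disagreement fraction is attributed to each table.
    return (disagreements / compared) / 2.0


sample_pairs = (
    [{"probability": 0.95, "agrees": {"injury_date": True}}] * 9
    + [{"probability": 0.95, "agrees": {"injury_date": False}}]
    + [{"probability": 0.20, "agrees": {"injury_date": False}}] * 5
)
print(estimate_field_error(sample_pairs, "injury_date"))
```

In this made-up sample, one of ten high-probability matches disagrees on the field, so the estimated error probability is 0.05 for each table; the low-probability pairs are ignored.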

  7. Make sure that you take your missing data patterns into account in your join specifications.

Your join specifications may have skipped some reasonable candidate record pairs because of missing data.  Add one or more new match passes to cover any gaps skipped by existing passes.

Consider removing or replacing a join field if the field has a high occurrence of missing values.  Note that removing a join field might increase the number of candidate pairs to an unacceptable level, so always count the candidate pairs after making such a change.
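To see how a change in join fields affects the number of candidate pairs, a count along the following lines may help before you commit to a new match pass.  This is a rough sketch outside CODES 2000: the record layout and field names are hypothetical, and it assumes that a record with a missing value in any join field is skipped by that pass.

```python
from collections import Counter


def count_candidate_pairs(crash_records, hospital_records, join_fields):
    """Count record pairs that agree exactly on every join field.

    Assumption: a record with a missing (None) value in any join field
    produces no candidate pairs in this pass.
    """
    def join_key(record):
        values = tuple(record.get(name) for name in join_fields)
        return None if any(v is None for v in values) else values

    crash_counts = Counter(
        k for k in map(join_key, crash_records) if k is not None
    )
    hospital_counts = Counter(
        k for k in map(join_key, hospital_records) if k is not None
    )
    # Each crash record pairs with every hospital record sharing its key.
    return sum(n * hospital_counts[k] for k, n in crash_counts.items())


crash = [
    {"date": "2001-05-01", "county": "A"},
    {"date": "2001-05-01", "county": None},   # missing county
    {"date": "2001-05-02", "county": "B"},
]
hospital = [
    {"date": "2001-05-01", "county": "A"},
    {"date": "2001-05-01", "county": "B"},
    {"date": "2001-05-02", "county": "B"},
]
print(count_candidate_pairs(crash, hospital, ["date", "county"]))  # 2
print(count_candidate_pairs(crash, hospital, ["date"]))            # 5
```

In this toy data, dropping the county join field recovers the crash record with the missing county but more than doubles the candidate pairs, which is the trade-off to check on real tables.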

  8. Make sure that your linkage probability model is accurate.

Confirm that you have selected your match fields or adjusted your match weights to account for significant field dependencies.

Confirm that you have adjusted your match weights to account for comparisons with tolerances.

  9. Test whether 10 linkage imputations are needed to capture random variations in your imputation results.

Create 10 linkage imputations.  Compute average inpatient charges by helmet use for the first 5 imputations, the second 5 imputations, and all 10 imputations.  If all three sets of imputations give essentially the same result, then 5 imputations are sufficient.  Otherwise, use 10 imputations.
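The comparison above could be scripted roughly as shown below.  Several details are assumptions made for illustration: each imputation is represented as a list of (helmet use, inpatient charges) tuples, the combined estimate is the average of the per-imputation group means, and "essentially the same" is taken to mean within a 2% relative tolerance.  The note itself does not define a tolerance, so choose one that suits your analysis.

```python
from statistics import mean


def mean_charges_by_helmet(imputation):
    """Average inpatient charges per helmet-use group in one imputation."""
    groups = {}
    for helmet_use, charges in imputation:
        groups.setdefault(helmet_use, []).append(charges)
    return {group: mean(values) for group, values in groups.items()}


def combined_estimate(imputations):
    """Average the per-imputation group means (an assumed combining rule)."""
    per_imputation = [mean_charges_by_helmet(imp) for imp in imputations]
    groups = set().union(*per_imputation)
    return {
        g: mean([est[g] for est in per_imputation if g in est])
        for g in sorted(groups)
    }


def five_are_sufficient(imputations, rel_tol=0.02):
    """Compare the first 5, second 5, and all 10 imputations.

    rel_tol is a hypothetical threshold for "essentially the same".
    """
    first = combined_estimate(imputations[:5])
    second = combined_estimate(imputations[5:10])
    overall = combined_estimate(imputations[:10])
    return all(
        g in est and abs(est[g] - overall[g]) <= rel_tol * abs(overall[g])
        for est in (first, second)
        for g in overall
    )


stable = [[("helmet", 10000.0), ("no_helmet", 15000.0)] for _ in range(10)]
print(five_are_sufficient(stable))
```

If the function returns False for your own imputations, the first and second halves disagree with the full set by more than the chosen tolerance, and all 10 imputations should be used.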

 

 
© Copyright 2000 - 2008 Strategic Matching, Inc. All rights reserved. Microsoft, Windows, and Access are trademarks of Microsoft Corporation. Last modified: Monday January 28, 2008.