Overview New Version 8 -- Guidelines

What's New in Version 8. Version 8 includes important changes that should improve your future record linkage results. Internet-based training sessions were conducted in late 2006 and early 2007 using Version 7.0 and artificial data tables for which the true linkage results were known. Most participants eventually developed effective linkage models with excellent goodness of fit (chi-square p-value > 0.1), but not without a struggle. Many of the changes in Version 8 address issues raised during and after this training.

First, the user interface is simpler, and several ancillary features, such as validation tests, are now built in instead of requiring stand-alone work outside the linkage application.

Second, the statistical linkage model has been improved in several ways. Probabilities that data values are correct but different are now handled as match parameters, not data source parameters. Comparisons with tolerances are handled accurately in probability calculations without requiring ad hoc adjustments via X factors. Agree and disagree X factors, which correct for dependent comparison outcomes, are re-estimated as part of each Markov chain Monte Carlo (MCMC) iteration rather than being fixed at prior estimates.

Finally, the AutoPilot has been strengthened as a demonstration and training tool. Simulated data tables are prepared at the beginning of each AutoPilot run so that you can see the process from beginning to end. You should prepare your own simulated data and test linkage strategies before linking your real data.

CODES2000 and LinkSolv now use only four Access databases. CODES2000 (or LinkSolv) contains all of the Access forms that make up the user interface. STMTData contains all data tables used for managing linkage projects. UserLibrary contains all user-developed custom Visual Basic functions. STMTLibrary contains the Access Visual Basic objects that make up an improved object-oriented design:

  1. Objects for User Interface: Auto Pilot, Frequency Table, Help, License Wizard, Manage Project, Perform Match, Prepare Data, Review Match, Specify Match, Splash, Test Compare Method, Test Standard Method, and Welcome.

  2. Objects for Data Simulation: Crash, EMS, Hospital, Person, Simulate Data, and Vehicle.

  3. Objects for Record Linkage: Match, Option List, Project, Source, and Triple Match.

  4. Objects for Application Management: Application, Database, Environment, and Utilities.

Some linkage practitioners reported problems working with earlier projects because those projects had obsolete user interfaces or other obsolete project components. In the past, each project was created with its own copy of all user interface forms. Version 8 uses a single Access database as the complete user interface rather than a separate Access database for each linkage project, so the user interface is exactly the same for all projects, regardless of when they were built. All project management, database management, and data manipulation are now done in the Strategic Matching library; previously, these tasks were split between procedures attached to forms and procedures in the library. No forms or tables remain in the library, only procedures. All project management tables, reference tables, and demo tables are in a single table database rather than distributed across several databases. Entries for different projects, sources, and matches are distinguished by the ProjectID, SourceID, and MatchID fields.
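To illustrate how one shared table can hold entries for many projects, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the Access table database. The table names Sources and Matches, the SourceName and MatchName columns, the helper functions, and the sample rows are all hypothetical; only the ProjectID, SourceID, and MatchID key fields come from the text above.

```python
import sqlite3
from typing import List


def build_demo_tables() -> sqlite3.Connection:
    """Build an in-memory stand-in for the single table database."""
    conn = sqlite3.connect(":memory:")
    # Sources for all projects live in one table; (ProjectID, SourceID) is the key.
    conn.execute(
        "CREATE TABLE Sources (ProjectID INTEGER NOT NULL, "
        "SourceID INTEGER NOT NULL, SourceName TEXT NOT NULL, "
        "PRIMARY KEY (ProjectID, SourceID))"
    )
    # Matches for all projects live in one table; (ProjectID, MatchID) is the key.
    conn.execute(
        "CREATE TABLE Matches (ProjectID INTEGER NOT NULL, "
        "MatchID INTEGER NOT NULL, MatchName TEXT NOT NULL, "
        "PRIMARY KEY (ProjectID, MatchID))"
    )
    # Illustrative rows only: two projects, each with two sources and one match.
    conn.executemany(
        "INSERT INTO Sources VALUES (?, ?, ?)",
        [(1, 1, "Crash"), (1, 2, "Hospital"), (2, 1, "EMS"), (2, 2, "Hospital")],
    )
    conn.executemany(
        "INSERT INTO Matches VALUES (?, ?, ?)",
        [(1, 1, "Crash-Hospital"), (2, 1, "EMS-Hospital")],
    )
    return conn


def sources_for_project(conn: sqlite3.Connection, project_id: int) -> List[str]:
    """Return the source names of one project, ordered by SourceID."""
    rows = conn.execute(
        "SELECT SourceName FROM Sources WHERE ProjectID = ? ORDER BY SourceID",
        (project_id,),
    ).fetchall()
    return [name for (name,) in rows]


def matches_for_project(conn: sqlite3.Connection, project_id: int) -> List[str]:
    """Return the match names of one project, ordered by MatchID."""
    rows = conn.execute(
        "SELECT MatchName FROM Matches WHERE ProjectID = ? ORDER BY MatchID",
        (project_id,),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = build_demo_tables()
    print(sources_for_project(db, 1), matches_for_project(db, 1))
```

Because every query filters on ProjectID, adding a project adds rows rather than a new database, which is what keeps all projects on the same table design.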

Some linkage practitioners reported problems with Access links when archiving, restoring, or moving projects. This is because the locations of linked tables are hard-coded into Access databases, so any change in location breaks the links. The new architecture in Version 8 uses far fewer links, records all link information in a table when a link is created, and checks that information and rebuilds the links each time CODES2000 is opened.
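The check-and-rebuild step can be sketched as follows, assuming (hypothetically) that each link is recorded with the file name of its source database and that all linked databases sit in one data folder known at startup. The LinkInfo record, the refresh_links function, and the .mdb file names are illustrative, not the product's actual design; a real Access front end would drop and recreate each stale link rather than just updating a stored path.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass
class LinkInfo:
    """One row of a hypothetical link registry, recorded when a link is created."""
    table_name: str   # name of the linked table as the user interface sees it
    db_file: str      # file name of the database that holds the table
    linked_path: str  # full path the link pointed to when it was last built


def refresh_links(registry: List[LinkInfo], data_folder: str) -> List[str]:
    """Check every recorded link against data_folder and rebuild stale ones.

    Returns the names of the tables whose links were rebuilt. This sketch
    only updates the recorded path in place of recreating the link.
    """
    rebuilt = []
    for link in registry:
        expected = os.path.join(data_folder, link.db_file)
        if link.linked_path != expected:
            link.linked_path = expected  # stand-in for recreating the link
            rebuilt.append(link.table_name)
    return rebuilt


if __name__ == "__main__":
    links = [
        LinkInfo("Sources", "STMTData.mdb", os.path.join("old", "STMTData.mdb")),
        LinkInfo("Matches", "STMTData.mdb", os.path.join("new", "STMTData.mdb")),
    ]
    print(refresh_links(links, "new"))  # only "Sources" pointed at the old folder
```

Storing only the file name, and resolving the full path at open time, is what lets a moved or restored project repair itself without manual relinking.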

Some linkage practitioners reported having too many records to handle easily in Access tables, especially when working with emergency department records or multiple years of data. Version 8 allows optional use of tables in SQL Server databases. The user interface remains Access forms, and all project management tables remain in Access. SQL Server databases are accessed through a System DSN named CODES2000 or LinkSolv, which a system administrator creates in advance with the ODBC Data Source Administrator. Design principles were established for handling data types, SQL dialects, and custom functions so that SQL Server databases could be accommodated. All database activity now takes place through a single connection, either to an Access JET database or to a SQL Server database. All data definition and data manipulation tasks are handled by a small number of procedures, such as CreateTable, CreateView, and ExecuteSQL, which detect and adjust for the characteristics of each provider.
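The single-connection idea can be sketched as one function that chooses the ODBC connection string and another that adjusts SQL for the detected provider. This is a hypothetical illustration, not the library's code: the function names, the rule that a DSN always means SQL Server, and the exact Access driver name (newer installations register "Microsoft Access Driver (*.mdb, *.accdb)") are assumptions. The concatenation difference is real: JET SQL joins strings with `&`, Transact-SQL with `+`.

```python
from typing import Optional

# Driver name of the classic Access ODBC driver (assumption; check your system).
ACCESS_DRIVER = "Microsoft Access Driver (*.mdb)"


def build_connection_string(dsn_name: Optional[str], access_path: str) -> str:
    """Return the single ODBC connection string used for all database work.

    If a System DSN name (e.g. "CODES2000" or "LinkSolv") is given, tables are
    assumed to live in SQL Server; otherwise the Access JET file is used.
    """
    if dsn_name:
        return f"DSN={dsn_name};"
    return f"Driver={{{ACCESS_DRIVER}}};DBQ={access_path};"


def provider_of(connection_string: str) -> str:
    """Classify a connection string as "SQLServer" (DSN) or "JET" (Access file)."""
    return "SQLServer" if connection_string.upper().startswith("DSN=") else "JET"


def concat_sql(provider: str, left: str, right: str) -> str:
    """Join two string expressions with the provider's concatenation operator."""
    # JET SQL uses & for string concatenation; Transact-SQL uses +.
    operator = "+" if provider == "SQLServer" else "&"
    return f"{left} {operator} {right}"


if __name__ == "__main__":
    conn_str = build_connection_string(None, r"C:\Linkage\STMTData.mdb")
    print(conn_str)
    print(concat_sql(provider_of(conn_str), "FirstName", "LastName"))
```

With an ODBC package such as pyodbc installed, the resulting string could be passed to `pyodbc.connect`; routing every statement through helpers like these is what lets the rest of the code ignore which provider is in use.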

CODES2000 and LinkSolv references to Access data types that are not available in SQL Server were replaced with compatible data types. For example, Access data type LONG was replaced with Access data type INTEGER, which is compatible with SQL Server data type INTEGER.
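In JET SQL, INTEGER denotes a 4-byte Long Integer, so replacing LONG with INTEGER does not narrow the column. A minimal sketch of such a substitution table follows. The LONG to INTEGER entry comes from the text; the DOUBLE to FLOAT entry and the helper names portable_type and portable_columns are illustrative assumptions.

```python
from typing import Iterable, Tuple

# Access (JET) type names rewritten to names that both engines accept.
# LONG -> INTEGER comes from the text above; DOUBLE -> FLOAT is an
# illustrative assumption to be checked against your own table designs.
PORTABLE_TYPES = {
    "LONG": "INTEGER",
    "DOUBLE": "FLOAT",
}


def portable_type(access_type: str) -> str:
    """Map an Access type name to a portable one; unknown names pass through."""
    return PORTABLE_TYPES.get(access_type.strip().upper(), access_type)


def portable_columns(columns: Iterable[Tuple[str, str]]) -> str:
    """Render (name, type) pairs as a CREATE TABLE column list."""
    return ", ".join(f"{name} {portable_type(typ)}" for name, typ in columns)


if __name__ == "__main__":
    print(portable_columns([("PersonID", "LONG"), ("Weight", "DOUBLE")]))
```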

CODES2000 and LinkSolv table designs were made compatible with SQL Server by specifying a PRIMARY KEY for each table. A SQL Server table linked to Access cannot be updated unless it has a primary key.
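One way to enforce this rule is to make the table-building routine refuse to emit a CREATE TABLE statement without a key. The sketch below is hypothetical: the function name and the PK_ constraint-naming convention are assumptions, and SQLite stands in for the real engine only to confirm that the generated statement parses. The named `CONSTRAINT ... PRIMARY KEY (...)` form is used because JET, SQL Server, and SQLite all accept it.

```python
import sqlite3
from typing import Sequence, Tuple


def create_table_sql(
    name: str, columns: Sequence[Tuple[str, str]], key: Sequence[str]
) -> str:
    """Build a CREATE TABLE statement that always declares a PRIMARY KEY."""
    if not key:
        raise ValueError(f"table {name} must have a primary key")
    known = {col for col, _ in columns}
    missing = [k for k in key if k not in known]
    if missing:
        raise ValueError(f"key columns {missing} are not columns of {name}")
    column_sql = ", ".join(f"{col} {typ}" for col, typ in columns)
    key_sql = ", ".join(key)
    return f"CREATE TABLE {name} ({column_sql}, CONSTRAINT PK_{name} PRIMARY KEY ({key_sql}))"


if __name__ == "__main__":
    sql = create_table_sql(
        "Person", [("PersonID", "INTEGER"), ("LastName", "VARCHAR(50)")], ["PersonID"]
    )
    print(sql)
    # SQLite stands in for the real engine here, just to show the SQL is valid.
    sqlite3.connect(":memory:").execute(sql)
```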

CODES2000 and LinkSolv query designs were changed to be compatible with SQL Server. In some cases, this required new Visual Basic procedures for Access; in other cases, it required new Transact-SQL functions for SQL Server. Where possible, query designs were changed to use native (built-in) functions for better performance.
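Using native functions means a query written with an Access function name may need a different built-in name on SQL Server. The sketch below shows that translation for a few well-known pairs (UCase/UPPER, LCase/LOWER, Mid/SUBSTRING). The mapping table and the native_call helper are hypothetical and are not the product's own translation list. Functions with no built-in counterpart would still need a custom Visual Basic procedure or Transact-SQL function, which this sketch does not cover.

```python
# Access (JET) function names mapped to SQL Server built-in equivalents.
# This short list is illustrative, not the product's own translation table.
NATIVE_EQUIVALENTS = {
    "UCASE": "UPPER",
    "LCASE": "LOWER",
    "MID": "SUBSTRING",
}


def native_call(provider: str, func: str, *args: str) -> str:
    """Render func(args) with the provider's built-in function name."""
    name = func.upper()
    if provider == "SQLServer":
        name = NATIVE_EQUIVALENTS.get(name, name)
        # JET's Mid() allows the length to be omitted; SUBSTRING() requires it.
        if name == "SUBSTRING" and len(args) != 3:
            raise ValueError("SUBSTRING needs string, start, and length")
    return f"{name}({', '.join(args)})"


if __name__ == "__main__":
    print(native_call("JET", "Mid", "ZipCode", "1", "3"))
    print(native_call("SQLServer", "Mid", "ZipCode", "1", "3"))
```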