Options -- Guidelines

Average Agree Weights. Match weights are logarithms of likelihood ratios. Match probabilities are calculated from likelihood ratios that take into account agreements and disagreements on all data fields compared in your linkage model. Match probabilities are used to draw multiple imputations of true link status for all candidate record pairs from a Bayesian posterior distribution. In general, an agreement on a rare value (say, Age = 99) receives greater weight than an agreement on a common value (say, Age = 30). Disagree weights are the same for all data values because the linkage model assumes that the probability of an incorrect value is independent of the true data value. Value-specific agree weights can bias linked datasets used for analysis if Age (or any other field in your linkage model) is also an analysis variable: people with rare ages are less likely to have false negative matches (missed links) than people with common ages. You can remove this bias by specifying Y for Average Agree Weights.
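
The relationship between value frequency and agree weight can be illustrated with a small Python sketch. It uses the textbook log-likelihood-ratio form of match weights, not the program's own code. The value 0.95 for the probability of agreement among true matches, the base-2 logarithm, the sample ages, and every function name are assumptions made for illustration; the way the program actually averages agree weights is not described here.

```python
import math
from collections import Counter

M_PROB = 0.95  # assumed P(field agrees | true match); illustrative only


def value_frequencies(values):
    """Relative frequency of each distinct value in a field."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}


def value_agree_weight(freq, m=M_PROB):
    """Value-specific agree weight: log2 of m / frequency of the value."""
    return math.log2(m / freq)


def average_agree_weight(freqs, m=M_PROB):
    """One agree weight for the whole field (an assumed form of averaging).

    u is the chance that two unrelated records agree on the field.
    """
    u = sum(p * p for p in freqs.values())
    return math.log2(m / u)


def disagree_weight(freqs, m=M_PROB):
    """Disagree weight, the same for every value of the field."""
    u = sum(p * p for p in freqs.values())
    return math.log2((1 - m) / (1 - u))


def match_probability(total_weight, prior):
    """Posterior match probability from a total weight and a prior probability."""
    odds = prior / (1 - prior) * 2 ** total_weight
    return odds / (1 + odds)


# Hypothetical Age field: 30 is common, 99 is rare.
ages = [30] * 90 + [99] * 10
freqs = value_frequencies(ages)
w_common = value_agree_weight(freqs[30])
w_rare = value_agree_weight(freqs[99])
w_avg = average_agree_weight(freqs)
w_dis = disagree_weight(freqs)

print("agree weight, Age = 30:", round(w_common, 3))
print("agree weight, Age = 99:", round(w_rare, 3))
print("single field-level agree weight:", round(w_avg, 3))
print("disagree weight:", round(w_dis, 3))
```

In this sketch the rare value earns a larger agree weight than the common one, while the single field-level weight falls between the two, so every agreement on Age contributes the same evidence regardless of the value.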

Data Provider. Each new project is created for either the Access JET Data Provider or the SQL Server Data Provider, depending on which you specify on the Options Tab. If you specify Access JET as Data Provider, all data tables for the project are built in Access databases (*.mdb) in a local folder with the same name as the project. Multiple databases are created because of size limitations in Access. The default location for the project folder is the same as the default location for new databases (Access Options > General). If you specify SQL Server as Data Provider, all data tables for the project are built on the server in a single database. The server and database must be identified as an ODBC data source (Control Panel > Administrative Tools) in a System DSN named SQL_STMT (SQL Server for Strategic Matching). Specify Windows NT Authentication for the DSN. You may want to change databases for different projects so that data tables for a new project do not overwrite data tables for an old project. An Access Data Project (*.adp) is created in the default project folder and connected to the specified SQL Server database. You will have to provide information requested by the New Connection dialog.
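
Before creating a SQL Server project, it can help to confirm which database the SQL_STMT data source points to, since tables for a new project could overwrite tables for an old one. The Python sketch below is one way to check this from outside the application. It assumes the third-party pyodbc package is installed and that the DSN uses Windows authentication; the function names are hypothetical and are not part of the program.

```python
DSN_NAME = "SQL_STMT"  # System DSN name required by the program


def build_connection_string(dsn=DSN_NAME):
    """ODBC connection string for a DSN that uses Windows authentication."""
    return f"DSN={dsn};Trusted_Connection=yes;"


def current_database(dsn=DSN_NAME):
    """Return the name of the database the DSN connects to.

    Requires the third-party pyodbc package and a reachable server.
    """
    import pyodbc  # imported here so the helper above works without it

    conn = pyodbc.connect(build_connection_string(dsn))
    try:
        # DB_NAME() is the SQL Server function for the current database.
        return conn.cursor().execute("SELECT DB_NAME()").fetchone()[0]
    finally:
        conn.close()


print(build_connection_string())
```

Calling current_database() on a machine where the DSN is configured returns the database name, which can be compared with the database used by earlier projects.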

Random Sample Percent. Random samples of record pairs from the cross product (A x B) are used to estimate the effects of comparisons with tolerances and the effects of comparisons with dependent match fields. Larger sample sizes give more accurate estimates of population statistics. Smaller sample sizes give faster run times. Sample sizes of a few million pairs can be analyzed in a few minutes. You can calculate the size of (A x B) by multiplying the number of records in table A by the number of records in table B.
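
The arithmetic for choosing a percent can be sketched in Python as follows. The table sizes, the target of five million pairs, and the function names are hypothetical, and the sampling routine shows one simple way to draw distinct pairs at random, not necessarily the method the program uses.

```python
import random


def cross_product_size(n_a, n_b):
    """Number of record pairs in (A x B)."""
    return n_a * n_b


def sample_size(n_a, n_b, percent):
    """Number of pairs drawn for a given Random Sample Percent."""
    return round(cross_product_size(n_a, n_b) * percent / 100)


def percent_for_target(n_a, n_b, target_pairs):
    """Percent that yields roughly target_pairs sampled pairs."""
    return 100 * target_pairs / cross_product_size(n_a, n_b)


def sample_pairs(n_a, n_b, percent, seed=0):
    """Draw distinct (row in A, row in B) index pairs uniformly at random."""
    rng = random.Random(seed)
    k = sample_size(n_a, n_b, percent)
    # Sample flat positions in the cross product, then split each position
    # into a row index for table A and a row index for table B.
    flat = rng.sample(range(n_a * n_b), k)
    return [divmod(i, n_b) for i in flat]


# Hypothetical tables: 100,000 records in A and 50,000 records in B.
total = cross_product_size(100_000, 50_000)
pct = percent_for_target(100_000, 50_000, 5_000_000)
print("pairs in (A x B):", total)
print("percent for about five million pairs:", pct)
```

For large tables even a small percent produces millions of pairs, so the percent is usually chosen by working backward from a sample size that runs in acceptable time.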