Simulation -- Guidelines

Data Simulation. Simulated data are artificial data generated by a computer program. Their most important property for record linkage training is that the true match status of every pair of records is known: records for the same person in different tables are given the same record number, so you can tell by inspection which pairs are true matches. Simulated data are designed so that their statistical characteristics resemble those of real data in many important ways. Consequently, you can validate your real linkage process in three steps: learn the statistical characteristics of your real data, create and link realistic simulated data with those characteristics, and demonstrate that your linkage model provides adequate goodness of fit.
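
As a rough illustration of the idea, the following Python sketch builds two small simulated tables in which records for the same person share a record number, then uses that number to score a simple linkage rule. The field names (record_id, first, last, birth_year), the name lists, the error model (dropping one letter from the last name), and the exact-agreement rule are all assumptions made for this example; they are not taken from any particular simulation program.

```python
import random

# Hypothetical name pools; a realistic simulator would draw from
# frequency tables estimated on the real data.
FIRST_NAMES = ["Maria", "James", "Aisha", "Chen", "Olga", "Samuel"]
LAST_NAMES = ["Lopez", "Smith", "Khan", "Wang", "Ivanova", "Brown"]


def make_person(record_id, rng):
    """Create one artificial person keyed by a record number."""
    return {
        "record_id": record_id,
        "first": rng.choice(FIRST_NAMES),
        "last": rng.choice(LAST_NAMES),
        "birth_year": rng.randint(1940, 2005),
    }


def corrupt(person, rng, error_rate):
    """Copy a record; with probability error_rate, drop one letter of the last name."""
    copy = dict(person)
    if rng.random() < error_rate:
        pos = rng.randrange(len(copy["last"]))
        copy["last"] = copy["last"][:pos] + copy["last"][pos + 1:]
    return copy


def simulate_tables(n_people=100, overlap=0.6, error_rate=0.2, seed=0):
    """Return two tables; a person keeps the same record_id in both."""
    rng = random.Random(seed)
    people = [make_person(i, rng) for i in range(n_people)]
    table_a = list(people)
    # Only a share of the people also appear in table B, possibly with typos.
    table_b = [corrupt(p, rng, error_rate) for p in people if rng.random() < overlap]
    return table_a, table_b


def is_true_match(rec_a, rec_b):
    """True match status is known by construction: same record number."""
    return rec_a["record_id"] == rec_b["record_id"]


def evaluate_exact_rule(table_a, table_b):
    """Score a naive rule (exact agreement on all fields) against the truth."""
    counts = {"tp": 0, "fp": 0, "fn": 0}
    for a in table_a:
        for b in table_b:
            predicted = (a["first"], a["last"], a["birth_year"]) == (
                b["first"], b["last"], b["birth_year"])
            actual = is_true_match(a, b)
            if predicted and actual:
                counts["tp"] += 1
            elif predicted and not actual:
                counts["fp"] += 1
            elif actual and not predicted:
                counts["fn"] += 1
    return counts


table_a, table_b = simulate_tables()
counts = evaluate_exact_rule(table_a, table_b)
print(counts)
```

Because the record number is carried through to both tables, the counts of correct links, false links, and missed matches can be computed directly. With real data these quantities are not observable, which is why simulated data are useful for training and validation.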