CODES 2000 User Forum -- Data Network Note #1

Sources for Multiple Imputation Software

Applies to: Tutorial Nov 6-7, 2001.
Last updated: Friday October 26, 2001.

SUMMARY

Several sources are available for multiple imputation software.  In general, different imputation algorithms are needed for different commonly used data models:  multivariate normal, categorical, or mixed.  Not all software programs support all models and not all programs provide the same analysis features for the models they do support.  The multivariate normal model is supported by all software listed here.

SOFTWARE SOURCES

  1. SAS/STAT MI Procedure

    Available from SAS Institute, Inc., as an experimental procedure.

    SAS and SAS/STAT are trademarks of SAS Institute, Inc.

The following description is from the SAS online documentation for the MI Procedure at www.sas.com//service/library/onlinedoc/v82/whatsnew:

Multiple imputation is a strategy for dealing with data sets with missing values. You replace each missing value with a set of plausible values that represent the uncertainty about the right value to impute. You create multiply imputed data sets, analyze them with standard analyses, and then combine the results. You produce valid statistical inferences that properly reflect the uncertainty due to the missing values.

The MI procedure creates multiple imputed data sets for incomplete p-dimensional multivariate data. It offers three methods for creating the imputed data sets: the regression method, the propensity score method, and the Markov Chain Monte Carlo (MCMC) method. The procedure creates an output data set containing m imputed versions of the original data. In each version, the missing values are replaced with imputed values. For the MCMC method, you can specify whether you want a single chain for all m imputations or a separate chain for each imputation. You can also specify the initial estimates for the MCMC method. After analyzing your imputed data with standard procedures, you use the MIANALYZE procedure to combine the results.

The MI procedure was introduced in Release 8.1 and remains experimental in Release 8.2, with various new options and output displays available. Among others, a new TRANSFORM statement enables you to transform variables before imputation and back-transform these variables before combining inferences and creating output data sets.

 

  2. Schafer's Stand-alone NORM Program

Available for free from Schafer's web site www.stat.psu.edu/~jls/.

The following description is from the NORM help manual:

NORM is a Windows 95/98/NT program for multiple imputation (MI) of incomplete multivariate data. Its name refers to the multivariate normal distribution, the model used to generate the imputations. The main procedures in NORM are:

    · an EM algorithm for efficient estimation of mean, variances, and covariances (or correlations); and
    · a data augmentation procedure for generating multiple imputations of missing values.
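
The EM step named above can be illustrated with a short sketch. This is a minimal NumPy version of EM for the mean and covariance of multivariate normal data with missing values, not NORM's own routine. The function name em_mvn and the parameters n_iter and tol are hypothetical. It assumes NaN marks a missing value and that every variable is observed in enough rows for the starting variances to be positive.

```python
import numpy as np

def em_mvn(x, n_iter=100, tol=1e-8):
    """EM estimates (ML) of mean and covariance; NaN marks a missing value."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    missing = np.isnan(x)
    # Starting values: observed-data means and a diagonal covariance.
    mu = np.nanmean(x, axis=0)
    sigma = np.diag(np.nanvar(x, axis=0))
    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            mis = missing[i]
            obs = ~mis
            xi = x[i].copy()
            cond_cov = np.zeros((p, p))
            if mis.any() and obs.any():
                # E-step: regress the missing part on the observed part.
                s_oo = sigma[np.ix_(obs, obs)]
                s_mo = sigma[np.ix_(mis, obs)]
                coef = np.linalg.solve(s_oo, s_mo.T).T  # s_mo @ inv(s_oo)
                xi[mis] = mu[mis] + coef @ (xi[obs] - mu[obs])
                cond_cov[np.ix_(mis, mis)] = (
                    sigma[np.ix_(mis, mis)] - coef @ s_mo.T
                )
            elif mis.any():
                # Row with nothing observed: expectation is the current model.
                xi = mu.copy()
                cond_cov = sigma.copy()
            sum_x += xi
            sum_xx += np.outer(xi, xi) + cond_cov
        # M-step: update the parameters from the expected sufficient statistics.
        new_mu = sum_x / n
        new_sigma = sum_xx / n - np.outer(new_mu, new_mu)
        change = max(np.abs(new_mu - mu).max(), np.abs(new_sigma - sigma).max())
        mu, sigma = new_mu, new_sigma
        if change < tol:
            break
    return mu, sigma
```

With complete data the sketch reduces to the sample mean and the maximum-likelihood covariance (divisor n) after one pass.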

Additional features include:

    · pre- and post-imputation processing of data (transformations, rounding, etc.), which can be helpful for imputing certain kinds of non-normal variables;
    · plots for monitoring the convergence of data augmentation; and
    · a utility for combining the results of a multiply-imputed data analysis, using Rubin’s (1987) rules for scalar estimands, to produce overall estimates and standard errors that incorporate missing-data uncertainty. A utility for multiparameter inference is also provided.
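
Rubin's (1987) rules for a scalar estimand, mentioned in the last item above, can be sketched in a few lines. This is an illustration of the combining rules only, not NORM's utility. The function name pool_scalar is hypothetical. Each input is one completed-data estimate together with its squared standard error.

```python
import math
from statistics import mean, variance

def pool_scalar(estimates, variances):
    """Combine m completed-data estimates of one scalar quantity."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, each with a variance")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance
    # Degrees of freedom for the t reference distribution.
    if b == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, math.sqrt(t), df

# Three imputations of the same coefficient and its squared standard error.
estimate, std_error, df = pool_scalar([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The pooled standard error exceeds the average completed-data standard error whenever the estimates differ across imputations, which is how the missing-data uncertainty enters the result.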

Computational routines used in NORM are described by Schafer, J. L. (1997) Analysis of Incomplete Multivariate Data (London: Chapman & Hall). 

 

  3. Schafer's S-Plus Packages

The following description is from Schafer, J. L. (1997) Analysis of Incomplete Multivariate Data (London: Chapman & Hall), Appendix C:

The algorithms described in this book have been implemented by the author for general use in the statistical languages S and S-Plus (Becker, Chambers, and Wilks, 1988).  These functions, written with a combination of S and Fortran-77, may be obtained by anyone free of charge.  Three packages are available:

  1. NORM:  algorithms based on the multivariate normal model.
  2. CAT:  algorithms for multivariate categorical data based on the saturated multinomial model and loglinear models.
  3. MIX:  algorithms for mixed continuous and categorical data based on the general location model.

The packages, including source code and full documentation, may be downloaded from the ftp server at the Department of Statistics, The Pennsylvania State University.  To obtain copies via the World Wide Web, connect to www.stat.psu.edu/~jls/ and follow the on-line instructions.

  4. King's Amelia Program

The following description is from King's online documentation:

Amelia implements the statistical procedures for analyzing incomplete multivariate data developed in

Gary King, James Honaker, Anne Joseph, and Kenneth Scheve. "Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation." American Political Science Review, Vol. 95, No. 1 (March, 2001): Pp. 49-69, copy available at http://GKing.Harvard.Edu.

Please read this paper before using Amelia. The paper proposes, and this program implements, a remedy to the discrepancy between the way social scientists analyze data with missing values and the recommendations of the statistics community. With a few notable exceptions, statisticians and methodologists have agreed on a widely applicable approach to many missing data problems based on the concept of "multiple imputation," but most social scientists still use listwise deletion (deleting all cases with at least one missing cell) to make inferences in the presence of missing data. This practice is always inefficient and often biased. The various other ad hoc methods available in commercially available statistical software (such as pairwise deletion, imputation from regressions, mean substitution, etc.) are no better. As it turns out, the failure to use superior methods has been largely due to the fact that the computational algorithms available to implement multiple imputation models have been slow, very difficult to use even for experts, and impossible to run with existing commercial software. In the paper, an existing algorithm is adapted for use as a general purpose, multiple imputation model for missing data. This algorithm, called EMis, is between dozens and hundreds of times faster than the leading method recommended in the statistics literature, gives the same answer, and requires no special expertise to use. Amelia: A Program for Missing Data implements the EMis algorithm and thus offers a superior and easy-to-use alternative for statistical analyses of incomplete multivariate data.
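
Listwise deletion, as defined in the description above, is easy to show on a toy table. The sketch below is only an illustration with made-up rows. The function name listwise_delete is hypothetical, and None stands for a missing cell.

```python
def listwise_delete(rows):
    """Keep only the cases (rows) that have no missing cell."""
    return [row for row in rows if all(cell is not None for cell in row)]

# Four made-up cases with three variables each; None marks a missing cell.
cases = [
    (1, 2, 3),
    (4, None, 6),
    (7, 8, None),
    (10, 11, 12),
]
complete = listwise_delete(cases)
share_kept = len(complete) / len(cases)
```

Only 2 of the 12 cells are missing here, yet half of the cases are dropped, which is the loss of efficiency the description refers to.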

 

  5. SOLAS™

The following description is from the SOLAS website, www.statsol.ie/solas/solas.htm:

SOLAS™ is developed in close collaboration with Prof. Donald B. Rubin, the leading authority on Multiple Imputation.

SOLAS™ 3.0 for Missing Data Analysis offers principled approaches to missing data. It now has its own scripting language and features a choice of 6 imputation techniques, including 2 Multiple Imputation techniques based on the work of Prof. Donald B. Rubin. Data can be imported from a wide variety of file types including SAS (Unix/Windows), SPSS, Splus, Stata and many more. Once the data is imported, the missing data pattern can be displayed and a decision on the most appropriate technique made. Once imputation is complete, the imputed datasets can be analysed within SOLAS or exported to a variety of other packages in the correct format. It's that simple!

"Solas is currently the only program that implements multiple imputation noniteratively and with substantial flexibility, even including ad-hoc methods, such as LOCF, as points of comparison for sensitivity analysis."
Prof. Donald B. Rubin, Harvard.
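
LOCF (last observation carried forward), named in the quotation above as an ad hoc comparison method, fills each missing value with the most recent observed value for the same subject. The sketch below is a minimal illustration and not SOLAS code. The function name locf is hypothetical, and None stands for a missing measurement.

```python
def locf(values):
    """Fill each None with the most recent observed value before it."""
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        # A leading gap stays None because nothing has been observed yet.
        filled.append(last_seen)
    return filled

# Repeated measurements for one subject across six visits.
visits = [None, 3, None, None, 5, None]
print(locf(visits))  # prints [None, 3, 3, 3, 5, 5]
```

LOCF produces a single filled-in series and carries no imputation uncertainty, which is why it serves as a point of comparison and not as a multiple imputation method.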

The incorrect analysis of datasets with incomplete data can lead to biased analysis and incorrect inference. SOLAS™ 3.0 provides researchers with a range of imputation approaches in an easy-to-use, validated software package that includes principled, informed solutions to the problems presented by incomplete datasets.

 

 
 
© Copyright 2000 - 2008 Strategic Matching, Inc. All rights reserved. Microsoft, Windows, and Access are trademarks of Microsoft Corporation. Last modified: Monday January 28, 2008.