CODES 2000 User Forum -- Data Network Note #1
Sources for Multiple Imputation Software

Applies to: Tutorial Nov 6-7, 2001.
Last updated: Friday October 26, 2001.

SUMMARY
Several sources are available for multiple imputation software. In general,
each of the commonly used data models (multivariate normal, categorical, or
mixed) needs its own imputation algorithm. Not all software
programs support all models and not all programs provide the same analysis
features for the models they do support. The multivariate normal model is
supported by all software listed here.

SOFTWARE SOURCES
- SAS/STAT MI Procedure
Available from SAS Institute, Inc. as an experimental procedure.
SAS and SAS/STAT are trademarks of SAS Institute, Inc.
The following description is from the SAS online documentation for the MI
Procedure at www.sas.com//service/library/onlinedoc/v82/whatsnew:
Multiple imputation is a strategy for dealing with data sets with missing
values. You replace each missing value with a set of plausible values that
represent the uncertainty about the right value to impute. You create
multiply imputed data sets, analyze them with standard analyses, and then
combine the results. You produce valid statistical inferences that properly
reflect the uncertainty due to the missing values.
The MI procedure creates multiply imputed data sets for incomplete p-dimensional
multivariate data. It offers three methods for creating the imputed data
sets: the regression method, the propensity score method, and the Markov
Chain Monte Carlo (MCMC) method. The procedure creates an output data set
containing m imputed versions of the original data. In each version,
the missing values are replaced with imputed values. For the MCMC method,
you can specify whether you want a single chain for all m imputations
or a separate chain for each imputation. You can also specify the initial
estimates for the MCMC method. After analyzing your imputed data with
standard procedures, you use the MIANALYZE procedure to combine the results.
The MI procedure was introduced in Release 8.1 and remains experimental
in Release 8.2, with various new options and output displays available.
Among others, a new TRANSFORM statement enables you to transform variables
before imputation and back-transform these variables before combining
inferences and creating output data sets.
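The idea of replacing each missing value with m plausible draws can be
illustrated with a minimal Python sketch. This is not the algorithm used by the
MI procedure; it assumes a single normally distributed variable, values missing
completely at random, and a noninformative prior. The function name
impute_normal and the sample data are hypothetical.

```python
import math
import random
from statistics import mean, variance

def impute_normal(values, m=5, seed=0):
    """Return m completed copies of `values`; None marks a missing entry.

    Assumption: univariate normal model with a noninformative prior.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    n = len(observed)
    if n < 2:
        raise ValueError("need at least two observed values")
    y_bar, s2 = mean(observed), variance(observed)
    completed = []
    for _ in range(m):
        # Draw the parameters first, so imputations reflect parameter uncertainty.
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)  # chi-square with n - 1 df
        sigma2 = (n - 1) * s2 / chi2
        mu = rng.gauss(y_bar, math.sqrt(sigma2 / n))
        # Then draw each missing value from the model given those parameters.
        completed.append([rng.gauss(mu, math.sqrt(sigma2)) if v is None else v
                          for v in values])
    return completed

data = [4.1, None, 5.3, 4.8, None, 5.0, 4.6]  # hypothetical incomplete variable
imputed_sets = impute_normal(data, m=3)
for i, row in enumerate(imputed_sets, start=1):
    print(i, [round(v, 2) for v in row])
```

Each completed copy keeps the observed values unchanged and differs only in the
imputed entries; that variation across copies is what carries the uncertainty
about the missing values into the later analysis.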
- Schafer's Stand-alone NORM Program
Available for free from Schafer's web site www.stat.psu.edu/~jls/.
The following description is from the NORM help manual:
NORM is a Windows 95/98/NT program for multiple imputation (MI) of incomplete multivariate data. Its name refers to the multivariate normal distribution, the model used to generate the imputations. The main procedures in NORM are:
· an EM algorithm for efficient estimation of mean, variances, and covariances (or correlations); and
· a data augmentation procedure for generating multiple imputations of missing values.
Additional features include:
· pre- and post-imputation processing of data (transformations, rounding, etc.), which can be helpful for imputing certain kinds of non-normal variables;
· plots for monitoring the convergence of data augmentation; and
· a utility for combining the results of a multiply-imputed data analysis, using Rubin’s (1987) rules for scalar estimands, to produce overall estimates and standard errors that incorporate missing-data uncertainty. A utility for multiparameter inference is also provided.
Computational routines used in NORM are described by Schafer, J. L. (1997) Analysis of Incomplete Multivariate Data (London: Chapman & Hall).
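For readers unfamiliar with Rubin's rules for scalar estimands, the following
Python sketch shows the usual combining formulas. It is an illustration only,
not NORM's code; the function name combine_rubin and the input numbers are
hypothetical. It takes the point estimate and squared standard error obtained
from each of the m imputed data sets.

```python
import math
from statistics import mean, variance

def combine_rubin(estimates, variances):
    """Combine m point estimates and their squared standard errors."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(estimates)        # overall point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance
    # Degrees of freedom for the t reference distribution.
    if b > 0:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    else:
        df = math.inf              # no between-imputation variation
    return {"estimate": q_bar, "std_error": math.sqrt(t), "df": df}

# Hypothetical results from m = 5 imputed data sets.
result = combine_rubin([10.2, 9.8, 10.5, 10.1, 9.9],
                       [0.25, 0.24, 0.26, 0.25, 0.25])
print(result)
```

The between-imputation term is what makes the combined standard error larger
than the one reported by any single completed-data analysis.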
- Schafer's S-Plus Packages
The following description is from Schafer, J. L. (1997) Analysis of Incomplete
Multivariate Data (London: Chapman & Hall), Appendix C:
The algorithms described in this book have been implemented by the author
for general use in the statistical languages S and S-Plus (Becker, Chambers,
and Wilks, 1988). These functions, written with a combination of S and
Fortran-77, may be obtained by anyone free of charge. Three packages are
available:
- NORM: algorithms based on the multivariate normal model.
- CAT: algorithms for multivariate categorical data based on the
saturated multinomial model and loglinear models.
- MIX: algorithms for mixed continuous and categorical data based on
the general location model.
The packages, including source code and full documentation, may be
downloaded from the ftp server at the Department of Statistics, The
Pennsylvania State University. To obtain copies via the World Wide Web,
connect to www.stat.psu.edu/~jls/ and follow the on-line instructions.
- King's Program
The following description is from King's online documentation:
The program implements the statistical procedures for analyzing incomplete
multivariate data developed in
Gary King, James Honaker, Anne Joseph, and Kenneth Scheve. "Analyzing
Incomplete Political Science Data: An Alternative Algorithm for Multiple
Imputation." American Political Science Review, Vol. 95, No.
1 (March, 2001): Pp. 49-69, copy available at http://GKing.Harvard.Edu.
Please read this paper before using the program.
The paper proposes, and this program implements, a remedy to the discrepancy
between the way social scientists analyze data with missing values and the
recommendations of the statistics community. With a few notable exceptions,
statisticians and methodologists have agreed on a widely applicable approach
to many missing data problems based on the concept of "multiple
imputation," but most social scientists still use listwise deletion
(deleting all cases with at least one missing cell) to make inferences in the
presence of missing data. This practice is always inefficient and often
biased. The various other ad hoc methods available in commercially available
statistical software (such as pairwise deletion, imputation from regressions,
mean substitution, etc.) are no better. As it turns out, the failure to use
superior methods has been largely due to the fact that the computational
algorithms available to implement multiple imputation models have been slow,
very difficult to use even for experts, and impossible to run with existing
commercial software. In the paper, an existing algorithm is adapted for use as
a general purpose, multiple imputation model for missing data. This algorithm,
called EMis, is between dozens and hundreds of times faster than the
leading method recommended in the statistics literature, gives the same
answer, and requires no special expertise to use.
A Program for Missing Data implements the EMis algorithm and thus offers a
superior and easy-to-use alternative for statistical analyses of incomplete
multivariate data.
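The cost of listwise deletion mentioned in the description above can be seen in
a small Python sketch. The data and the function name listwise_delete are
hypothetical, and None marks a missing cell.

```python
def listwise_delete(rows):
    """Keep only the rows that have no missing (None) cells."""
    return [row for row in rows if all(cell is not None for cell in row)]

# Hypothetical data: 6 cases, 3 variables.
rows = [
    (1.0, 2.0, 3.0),
    (1.5, None, 2.5),
    (None, 2.2, 3.1),
    (0.9, 1.8, None),
    (1.2, 2.1, 2.9),
    (1.1, None, 3.3),
]
complete = listwise_delete(rows)
missing_cells = sum(cell is None for row in rows for cell in row)
total_cells = sum(len(row) for row in rows)
print(f"cells missing: {missing_cells}/{total_cells}")
print(f"cases kept:    {len(complete)}/{len(rows)}")
```

In this made-up example fewer than a quarter of the cells are missing, yet
two-thirds of the cases are discarded, along with every observed value they
contain.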
- SOLAS™
The following description is from the SOLAS website at www.statsol.ie/solas/solas.htm:
SOLAS™ is developed in close collaboration with Prof. Donald B. Rubin, the
leading authority on Multiple Imputation.
SOLAS™ 3.0 for Missing Data Analysis offers principled approaches
to missing data. It now has its own scripting language and features a choice of 6
imputation techniques, including 2 Multiple Imputation techniques based on the
work of Prof. Donald B. Rubin. Data can be imported from a wide variety of
file types including SAS (Unix/Windows), SPSS, Splus, Stata and many more.
Once the data is imported, the missing data pattern can be displayed and a
decision upon the most appropriate technique made. Once imputation is complete
the imputed datasets can be analysed within SOLAS or exported to a variety of
other packages in the correct format. It's that simple!
"Solas is currently the only program that implements multiple imputation
noniteratively and with substantial flexibility, even including ad-hoc methods,
such as LOCF, as points of comparison for sensitivity analysis."
-- Prof. Donald B. Rubin, Harvard.
The incorrect analysis of datasets with incomplete data can lead to biased
analysis and incorrect inference. SOLAS™ 3.0
provides researchers with a range of imputation approaches in an easy to use,
validated software package that includes principled, informed solutions to the
problems presented by incomplete datasets.