CODES 2000 User Forum -- Data Network Note #7
SAS Procedures for Multiple Imputation (Continued)

Applies to: SAS 8.1 & 8.2.
Last updated: Monday November 19, 2001.

SUMMARY
Several steps are required to perform multiple imputation of missing
values using SAS
procedures. Note #4 described the procedures used for the multiple imputation
examples presented at the CODES Data Network tutorial. This Note revises
and expands that earlier material based on feedback from Utah CODES (thanks to
Larry, Stacey, and Mike) and additional testing of the SAS MI procedure.

PROCEDURE
- Concatenate separate imputed linkage tables.
The CODES 2000 Linkage Imputation Wizard (frmImpute) produces a separate linked pairs
table for each imputation. The simplest way to use these tables is to
concatenate them before joining Crash to Hospital. One way to concatenate the tables is with a UNION
query. Here is an Access query named 'qryUnion1to5' suggested by Utah
CODES that concatenates five separate linkage tables:
SELECT * FROM LinkedPairs1 UNION SELECT * FROM LinkedPairs2 UNION
SELECT * FROM LinkedPairs3 UNION SELECT * FROM LinkedPairs4 UNION SELECT * FROM LinkedPairs5;
You must create any UNION query in Access SQL view, not Design view.
- Join Crash to Hospital for imputed linked pairs.
Here is an Access query named 'qryMCInpatients' that joins Crash to
Hospital for all motorcycle inpatients discharges alive:
SELECT qryUnion1to5.Imputation, IIf(IsNull(Crash.Hour),-1,Crash.Hour) AS Hour,
IIf(IsNull(Crash.Age),-1,Crash.Age) AS Age, IIf(IsNull(Crash.Safety),-1,Crash.Safety)
AS Helmet, IIf(IsNull(Hospital.Charges),-1,Crash.Hospital.Charges) AS Charges,
Crash.Vehicle as Vehicle
FROM (Crash INNER JOIN qryUnion1to5 ON Crash.CODESNbr = qryUnion1to5.CODESNbr) INNER JOIN
Hospital ON qryUnion1to5.CODESNbr_B = Hospital.CODESNbr
WHERE Crash.Vehicle='MC' AND Hospital.Death='N';
This query is similar to the query named 'qryMCMINorm' presented in Data
Network Technical Note #4. The new query uses the UNION query presented
in Step 1 to join the Crash and Hospital test tables described in Note
#3. It also uses the native Access IIf function to test for Null values
and replace them with -1.
The last field selected in this query is Vehicle. This text field is
not needed for multiple imputation of missing values. However, the SAS
DDE connection described in the next step treats the last record from the
Access query as a 'lost card' when the last field is numeric. The
dataset created by SAS will not include the final record unless you have a
text field as the last field in the Access select query or table. If you
have additional information about how to correct this problem please notify
Mike McGlincy.
- Import CODES 2000 data into a SAS dataset.
This SAS data step converts the output from the Access query
qryMCInpatients to
a SAS dataset called MCData using a DDE connection to the Access
database. The Access database must be open when you run this step.
Note that -1 values are converted to missing values (.) and that Hour and
Charges values are transformed for greater normality.
FILENAME
MCData DDE 'MSAccess|CrashHospital2.mdb;QUERY
qryMCInpatients!Data';
data
MCData;
infile
MCData;
input
Imputed Hour Age Helmet Charges Vehicle;
if Hour =
-1 then Hour
= .;
if Age = -1
then Age = .;
if Helmet
= -1 then
Helmet = .;
if Charges
= -1 then
Charges = .;
Hour4 = Hour;
if Hour4
< 4 then
Hour4=Hour4+24;
LogChgs = log(Charges);
;
Utah CODES suggests using SAS OLEDB as
an alternative to SAS DDE for reading data from Access tables and
queries. You must have the SAS ACCESS option installed for this to
work. Start your SAS script with the following:
libname CODES oledb;
run;
This will bring up a series of dialog
boxes - select the Jet OLE driver, and then you will get asked to select the Access
database. Once that is open, use a dot syntax to access any part of that
database - Access does NOT need to be open, nor do Null values have to be
replaced with other values in the Access join query or table you create.
Also, OLEDB does not have the lost card problem associated with DDE. For
example:
data
MCData;
set
CODES.qryMCDataWithNulls;
Hour4 = Hour;
if Hour4
< 4 then
Hour4=Hour4+24;
LogChgs = log(Charges);
;
- Multiply impute missing values using SAS MI procedure.
This SAS MI procedure imputes 20 complete datasets and stores the results in the outmi SAS dataset for
analysis. It is similar to the procedure used for the tutorial and
presented in Note #4 except that the number of imputations is larger (20
instead of 5) and the number of iterations for the MCMC algorithm are much
larger (5000, 2000 instead of 200, 100). Both of these changes are to
address MCMC convergence issues, which will be discussed in more detail in
Note #8.
proc mi
data=MCData
seed=37851
out=outmi
nimpute=20
round=1
mcmc
nbiter=5000
niter=2000
;
title 'MI
Helmet Probability Model 5';
var
LogChgs Age
Hour4 Helmet;
run;
- Prepare imputed values for analysis.
This SAS data step corrects out of range imputed values and reverse
transforms imputed values.
data
MI25;
set outmi;
_Imputation_ = _Imputation_ + 20*(Imputed
- 1);
Helmet = max(Helmet,0);
Helmet = min(Helmet,1);
Hour = Hour4;
if Hour
> 23 then
Hour=Hour-4;
Hour = max(Hour,0);
Hour = min(Hour,23);
Age = max(Age,0);
Charges = exp(LogChgs);
run;
proc
print data=MI25 (obs=20);
run;
The imputation calculation is required because the input table contains
five complete linkage imputations. The final result is 100 imputations:
20 value imputations for each of 5 linkage
imputations. The SAS print procedure prints a small sample of the first
complete dataset.
- Analyze imputed values.
These SAS procedures perform a separate regression analysis for each of the
100 separate imputations. A very simple regression model is used here to
summarize the relationship between inpatient charges and helmet use.
proc
sort data=MI25;
by
_Imputation_;
run;
proc reg
data=MI25 outest=outreg
covout noprint;
by
_Imputation_;
model
Charges = Helmet;
run;
proc
print data=outreg
(where = (_TYPE_ = 'PARMS'));
var
_Imputation_ _DEPVAR_ Intercept Helmet;
run;
Results of the 100 regression analyses are stored in the outreg SAS
dataset. The SAS print procedure prints key parameter estimates from the
regression results.
- Combine separate regression parameters into a single estimate.
This SAS MIANALYZE procedure combines parameter estimates from 100 separate
regression analysis to produce final parameter estimates and error estimates.
proc
mianalyze data=outreg;
var
Intercept Helmet;
run;
SAS is a registered trademark of SAS Institute, Inc.