CODES 2000 User Forum -- Data Network Note #7

SAS Procedures for Multiple Imputation (Continued)

Applies to: SAS 8.1 & 8.2.
Last updated: Monday November 19, 2001.

SUMMARY

Several steps are required to perform multiple imputation of missing values using SAS procedures.  Note #4 described the procedures used for the multiple imputation examples presented at the CODES Data Network tutorial.  This Note revises and expands that earlier material based on feedback from Utah CODES (thanks to Larry, Stacey, and Mike) and additional testing of the SAS MI procedure.

PROCEDURE

  1. Concatenate separate imputed linkage tables.

The CODES 2000 Linkage Imputation Wizard (frmImpute) produces a separate linked pairs table for each imputation.  The simplest way to use these tables is to concatenate them before joining Crash to Hospital.  One way to concatenate the tables is with a UNION query.  Here is an Access query named 'qryUnion1to5' suggested by Utah CODES that concatenates five separate linkage tables:

SELECT * FROM LinkedPairs1 UNION SELECT * FROM LinkedPairs2 UNION
SELECT * FROM LinkedPairs3 UNION SELECT * FROM LinkedPairs4 UNION SELECT * FROM LinkedPairs5;

You must create any UNION query in Access SQL view, not Design view.

  1. Join Crash to Hospital for imputed linked pairs.

Here is an Access query named 'qryMCInpatients' that joins Crash to Hospital for all motorcycle inpatients discharges alive:

SELECT qryUnion1to5.Imputation, IIf(IsNull(Crash.Hour),-1,Crash.Hour) AS Hour, IIf(IsNull(Crash.Age),-1,Crash.Age) AS Age, IIf(IsNull(Crash.Safety),-1,Crash.Safety) AS Helmet, IIf(IsNull(Hospital.Charges),-1,Crash.Hospital.Charges) AS Charges, Crash.Vehicle as Vehicle
FROM (Crash INNER JOIN qryUnion1to5 ON Crash.CODESNbr = qryUnion1to5.CODESNbr) INNER JOIN Hospital ON qryUnion1to5.CODESNbr_B = Hospital.CODESNbr
WHERE Crash.Vehicle='MC' AND Hospital.Death='N';

This query is similar to the query named 'qryMCMINorm' presented in Data Network Technical Note #4.  The new query uses the UNION query presented in Step 1 to join the Crash and Hospital test tables described in Note #3.  It also uses the native Access IIf function to test for Null values and replace them with -1.

The last field selected in this query is Vehicle.  This text field is not needed for multiple imputation of missing values.  However, the SAS DDE connection described in the next step treats the last record from the Access query as a 'lost card' when the last field is numeric.  The dataset created by SAS will not include the final record unless you have a text field as the last field in the Access select query or table.  If you have additional information about how to correct this problem please notify Mike McGlincy.

  1. Import CODES 2000 data into a SAS dataset.

This SAS data step converts the output from the Access query qryMCInpatients to a SAS dataset called MCData using a DDE connection to the Access database.  The Access database must be open when you run this step.  Note that -1 values are converted to missing values (.) and that Hour and Charges values are transformed for greater normality.

FILENAME MCData DDE 'MSAccess|CrashHospital2.mdb;QUERY qryMCInpatients!Data';

data MCData;

infile MCData;

input Imputed Hour Age Helmet Charges Vehicle;

if Hour = -1 then Hour = .;

if Age = -1 then Age = .;

if Helmet = -1 then Helmet = .;

if Charges = -1 then Charges = .;

Hour4 = Hour;

if Hour4 < 4 then Hour4=Hour4+24;

LogChgs = log(Charges);

;

Utah CODES suggests using SAS OLEDB as an alternative to SAS DDE for reading data from Access tables and queries.  You must have the SAS ACCESS option installed for this to work.  Start your SAS script with the following:

libname CODES oledb;

run;

This will bring up a series of dialog boxes - select the Jet OLE driver, and then you will get asked to select the Access database.  Once that is open, use a dot syntax to access any part of that database - Access does NOT need to be open, nor do Null values have to be replaced with other values in the Access join query or table you create.  Also, OLEDB does not have the lost card problem associated with DDE.  For example:

data MCData;

set CODES.qryMCDataWithNulls;

Hour4 = Hour;

if Hour4 < 4 then Hour4=Hour4+24;

LogChgs = log(Charges);

;

  1. Multiply impute missing values using SAS MI procedure.

This SAS MI procedure imputes 20 complete datasets and stores the results in the outmi SAS dataset for analysis.  It is similar to the procedure used for the tutorial and presented in Note #4 except that the number of imputations is larger (20 instead of 5) and the number of iterations for the MCMC algorithm are much larger (5000, 2000 instead of 200, 100).  Both of these changes are to address MCMC convergence issues, which will be discussed in more detail in Note #8.

proc mi data=MCData 

seed=37851  

out=outmi 

nimpute=20 

round=1

mcmc

nbiter=5000

niter=2000

;

title 'MI Helmet Probability Model 5';

var LogChgs Age Hour4 Helmet;

run;

  1. Prepare imputed values for analysis.

This SAS data step corrects out of range imputed values and reverse transforms imputed values.

data MI25;

set outmi;

_Imputation_ = _Imputation_ + 20*(Imputed - 1);

Helmet = max(Helmet,0);

Helmet = min(Helmet,1);

Hour = Hour4;

if Hour > 23 then Hour=Hour-4;

Hour = max(Hour,0);

Hour = min(Hour,23);

Age = max(Age,0);

Charges = exp(LogChgs);

run;

proc print data=MI25 (obs=20);

run;

The imputation calculation is required because the input table contains five complete linkage imputations.  The final result is 100 imputations:  20 value imputations for each of 5 linkage imputations.  The SAS print procedure prints a small sample of the first complete dataset.

  1. Analyze imputed values.

These SAS procedures perform a separate regression analysis for each of the 100 separate imputations.  A very simple regression model is used here to summarize the relationship between inpatient charges and helmet use. 

proc sort data=MI25;

by _Imputation_;

run;

proc reg data=MI25 outest=outreg covout noprint;

by _Imputation_;

model Charges = Helmet;

run;

proc print data=outreg (where = (_TYPE_ = 'PARMS'));

var _Imputation_ _DEPVAR_ Intercept Helmet;

run;

Results of the 100 regression analyses are stored in the outreg SAS dataset.  The SAS print procedure prints key parameter estimates from the regression results. 

  1. Combine separate regression parameters into a single estimate.

This SAS MIANALYZE procedure combines parameter estimates from 100 separate regression analysis to produce final parameter estimates and error estimates.

proc mianalyze data=outreg;

var Intercept Helmet;

run;

SAS is a registered trademark of SAS Institute, Inc.

 
© Copyright 2000 - 2008 Strategic Matching, Inc. All rights reserved. Microsoft, Windows, and Access are trademarks of Microsoft Corporation. Last modified: Monday January 28, 2008.