CODES 2000 User Forum -- Data Network Note #14
Procedures for Multiple Imputation (Phase II)

Applies to: SAS 8.1 & 8.2.
Last updated: Tuesday April 23, 2002.

SUMMARY
Several steps are required to perform multiple imputation of missing
values and regression analysis using SAS procedures for Phase II of the
Helmet Use Study. Data Network Note #7 described the procedures used in Phase I. This Note revises the earlier procedures based on experience
gained in Phase I.

PROCEDURES
- Concatenate separate tables of imputed linked pairs.
The CODES 2000 Linkage Imputation Wizard produces a separate one-to-one linked pairs
table for each imputation. These tables are located in the link database
for your Crash to Hospital Inpatient linkage. The database is named
CrashHospital2.mdb if you used the default link ID.
The simplest way to use these tables is to
concatenate them before joining your Crash to Hospital Inpatient tables. One way to concatenate the tables is with a UNION
query. Here is an Access query named 'qryLinkedPairs1to10' that concatenates
ten separate linkage tables:
SELECT * FROM LinkedPairs1 UNION SELECT * FROM LinkedPairs2 UNION
SELECT * FROM LinkedPairs3 UNION SELECT * FROM LinkedPairs4 UNION SELECT * FROM
LinkedPairs5 UNION SELECT * FROM LinkedPairs6 UNION SELECT * FROM
LinkedPairs7 UNION
SELECT * FROM LinkedPairs8 UNION SELECT * FROM LinkedPairs9 UNION SELECT * FROM
LinkedPairs10;
This query should be created in the link database. To create a UNION query in Access
you must use SQL view, not Design view. If you performed a different number of
imputations then you must modify this query accordingly.
- Join Crash to Hospital Inpatient for imputed linked pairs.
Here is an Access query named 'qryMCMI' that joins Crash to
Hospital Inpatient for all motorcycle inpatients discharged alive. The query
converts available data values into data values that can be used for multiple
imputation and analysis:
SELECT [qryLinkedPairs1to10].[Imputation],
IIf(IsNull([Crash].[HourTruncate]),-1,[Crash].[HourTruncate]) AS Hour,
IIf(IsNull([Crash].[BestAge]),-1,[Crash].[BestAge]) AS Age,
SexNumber([Hospital].[StdSex]) AS Sex,
YesNumber([Crash].[Safety1]) AS Helmet,
ChargesNotZero([Hospital].[Charges]) AS Charges,
[Crash].[Vehicle1] as Vehicle
FROM ([Crash] INNER JOIN [qryLinkedPairs1to10]
ON [Crash].[CODESNbr]=[qryLinkedPairs1to10].[CODESNbr])
INNER JOIN [Hospital]
ON [qryLinkedPairs1to10].[CODESNbr_B]=[Hospital].[CODESNbr]
WHERE ((([Crash].[Vehicle1])='MC') And (([Hospital].[Death])='N'));
This query should be created in the link database. The query uses the UNION query
you created
in Step 1 to join the Crash and Hospital Inpatient tables. Note that the
query uses the native Access IIf function to test for Null values
and replace them with -1. You cannot use Access Null values directly in SAS
datasets -- they must be converted to 'Missing'. The conversion functions
SexNumber, YesNumber, and ChargesNotZero are custom Visual Basic functions.
Send email if you need help developing such functions. You will have to modify the
query shown here if your tables, fields, or functions have different names.
The last field selected in this query is vehicle type. This text field is
not needed for multiple imputation of missing values. However, the SAS
DDE connection described in the next step might treat the last record from the
Access query as a 'lost card' if the last field is numeric. Consequently,
the
dataset created by SAS may not include the final record unless you have a
text field as the last field in the Access select query or table. This
is a known bug with the Access to SAS DDE interface.
- Import CODES 2000 data into a SAS dataset.
This SAS data step converts the output from the Access query qryMCMI to
a SAS dataset called MCData using a DDE connection to the Access
database. The Access database must be open when you run this step.
Here -1 values are converted to missing values (.) and Hour and
Charges values are transformed for greater normality.
FILENAME
MCData DDE 'MSAccess|CrashHospital2.mdb;QUERY
qryMCMI!Data';
title 'MI
Helmet Use Effect Phase II';
data
MCData;
infile
MCData;
input
Imputation Hour Age Sex Helmet Charges Vehicle;
if Hour =
-1 then Hour
= .;
if Age = -1
then Age = .;
if Sex = -1
then Sex = .;
if Helmet
= -1 then
Helmet = .;
if Charges
= -1 then
Charges = .;
if Charges
= 0 then
Charges = .;
if
Hour ^= . then
do;
if Hour < 4 then
Hour4 = Hour + 24;
else Hour4=Hour;
Hour4 = Hour4 - 4;
end;
LogChgs = log(Charges);
;
Utah CODES suggests using SAS OLEDB as
an alternative to SAS DDE for reading data from Access tables and
queries. You must have the SAS ACCESS option installed for this to
work. Start your SAS script with the following:
libname CODES oledb;
run;
This will bring up a series of dialog
boxes - select the Jet OLE driver, and then you will get asked to select the Access
database. Once that is open, use a dot syntax to access any part of that
database - Access does NOT need to be open, nor do Null values have to be
replaced with other values in the Access join query or table you create.
Also, OLEDB does not have the lost card problem associated with DDE. For
example:
data
MCData;
set
CODES.qryMCDataWithNulls;
if
Hour ^= . then
do;
if Hour < 4 then
Hour4 = Hour + 24;
else Hour4=Hour;
Hour4 = Hour4 - 4;
end;
LogChgs = log(Charges);
;
- Multiply impute missing values using SAS MI procedure.
This SAS MI procedure imputes 10 complete datasets and stores the results in the outmi SAS dataset for
analysis. It is similar to the procedure presented in Note #7 except that the number of imputations is larger
(10
instead of 5) and the imputation model now includes Sex because we plan to use
Sex as a covariate in the regression analysis. Note that in Phase II we are
performing 10 separate executions of the SAS MI procedure, once for each
linkage imputation. In Phase I we executed MI once for a table that
contained all linkage imputations as a single set of observations.
proc
sort data=MCData;
by
Imputation;
run;
proc mi
data=MCData
seed=37851
out=outmi
nimpute=10
round=1
;
mcmc
nbiter=5000
niter=2000
;
title2 'MI
10 X 10';
by
Imputation;
var Age
Sex Hour4 Helmet LogChgs;
run;
- Prepare imputed values for analysis.
This SAS data step corrects out of range imputed values and reverse
transforms imputed values.
data
MI100;
set outmi;
_Imputation_ = _Imputation_ + 10*(Imputation
- 1);
Helmet = max(Helmet,0);
Helmet = min(Helmet,1);
Sex = max(Sex,0);
Sex = min(Sex,1);
Hour = Hour4;
if Hour
> 23 then
Hour=Hour-4;
Hour = max(Hour,0);
Hour = min(Hour,23);
Age = max(Age,0);
Charges = exp(LogChgs);
run;
proc
print data=MI100 (obs=20);
run;
The _Imputation_ calculation is required because the input dataset contains
10 complete linkage imputations. The final result is 100 imputations: 10 value imputations for each of
10 linkage
imputations. The SAS print procedure prints a small sample of the first
complete dataset for your review.
- Analyze imputed values.
These SAS procedures perform a separate regression analysis for each of the
100 separate imputations. The regression model used here to
summarize the relationship between inpatient charges and helmet use includes
patient age and sex as covariates. This model is more complete than the very
simple model used in Phase I.
proc
sort data=MI100;
by
_Imputation_;
run;
proc reg
data=MI100 outest=outreg
covout noprint;
by
_Imputation_;
model
Charges = Age Sex Helmet;
run;
proc
print data=outreg
(where = (_TYPE_ = 'PARMS'));
var
_Imputation_ _DEPVAR_ Intercept Age Sex Helmet;
run;
Results of the 100 regression analyses are stored in the outreg SAS
dataset. The SAS print procedure prints key parameter estimates from the
regression results.
- Combine separate regression parameters into a single estimate.
This SAS MIANALYZE procedure combines parameter estimates from 100 separate
regression analysis to produce final parameter estimates and error estimates.
proc
mianalyze data=outreg;
var
Intercept Age Sex Helmet;
run;
- Test the sensitivity of your results to methodology options.
You have several choices here. Compare results from different sets of
linkage imputations. For example, linkage imputations 1 to 5 versus 6 to
10 or 1 to 5 versus 1 to 10. Or, compare results from different sets of
value imputations. For example, nimpute = 5 versus 10, or seed = 37851
versus 55417, or niter = 2000 versus 5000, or var LogChgs versus
Charges. See Data Network Note #8 -- Variations in
Multiple Imputation Results.
SAS is a registered trademark of SAS Institute, Inc.