CODES 2000 User Forum -- Data Network Note #14

Procedures for Multiple Imputation (Phase II)

Applies to: SAS 8.1 & 8.2.
Last updated: Tuesday April 23, 2002.

SUMMARY

Several steps are required to perform multiple imputation of missing values and regression analysis using SAS procedures for Phase II of the Helmet Use Study. Data Network Note #7 described the procedures used in Phase I. This Note revises the earlier procedures based on experience gained in Phase I.

PROCEDURES

  1. Concatenate separate tables of imputed linked pairs.

The CODES 2000 Linkage Imputation Wizard produces a separate one-to-one linked pairs table for each imputation. These tables are located in the link database for your Crash to Hospital Inpatient linkage.  The database is named CrashHospital2.mdb if you used the default link ID.

The simplest way to use these tables is to concatenate them before joining your Crash to Hospital Inpatient tables. One way to concatenate the tables is with a UNION query. Here is an Access query named 'qryLinkedPairs1to10' that concatenates ten separate linkage tables:

SELECT * FROM LinkedPairs1 UNION SELECT * FROM LinkedPairs2 UNION
SELECT * FROM LinkedPairs3 UNION SELECT * FROM LinkedPairs4 UNION  SELECT * FROM LinkedPairs5 UNION SELECT * FROM LinkedPairs6 UNION  SELECT * FROM LinkedPairs7 UNION SELECT * FROM LinkedPairs8 UNION  SELECT * FROM LinkedPairs9 UNION SELECT * FROM LinkedPairs10;

This query should be created in the link database. To create a UNION query in Access you must use SQL view, not Design view. If you performed a different number of imputations then you must modify this query accordingly. 

  1. Join Crash to Hospital Inpatient for imputed linked pairs.

Here is an Access query named 'qryMCMI' that joins Crash to Hospital Inpatient for all motorcycle inpatients discharged alive. The query converts available data values into data values that can be used for multiple imputation and analysis:

SELECT [qryLinkedPairs1to10].[Imputation],

IIf(IsNull([Crash].[HourTruncate]),-1,[Crash].[HourTruncate]) AS Hour,

IIf(IsNull([Crash].[BestAge]),-1,[Crash].[BestAge]) AS Age,

SexNumber([Hospital].[StdSex]) AS Sex,

YesNumber([Crash].[Safety1]) AS Helmet,

ChargesNotZero([Hospital].[Charges]) AS Charges,

[Crash].[Vehicle1] as Vehicle

FROM ([Crash] INNER JOIN [qryLinkedPairs1to10] 

ON [Crash].[CODESNbr]=[qryLinkedPairs1to10].[CODESNbr]) 

INNER JOIN [Hospital] 

ON [qryLinkedPairs1to10].[CODESNbr_B]=[Hospital].[CODESNbr]

WHERE ((([Crash].[Vehicle1])='MC') And (([Hospital].[Death])='N'));

This query should be created in the link database. The query uses the UNION query you created in Step 1 to join the Crash and Hospital Inpatient tables.  Note that the query uses the native Access IIf function to test for Null values and replace them with -1. You cannot use Access Null values directly in SAS datasets -- they must be converted to 'Missing'. The conversion functions SexNumber, YesNumber, and ChargesNotZero are custom Visual Basic functions. Send email if you need help developing such functions. You will have to modify the query shown here if your tables, fields, or functions have different names.

The last field selected in this query is vehicle type. This text field is not needed for multiple imputation of missing values. However, the SAS DDE connection described in the next step might treat the last record from the Access query as a 'lost card' if the last field is numeric. Consequently, the dataset created by SAS may not include the final record unless you have a text field as the last field in the Access select query or table.  This is a known bug with the Access to SAS DDE interface.

  1. Import CODES 2000 data into a SAS dataset.

This SAS data step converts the output from the Access query qryMCMI to a SAS dataset called MCData using a DDE connection to the Access database.  The Access database must be open when you run this step.  Here -1 values are converted to missing values (.) and Hour and Charges values are transformed for greater normality.

FILENAME MCData DDE 'MSAccess|CrashHospital2.mdb;QUERY qryMCMI!Data';

title 'MI Helmet Use Effect Phase II';

data MCData;

infile MCData;

input Imputation Hour Age Sex Helmet Charges Vehicle;

if Hour = -1 then Hour = .;

if Age = -1 then Age = .;

if Sex = -1 then Sex = .;

if Helmet = -1 then Helmet = .;

if Charges = -1 then Charges = .;

if Charges = 0 then Charges = .;

if Hour ^= . then do;

    if Hour < 4 then Hour4 = Hour + 24;

    else Hour4=Hour;

    Hour4 = Hour4 - 4;

    end;

LogChgs = log(Charges);

;

Utah CODES suggests using SAS OLEDB as an alternative to SAS DDE for reading data from Access tables and queries.  You must have the SAS ACCESS option installed for this to work.  Start your SAS script with the following:

libname CODES oledb;

run;

This will bring up a series of dialog boxes - select the Jet OLE driver, and then you will get asked to select the Access database.  Once that is open, use a dot syntax to access any part of that database - Access does NOT need to be open, nor do Null values have to be replaced with other values in the Access join query or table you create.  Also, OLEDB does not have the lost card problem associated with DDE.  For example:

data MCData;

set CODES.qryMCDataWithNulls;

if Hour ^= . then do;

    if Hour < 4 then Hour4 = Hour + 24;

    else Hour4=Hour;

    Hour4 = Hour4 - 4;

    end;

LogChgs = log(Charges);

;

  1. Multiply impute missing values using SAS MI procedure.

This SAS MI procedure imputes 10 complete datasets and stores the results in the outmi SAS dataset for analysis.  It is similar to the procedure presented in Note #7 except that the number of imputations is larger (10 instead of 5) and the imputation model now includes Sex because we plan to use Sex as a covariate in the regression analysis. Note that in Phase II we are performing 10 separate executions of the SAS MI procedure, once for each linkage imputation.  In Phase I we executed MI once for a table that contained all linkage imputations as a single set of observations. 

proc sort data=MCData;

by Imputation;

run;

proc mi data=MCData 

seed=37851  

out=outmi 

nimpute=10 

round=1

;

mcmc

nbiter=5000

niter=2000

;

title2 'MI 10 X 10';

by Imputation;

var Age Sex Hour4 Helmet LogChgs;

run;

  1. Prepare imputed values for analysis.

This SAS data step corrects out of range imputed values and reverse transforms imputed values.

data MI100;

set outmi;

_Imputation_ = _Imputation_ + 10*(Imputation - 1);

Helmet = max(Helmet,0);

Helmet = min(Helmet,1);

Sex = max(Sex,0);

Sex = min(Sex,1);

Hour = Hour4;

if Hour > 23 then Hour=Hour-4;

Hour = max(Hour,0);

Hour = min(Hour,23);

Age = max(Age,0);

Charges = exp(LogChgs);

run;

proc print data=MI100 (obs=20);

run;

The _Imputation_ calculation is required because the input dataset contains 10 complete linkage imputations.  The final result is 100 imputations: 10 value imputations for each of 10 linkage imputations.  The SAS print procedure prints a small sample of the first complete dataset for your review.

  1. Analyze imputed values.

These SAS procedures perform a separate regression analysis for each of the 100 separate imputations.  The regression model used here to summarize the relationship between inpatient charges and helmet use includes patient age and sex as covariates. This model is more complete than the very simple model used in Phase I. 

proc sort data=MI100;

by _Imputation_;

run;

proc reg data=MI100 outest=outreg covout noprint;

by _Imputation_;

model Charges = Age Sex Helmet;

run;

proc print data=outreg (where = (_TYPE_ = 'PARMS'));

var _Imputation_ _DEPVAR_ Intercept Age Sex Helmet;

run;

Results of the 100 regression analyses are stored in the outreg SAS dataset.  The SAS print procedure prints key parameter estimates from the regression results. 

  1. Combine separate regression parameters into a single estimate.

This SAS MIANALYZE procedure combines parameter estimates from 100 separate regression analysis to produce final parameter estimates and error estimates.

proc mianalyze data=outreg;

var Intercept Age Sex Helmet;

run;

  1. Test the sensitivity of your results to methodology options.

You have several choices here.  Compare results from different sets of linkage imputations.  For example, linkage imputations 1 to 5 versus 6 to 10 or 1 to 5 versus 1 to 10.  Or, compare results from different sets of value imputations.  For example, nimpute = 5 versus 10, or seed = 37851 versus 55417, or niter = 2000 versus 5000, or var LogChgs versus Charges.  See Data Network Note #8 -- Variations in Multiple Imputation Results.

SAS is a registered trademark of SAS Institute, Inc.

 
© Copyright 2000 - 2008 Strategic Matching, Inc. All rights reserved. Microsoft, Windows, and Access are trademarks of Microsoft Corporation. Last modified: Monday January 28, 2008.