CODES 2000 User Forum -- Data Network Note #5

SAS Procedures for A Simple EM Algorithm

Applies to: SAS 8.1 & 8.2.
Last updated: Thursday November 15, 2001.

SUMMARY

The EM algorithm can be used to calculate a maximum likelihood estimate for all missing values in a dataset.  A simple EM algorithm for two binary variables was used for the Data Network tutorial to provide a complete data cross tabulation of reported helmet use by police and EMS.

PROCEDURE

  1. Compare Crash to EMS reported helmet use in CODES 2000.

Using the five imputed complete Crash to Inpatient linkages prepared earlier, compare police reported helmet use to EMS reported helmet use for all inpatients.   

Here is an Access query that counts all agreements and disagreements:

SELECT [Crash].[Vehicle1] AS Vehicle, [HospitalDb].[Death], [Crash].[Safety1] AS CrashHelmet, [Crash].[SafetyEMS] AS EMSHelmet, Count([Crash].[CODESNbr]) AS Freq, CInt(Avg([HospitalDb].[Charges])) AS AvgChgs
FROM (Crash INNER JOIN ImputedMatchPairs1to5 ON [Crash].[CODESNbr]=[ImputedMatchPairs1to5].[CODESNbr]) INNER JOIN HospitalDb ON [ImputedMatchPairs1to5].[CODESNbr_B]=[HospitalDb].[SID]
GROUP BY [Crash].[Vehicle1], [HospitalDb].[Death], [Crash].[Safety1], [Crash].[SafetyEMS]
HAVING (((Crash.Vehicle1)='MC') AND ((HospitalDb.Death)='N'));

  1. Run the EM algorithm starting from the frequency information obtained in step 1.

The following SAS data step implements Schafer's EM algorithm as listed in his Analysis of Incomplete Multivariate Data (1997), pp 42-45.  No cases were observed in the linked crash records for Crash Helmet = "N" and EMS Helmet = "Y" but this is probably not a structural zero because such cases were observed in the complete crash dataset.  Consequently, we added 0.5 to all counts.

 

*-------------- Helmet to Helmet Comparison ---------------*

|       Comparison of Crash vs. EMS reported helmet use.....       |

|       Implements Schafer's EM algorithm (1997, pp 42-45)       |

*------------------------------------------------------*;

Data EM Sim;

*Observed frequencies;

input x00 x01 x02 x10 x11 x12 x20 x21 x22;

array thetaOld[2,2];

array thetaNew[2,2];

array xA[2,2];

array xB[2];

array xC[2];

* Both Y1 and Y2 observed;

xA[1,1] = x11;

xA[1,2] = x12;

xA[2,1] = x21;

xA[2,2] = x22;

* Y2 missing;

xB[1] = x10;

xB[2] = x20;

* Y1 missing;

xC[1] = x01;

xC[2] = x02;

* Cases with both missing are not included because they do not affect the MLE;

n = x01 + x02 + x10 + x11 + x12 + x20 + x21 + x22;

* Prior probabilities uniform = 0.25;

t = 0;

Do i=1 to 2;

Do j=1 to 2;

thetaOld[i,j] = 0.25;

end;

end;

ProbNN = thetaOld[1,1];

ProbNY = thetaOld[1,2];

ProbYN = thetaOld[2,1];

ProbYY = thetaOld[2,2];

* Post t = 0 values;

keep t ProbNN ProbNY ProbYN ProbYY;

output EM;

* Calculate 50 EM iterations;

Do t=1 to 50;

Do i=1 to 2;

Do j=1 to 2;

thetaI = thetaOld[i,1] + thetaOld[i,2];

thetaJ = thetaOld[1,j] + thetaOld[2,j];

B = thetaOld[i,j] / thetaI;

C = thetaOld[i,j] / thetaJ;

thetaNew[i,j] = (1/n) * (xA[i,j] + xB[i] * B + xC[j] * C);

end;

end;

 

Do i=1 to 2;

Do j=1 to 2;

thetaOld[i,j] = thetaNew[i,j];

end;

end;

ProbNN = thetaOld[1,1];

ProbNY = thetaOld[1,2];

ProbYN = thetaOld[2,1];

ProbYY = thetaOld[2,2];

keep t ProbNN ProbNY ProbYN ProbYY;

output EM;

end;

 

* Create dummy observations for cross tabulation;

Do rep=1 to round(1000*ProbNN);

Crash = 'N';

EMS = 'N';

keep Crash EMS;

output Sim;

end;

 

Do rep=1 to round(1000*ProbNY);

Crash = 'N';

EMS = 'Y';

keep Crash EMS;

output Sim;

end;

 

Do rep=1 to round(1000*ProbYN);

Crash = 'Y';

EMS = 'N';

keep Crash EMS;

output Sim;

end;

 

Do rep=1 to round(1000*ProbYY);

Crash = 'Y';

EMS = 'Y';

keep Crash EMS;

output Sim;

end;

datalines;

163.5 14.5 36.5 89.5 8.5 0.5 27.5 4.5 19.5

;

proc print data=EM;

title 'EM of Crash vs EMS Helmet Use for Inpatients';

run;

proc freq data=sim;

tables Crash*EMS;

run;

The SAS print procedure lists EM iterations so that convergence of the EM algorithm can be assessed.

The SAS freq procedure cross tabulates 1000 simulated observations of the MLE complete data relationship between police and EMS reported helmet use.

SAS is a registered trademark of SAS Institute, Inc.

 

 
© Copyright 2000 - 2008 Strategic Matching, Inc. All rights reserved. Microsoft, Windows, and Access are trademarks of Microsoft Corporation. Last modified: Monday January 28, 2008.