Section 7c: Multivariate ANOVA

I. Compute the Correlation of Two Observations from each Subject
II. Multivariate Analysis with Two Correlated Data of Different Types

I. Compute the Correlation of Two Observations from each Subject

The correlation coefficient is a unitless measure of association. It lets you measure the association between two different types of variables, such as height and weight, measured on each subject. An interesting capability of the REPEATED statement in PROC MIXED is that it can compute the familiar Pearson correlation of two different types of variables, just as PROC CORR does. The sashelp.class dataset provides an example: the correlation of height and weight, both measured on each subject.

Assuming the measurements on different subjects are independent, PROC CORR is the usual way to compute a correlation:

    PROC CORR DATA=sashelp.class cov;
      VAR height weight;
    RUN;

    Covariance Matrix, DF = 18
                    Height         Weight
    Height      26.2869006    102.4934211
    Weight     102.4934211    518.6520468

    Pearson Correlation Coefficients, N = 19
    Prob > |r| under H0: Rho=0
                 Height     Weight
    Height      1.00000    0.87779
                            <.0001
    Weight      0.87779    1.00000
                 <.0001

The correlation can be reproduced from the covariance and the two variances printed by the cov option:

    0.87779 = 102.49 / (SQRT(26.29)*SQRT(518.65))

To compute the same correlation with PROC MIXED, first extract the relevant data, which are stored in multivariate format (one row per subject), and transpose them into univariate format (one row per measurement):

    DATA cls;
      SET sashelp.class;
      LENGTH vr $3;
      KEEP name vr y;
      * the first name is the unique id value for each subject;
      vr='hgt'; y=height; OUTPUT;
      vr='wgt'; y=weight; OUTPUT;
    RUN;

Next, run the null model, which estimates a separate variance for height and for weight but no correlation between them (type=un(1)):

    * run the null model, that is, compute separate variances for both
      height and weight and no correlation, type=un(1);
    ODS SELECT r fitstatistics;
    PROC MIXED DATA=cls NOitPrint NOclPrint method=reml;
      CLASS vr name;
      MODEL y = vr / NOint;
      REPEATED vr / subject=name TYPE=un(1) r;
    RUN;

    Estimated R Matrix
    Row       Col1       Col2
      1    26.2869
      2                518.65

    Fit Statistics
    -2 Res Log Likelihood    279.4

The most complex covariance structure possible with these data is the unstructured matrix, which adds a covariance term. That is, specifying type=un estimates the two separate variances and the covariance between height and weight:

    * enter vr as a fixed effect with NOint and the name of the repeated factor;
    ODS SELECT rcorr r fitstatistics;
    PROC MIXED DATA=cls NOitPrint NOclPrint method=reml;
      CLASS vr name;
      MODEL y = vr / DDFM=bw NOint;
      REPEATED vr / subject=name TYPE=un r rcorr;
    RUN;

The values in the R matrix (the variances and covariance of the residuals about the means of height and weight) match the covariance matrix produced by PROC CORR with its cov option. The rcorr matrix (the correlation of height and weight) matches the correlation computed by PROC CORR.

    Estimated R Matrix
    Row       Col1       Col2
      1    26.2869     102.49
      2     102.49     518.65

    Estimated R Correlation
    Row       Col1       Col2
      1     1.0000     0.8778
      2     0.8778     1.0000

The levels of vr ('hgt' and 'wgt') are placed in alphabetical order, so Row 1 and Col1 refer to height, and Row 2 and Col2 refer to weight. Because the two responses are different types of data with different variances, select the unstructured option (type=un) so that a separate variance is estimated for each type of response.

To determine whether the correlation is significant, one can use the likelihood ratio test, which needs the fit statistics of the full model:

    Fit Statistics
    -2 Res Log Likelihood    252.9

The likelihood ratio test assesses whether the unstructured covariance matrix fits better than the independence model (with no covariance). For the null model -2*LL=279.4 and for the full model -2*LL=252.9. The difference, 279.4-252.9 = 26.5, is highly significant when referred to a chi-square distribution with 3-2=1 degree of freedom (the full model estimates 2 variances and 1 covariance, the null model only 2 variances). The actual p-value will differ from the p-value computed by PROC CORR, since it is based on a different type of statistical test.
When interpreting this test in general, note that it only shows that a covariance structure allowing a correlation is more appropriate than the independence model; it does not show that this structure is the MOST appropriate choice.
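
The p-value for the likelihood ratio test is not printed by either PROC MIXED run, so it has to be computed by hand. The DATA step below is a minimal sketch of that calculation, assuming the two -2 Res Log Likelihood values are copied from the Fit Statistics output (279.4 and 252.9) and that the difference is referred to a chi-square distribution with 1 degree of freedom. The variable names m2ll_null, m2ll_full, chisq, df and p are chosen here for illustration only.

```sas
* Likelihood ratio test of TYPE=UN against TYPE=UN(1);
* The two -2 Res Log Likelihood values are typed in from the Fit Statistics tables;
DATA _NULL_;
  m2ll_null = 279.4;              * null model, TYPE=UN(1), 2 covariance parameters;
  m2ll_full = 252.9;              * full model, TYPE=UN, 3 covariance parameters;
  chisq = m2ll_null - m2ll_full;  * likelihood ratio statistic;
  df = 3 - 2;                     * difference in number of covariance parameters;
  p = 1 - PROBCHI(chisq, df);     * upper-tail chi-square probability;
  PUT chisq= df= p= E10.;         * write the results to the SAS log;
RUN;
```

The resulting p-value is far below 0.0001. Comparing REML likelihoods in this way is reasonable here because both models use the same fixed effects (MODEL y = vr / NOint) and differ only in the covariance structure.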