Derivation of various statistical formulas in matrix form, leading up to R

by John Kelley


Begin with a data matrix X.  The variables are Extroversion and
Introversion, and there are five subjects.

                   Variables
                 Extro   Intro
              1 |   5       1   |
              2 |   1       3   |
   Subjects   3 |   2       5   |
              4 |   4       0   |
              5 |   3       1   |

To compute the Sums of Squares/Cross Products matrix (SSQ/CP), pre-multiply X by its transpose X':

        X'        *     X     =      X'X

   | 5 1 2 4 3 |     | 5 1 |     | 55  21 |     | SSQext  CP     |
   | 1 3 5 0 1 |     | 1 3 |     | 21  36 |  =  | CP      SSQint |
                     | 2 5 |
                     | 4 0 |
                     | 3 1 |

       2x5             5x2           2x2
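The same product can be checked numerically. The following is a minimal sketch that assumes NumPy is available; the names `X` and `sscp` are illustrative choices and are not part of the original notes.

```python
import numpy as np

# Data matrix X: rows are the five subjects, columns are Extro and Intro.
X = np.array([[5, 1],
              [1, 3],
              [2, 5],
              [4, 0],
              [3, 1]])

# Pre-multiply X by its transpose: (2x5) @ (5x2) gives the 2x2 SSQ/CP matrix.
sscp = X.T @ X

# Diagonal entries are the sums of squares, off-diagonal entries the cross products.
print(sscp)
```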

To sum variables (columns) across subjects (rows), pre-multiply X by the transpose of the unit vector (1'), a row vector in which every element is 1:

        1'        *     X     =     1'X

   | 1 1 1 1 1 |     | 5 1 |     | 15  10 |
                     | 1 3 |
                     | 2 5 |
                     | 4 0 |
                     | 3 1 |

       1x5             5x2           1x2

To find the means of the variables, multiply the vector of sums by the scalar n^-1 (that is, 1/n, where n is the number of subjects) to yield a vector containing the variables' means:

    n^-1   *     1'X     =   n^-1 1'X

   (1/5)     | 15  10 |      | 3  2 |
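The sums and means can be reproduced in a few lines. This sketch assumes NumPy; the variable names (`ones`, `sums`, `means`) are illustrative only.

```python
import numpy as np

X = np.array([[5, 1], [1, 3], [2, 5], [4, 0], [3, 1]])
n = X.shape[0]            # number of subjects (5)

# The unit vector 1 as a 5x1 column of ones; its transpose 1' is 1x5.
ones = np.ones((n, 1))

# 1'X: (1x5) @ (5x2) gives the 1x2 row of column sums.
sums = ones.T @ X

# Multiplying by the scalar 1/n turns the sums into means.
means = sums / n

print(sums)    # column sums: 15 and 10
print(means)   # column means: 3 and 2
```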

You can expand the vector containing the means into a matrix of the same dimensions as the original data matrix by pre-multiplying by the unit vector:

     1    *  (n^-1 1'X)  =  1(n^-1 1'X)

   | 1 |      | 3  2 |       | 3  2 |
   | 1 |                     | 3  2 |
   | 1 |                     | 3  2 |
   | 1 |                     | 3  2 |
   | 1 |                     | 3  2 |

    5x1         1x2            5x2

Thus, 1(n^-1 1'X) is the matrix formula for deriving Xbar (X with an overbar), a matrix that is the same size as the original data matrix and that contains the mean of each variable in every cell of the appropriate column. Here n^-1 denotes 1/n. In addition, one can rearrange the formula as follows:

   Xbar = 1(n^-1 1'X)
   Xbar = n^-1 [(1)(1')] X = n^-1 (E) X,

where E is a square matrix of dimensions nxn and contains all ones. The example below illustrates this formula:

    n^-1   *        E        *     X     =  n^-1(E)X  =  Xbar (the matrix of means)

   (1/5)     | 1 1 1 1 1 |      | 5 1 |      | 3  2 |
             | 1 1 1 1 1 |      | 1 3 |      | 3  2 |
             | 1 1 1 1 1 |      | 2 5 |      | 3  2 |
             | 1 1 1 1 1 |      | 4 0 |      | 3  2 |
             | 1 1 1 1 1 |      | 3 1 |      | 3  2 |

                 5x5              5x2          5x2
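As a quick check that the two forms of the mean matrix agree, the sketch below builds it both ways. It assumes NumPy; `E`, `X_bar_outer`, and `X_bar_E` are names chosen for illustration.

```python
import numpy as np

X = np.array([[5, 1], [1, 3], [2, 5], [4, 0], [3, 1]])
n = X.shape[0]
ones = np.ones((n, 1))

# First form: expand the 1x2 row of means with the unit vector, 1 (n^-1 1'X).
X_bar_outer = ones @ (ones.T @ X / n)

# Second form: E = (1)(1') is the n x n matrix of ones, so the mean matrix is n^-1 E X.
E = ones @ ones.T
X_bar_E = (E @ X) / n

# Both routes give a 5x2 matrix with 3 in the first column and 2 in the second.
print(X_bar_E)
print(np.allclose(X_bar_outer, X_bar_E))
```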

A matrix of mean deviations (Y) can be derived next by subtracting the matrix of means, Xbar, from X:

   X - Xbar = Y
   X - n^-1(E)X = Y

   | 5  1 |     | 3  2 |     |  2  -1 |
   | 1  3 |     | 3  2 |     | -2   1 |
   | 2  5 |  -  | 3  2 |  =  | -1   3 |
   | 4  0 |     | 3  2 |     |  1  -2 |
   | 3  1 |     | 3  2 |     |  0  -1 |

Now we can generate a matrix of the Sums of Squares and Cross Products of the Mean Deviations (note the similarity of this operation to the very first operation we performed):

          Y'          *      Y      =      Y'Y

   |  2 -2 -1  1  0 |     |  2 -1 |     | 10  -9 |     | SSQext  CP     |
   | -1  1  3 -2 -1 |     | -2  1 |     | -9  16 |  =  | CP      SSQint |
                          | -1  3 |
                          |  1 -2 |
                          |  0 -1 |

         2x5                5x2           2x2
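The deviation matrix and its cross-product matrix can be sketched as follows, assuming NumPy. Subtracting the column means by broadcasting stands in for the explicit X - n^-1(E)X; it is a shortcut, not a different method. The names `Y` and `sscp_dev` are illustrative.

```python
import numpy as np

X = np.array([[5, 1], [1, 3], [2, 5], [4, 0], [3, 1]])

# Mean deviations: subtract each column's mean from every entry in that column.
Y = X - X.mean(axis=0)

# Y'Y: sums of squares (diagonal) and cross products (off-diagonal) of the deviations.
sscp_dev = Y.T @ Y

print(Y)
print(sscp_dev)
```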

One can then simply multiply by the scalar (n-1)^-1, that is, 1/(n-1), to produce the Variance/Covariance Matrix (S):

   (n-1)^-1 * Y'Y = V = S     (some denote it with V, some with S)

   = (1/4) * | 10  -9 |  =  |  2.50  -2.25 |  =  | s2ext  Cov   |
             | -9  16 |     | -2.25   4.00 |     | Cov    s2int |
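A short numerical check of S, assuming NumPy. The comparison with `np.cov` is added here only as a cross-check; `rowvar=False` tells it that the variables are in columns, and its default divisor is n-1.

```python
import numpy as np

X = np.array([[5, 1], [1, 3], [2, 5], [4, 0], [3, 1]])
n = X.shape[0]
Y = X - X.mean(axis=0)

# Variance/covariance matrix: scale Y'Y by 1/(n-1).
S = (Y.T @ Y) / (n - 1)

# Cross-check against NumPy's built-in covariance routine.
S_builtin = np.cov(X, rowvar=False)

print(S)
print(np.allclose(S, S_builtin))
```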

To produce the Correlation Matrix, R, from the Variance/Covariance Matrix, S, we need to backtrack a bit. First, we need to produce a data matrix of standardized scores (Z). To do so, we simply take the mean deviation data matrix (Y) that we computed earlier and post-multiply it by a diagonal matrix consisting of the reciprocals of the standard deviations of the appropriate variables (Ds^-1).

Note: s^-1ext = [sqrt(s2ext)]^-1 = [sqrt(2.50)]^-1 = .63,
      s^-1int = [sqrt(s2int)]^-1 = [sqrt(4.00)]^-1 = .50.

      Y      *      Ds^-1      =        Z

   |  2 -1 |     | 0.63  0.00 |     |  1.26  -0.50 |
   | -2  1 |     | 0.00  0.50 |     | -1.26   0.50 |
   | -1  3 |                        | -0.63   1.50 |
   |  1 -2 |                        |  0.63  -1.00 |
   |  0 -1 |                        |  0.00  -0.50 |

     5x2             2x2                  5x2
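The standardization step can be sketched as below, assuming NumPy. The names `D_inv` and `Z` are illustrative. Because the code keeps full precision for 1/sqrt(2.50) instead of the rounded .63, its first column will differ slightly in the third decimal place from the hand-computed values.

```python
import numpy as np

X = np.array([[5, 1], [1, 3], [2, 5], [4, 0], [3, 1]])
n = X.shape[0]
Y = X - X.mean(axis=0)
S = (Y.T @ Y) / (n - 1)

# Diagonal matrix of reciprocal standard deviations, 1/sqrt(variance).
D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))

# Post-multiplying Y by D_inv rescales each column by its own 1/sd.
Z = Y @ D_inv

print(np.round(D_inv, 2))
print(np.round(Z, 2))
```

Each column of Z has mean 0 and sample standard deviation 1, which is what makes the scores standardized.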

Now for some algebraic manipulation. The formula for producing a Correlation Matrix is R = Z'Z(n-1)^-1. Therefore:

   R = Z'Z(n-1)^-1
     = Z' * Z * (n-1)^-1

then, by substitution of Z = YDs^-1:

     = (YDs^-1)' * (YDs^-1) * (n-1)^-1

then, because the transpose of a product is the product of the transposes in reverse order:

     = (Ds^-1)'Y' * (YDs^-1) * (n-1)^-1

then, because matrix multiplication is associative and the scalar (n-1)^-1 can be moved freely, we can group the terms:

     = (Ds^-1)' [Y'Y (n-1)^-1] * Ds^-1

Now we substitute S for the bracketed term:

     = (Ds^-1)' * S * (Ds^-1)

Finally, we substitute (Ds^-1) for (Ds^-1)' because transposing a diagonal matrix doesn't change it:

   R = (Ds^-1) * S * (Ds^-1)
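The first and last lines of this derivation can be compared numerically. The sketch below assumes NumPy and uses illustrative names (`R_from_Z`, `R_from_S`); it checks only that the two expressions agree for this data set, which is not a proof of the general result.

```python
import numpy as np

X = np.array([[5, 1], [1, 3], [2, 5], [4, 0], [3, 1]])
n = X.shape[0]
Y = X - X.mean(axis=0)
S = (Y.T @ Y) / (n - 1)
D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
Z = Y @ D_inv

# Starting point of the derivation: R = Z'Z (n-1)^-1.
R_from_Z = (Z.T @ Z) / (n - 1)

# End point of the derivation: R = Ds^-1 S Ds^-1.
R_from_S = D_inv @ S @ D_inv

# The step that relies on the diagonal matrix equalling its own transpose.
print(np.array_equal(D_inv, D_inv.T))
print(np.allclose(R_from_Z, R_from_S))
```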

Therefore, the correlation matrix (R) can be generated from the Variance/Covariance Matrix (S) by simply pre- and post-multiplying by a diagonal matrix composed of the reciprocals of the standard deviations of the appropriate variables as illustrated below for our data set:

   (Ds^-1) * S * (Ds^-1) = R

   | 0.63  0.00 |     |  2.50  -2.25 |     | 0.63  0.00 |
   | 0.00  0.50 |  *  | -2.25   4.00 |  *  | 0.00  0.50 |

        |  1.575  -1.4175 |     | 0.63  0.00 |     |  1.00  -.71 |
     =  | -1.125   2.0000 |  *  | 0.00  0.50 |  =  |  -.71  1.00 |
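For this data set, the final result can be reproduced and cross-checked against NumPy's built-in correlation routine. This is a sketch assuming NumPy; the names are illustrative, and `np.corrcoef` is used only for comparison. With full precision the off-diagonal entry is about -.7115, which rounds to the -.71 shown above.

```python
import numpy as np

X = np.array([[5, 1], [1, 3], [2, 5], [4, 0], [3, 1]])
n = X.shape[0]
Y = X - X.mean(axis=0)
S = (Y.T @ Y) / (n - 1)

# Pre- and post-multiply S by the diagonal matrix of reciprocal standard deviations.
D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
R = D_inv @ S @ D_inv

# Cross-check: rowvar=False because the variables are in columns.
R_builtin = np.corrcoef(X, rowvar=False)

print(np.round(R, 2))
print(np.allclose(R, R_builtin))
```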