Chapter 10.1: An Introduction to the Basics of PROC IML

Types of Statements

Many statement types are similar or identical to those of the DATA step. PROC IML; is always the first statement in the block of commands. It accepts few options; in particular, there is no DATA= option to read a dataset.

The statements that follow the PROC line are of the following types:

    Commands: special operations, control of input/output, options
    Functions/calls: perform special tasks or user-defined operations
    Control: direct the flow of execution (if/then, loops, goto, etc.)

Before I introduce these statements, here are a few basic pieces of terminology that are necessary to know before writing them.

Scalars

Scalars (i.e., single values) are assigned just as they are in the DATA step: the variable name goes to the left of the equals sign and the number or character string to the right:

    x = 3;             * the scalar x is assigned the number 3;
    p = . ;            * a missing data value;
    x_chr = 'apple';   * x_chr is a character variable assigned 'apple';

Note: PROC IML only works with one missing data symbol, the period.

Matrices

PROC IML's ability to compute and manipulate matrices makes it a powerful tool for many computing problems. In this section a matrix is either a one-dimensional vector (row or column) with m elements or a two-dimensional matrix with r rows and c columns having m = r x c elements. It contains either all character data (of length 1 to 32676) or all numeric (double precision) data; an IML matrix cannot contain both types. Naming conventions for matrices follow the same guidelines as valid SAS dataset variable names (see Chapter 1).

Although the contents of matrices may be read from SAS datasets, the most basic way to create a new matrix is to enter the data values between 'curly' brackets { } or to generate them with the colon operator. 'Rounded' parentheses ( ) cannot be used in place of the curly brackets; they result in an error message.
For example, the following two equivalent statements create a row vector called xr with m=4 elements (1 2 3 4):

    xr = { 1 2 3 4 } ;
    xr = 1:4 ;

IML will interpret this matrix (from either statement) as:

    xr = [1 2 3 4]

To make a column vector of m=4 elements, either separate the elements with commas or apply the transpose operator ` (the backquote, most likely located in the upper left corner of your keyboard) to a row vector:

    xc = { 1, 2, 3, 4 } ;
    xc = (1:4)` ;

Both commands produce a column vector of four elements:

         1
    XC = 2
         3
         4

To enter a 3x2 matrix of numbers (3 rows and 2 columns), enter the numbers of each row from left to right, beginning with row 1, and separate the rows with commas:

    X_mat = {1 2, 3 4, 5 6} ;

You can also enter the data as they appear in matrix form:

    X_mat = {1 2,
             3 4,
             5 6} ;

The matrix applied in equations looks like:

            1 2
    X_mat = 3 4
            5 6

To assign numbers to the elements of a matrix from variable names, one approach is to first define a matrix with the appropriate dimensions and then directly reference the row and column of the desired entries.
For example, with a 1 x 3 row vector xxx:

    PROC IML;
    a=2; b=1; qn=5;
    xxx=1:3;
    xxx[1]=a;
    xxx[2]=b;
    xxx[3]=qn;
    PRINT xxx;
    QUIT;

gives you the row vector of numbers:

    XXX = [ 2 1 5 ]

If you try to enter the variable names as values inside the curly brackets:

    PROC IML;
    a=2; b=1; qn=5;
    xxx={a b qn};
    PRINT xxx;
    QUIT;

IML will give you a row vector of character values:

    XXX = [ A B QN ]

To repeat a value across a row, apply the [#] syntax, where # is the number of repetitions:

    x = {[3] 'abc', [3] 'jkl', [3] 'xyz'} ;

The x matrix will be:

        | abc abc abc |
    x = | jkl jkl jkl |
        | xyz xyz xyz |

To enter numerical values into an r x c matrix, first create the matrix with the J function (described below), here with 3 rows and 2 columns and default entries of periods for missing data:

    PROC IML;
    a=2; b=1; qn=5;
    xxx=J(3,2,.);
    xxx[1,1]=a;
    xxx[2,2]=b;
    xxx[3,2]=qn;
    PRINT xxx;
    QUIT;

IML now works with the matrix:

          2 .
    XXX = . 1
          . 5

If ab is a 5x3 matrix, you can extract single column vectors, each with five elements (one per row of ab):

    a = ab[,2];
    b = ab[,3];

Matrix-generating Functions

IML has built-in functions to generate matrices with specific structures. A few of them are briefly described:

BLOCK: forms a block-diagonal matrix.

    BLOCK( matrix1<, matrix2,..., matrix15>)

where matrix_i is a numeric matrix or literal. The BLOCK function combines the argument matrices diagonally to form a new block-diagonal matrix. Up to 15 matrices can be specified. For example, the statement

    ABC = BLOCK(a,b,c);

produces a block-diagonal matrix of the form

          | A 0 0 |
    ABC = | 0 B 0 |
          | 0 0 C |

The statements

    PROC IML;
    a={2 4, 4 3} ;
    b={4 6 7, 6 8 5, 7 5 3} ;
    ab=block(a,b);
    PRINT ab;
    QUIT;

form the matrix:

    AB    5 rows    5 cols    (numeric)

    2 4 0 0 0
    4 3 0 0 0
    0 0 4 6 7
    0 0 6 8 5
    0 0 7 5 3

I: forms the identity matrix.

    I(dimension)

where dimension specifies the size of the identity matrix.
The I function forms an identity matrix with dimension rows and dimension columns. The diagonal elements of an identity matrix are 1s; all other elements are 0s. The value of dimension must be an integer greater than or equal to 1. Noninteger operands are truncated to their integer part. For example, the statement

    a=I(3);

gives

        1 0 0
    a = 0 1 0
        0 0 1

VECDIAG: makes a column vector from the diagonal elements of a square numeric matrix.

    VECDIAG( square-matrix)

The VECDIAG function makes a column vector with elements from the main diagonal of the square matrix. For example, the statements

    a={2 1, 0 -1};
    c=vecdiag(a);

produce the numeric column vector (2 rows and 1 column)

    C = |  2 |
        | -1 |

J: forms a matrix of given dimensions in which every element has the same value.

    J( nrow<, ncol<, value>>)

The J function initializes a rectangular matrix of a predetermined size. Its inputs are as follows:

    nrow: a numeric matrix or literal giving the number of rows.
    ncol: a numeric matrix or literal giving the number of columns. If omitted, it defaults to nrow.
    value: a numeric or character matrix or literal used to fill the rows and columns of the matrix. If omitted, it defaults to 1.

If you want to make a column vector and do not know the number of rows in advance (for example, when it is supplied by a macro variable), you can let the J function build the vector and set its elements to missing (or some other specified value):

    %LET nrws=3;
    PROC IML;
    fn=J(&nrws,1,.);
    PRINT fn;
    QUIT;

    FN
     .
     .
     .

Many of the examples which appear throughout this section will illustrate how these matrix-generating functions work.

Read Data into an IML matrix from a SAS dataset

Rather than entering the data manually into matrices, the most efficient and practical way is to read them from existing SAS datasets.
To read data into a matrix, the SAS dataset must already exist, either as a temporary or a permanent file. Two statements work together to convert SAS datasets into matrices:

    USE SAS-dataset ;

The USE statement opens a SAS dataset; in the examples below the dataset is called test1.

    READ <range> <VAR operand> <WHERE(expression)> <INTO name> ;

The READ statement reads the values of the specified variables from the dataset opened by the USE statement into a matrix that IML can manipulate (the variables become the columns of the matrix; the rows of the dataset become its rows). The USE statement must appear before the READ statement so that the dataset is open when READ executes.

RANGE
    all: select all cases
    next <#>: read the next # observations
    point: read observations from rows specified by a scalar, literal, or matrix

VAR
    Specify which variables to read:
    _ALL_: all variables
    _NUM_: all numeric variables
    _CHAR_: all character variables
    A list of names: {variable-list}
    A range of variable names built with the colon operator: VAR ('x1':'x9')

WHERE
    This option works as it does in the DATA or PROC steps, except that you need to enter the actual mathematical symbol (<, =, or >) rather than LT, EQ, or GT.

INTO name
    The name you enter specifies the IML matrix in which to place the data.

Here are a few examples. The following USE and READ statements place all the observations of the numeric variables (a b y) from the SAS dataset test1 into the columns of the matrix x:

    DATA test1;
    INPUT a b y c $ @@;
    CARDS;
    1 2 76 M
    2 5 56 F
    3 9 87 M
    4 8 78 M
    5 7 46 F
    6 2 83 F
    ;
    PROC IML;
    USE test1;
    READ all VAR {a b y} INTO x;
    PRINT x;
    QUIT;

    X
    1 2 76
    2 5 56
    3 9 87
    4 8 78
    5 7 46
    6 2 83

This next example reads the variables a and b from observations 2, 3, and 6 of the dataset test1 into the matrix x.
    PROC IML;
    USE test1;
    READ point {2 3 6} VAR {a b} INTO x;
    PRINT x;
    QUIT;

gives:

        2 5
    X = 3 9
        6 2

In the next example, read the first 6 records of the dataset test1 and select variables a and b only from observations that satisfy a condition on c: an observation is read into the matrix x only if its value of c equals 'F'.

    PROC IML;
    USE test1;
    READ next 6 VAR {a b} WHERE(c = 'F') INTO x;
    PRINT x;
    QUIT;

    X
    2 5
    5 7
    6 2

Print Matrices to the Output File

As already shown, matrices can be printed to the output window with a PRINT statement. If two matrices are separated only by a space, they are printed side by side:

    PRINT x y ;

Enter a comma between the matrix names to print the matrices stacked, one above the other:

    PRINT x, y ;

You can specify your own row and column headings by storing the headings in vectors and then displaying the matrix with the ROWNAME= and COLNAME= options:

    coffee={4 2 2 3 2, 3 3 1 2 1, 2 1 0 2 1, 5 4 4 3 4};   * values taken from the output below;
    names={jenny linda jim samuel};
    days={mon tue wed thu fri};
    PRINT coffee[rowname=names colname=days];

    COFFEE    MON  TUE  WED  THU  FRI
    JENNY       4    2    2    3    2
    LINDA       3    3    1    2    1
    JIM         2    1    0    2    1
    SAMUEL      5    4    4    3    4

Assignment Statements for Matrix Functions

A function of one or more existing matrices computes a matrix, with the same or new dimensions, that holds the given transformation or matrix characteristic:

    y_org = y;        * copies all elements of matrix y into the matrix y_org;
    lg_y = LOG(p);    * computes a new matrix whose elements are the logs of the elements of p;
    q_in = INV(Q);    * computes the inverse of Q and puts it in q_in;
    rnk = RANK(t);    * ranks the elements of t (their ordering positions, not the matrix rank);
    mlt_ab = a * b;   * multiplies matrices a and b (they must be conformable);

EIGVAL(A) computes the eigenvalues of A, a square numeric matrix. For a symmetric matrix, the EIGVAL function returns a column vector of the eigenvalues of A. For a nonsymmetric matrix with complex eigenvalues, it returns a two-column matrix: the first column holds the real parts and the second the imaginary parts. The following code computes Example 7.1.1 from Golub and Van Loan (1989):

    PROC IML;
    a = { 67.00 177.60 -63.20,
         -20.40  95.88 -87.16,
          22.80  67.84  12.12};
    eig_val = EIGVAL(a);
    PRINT eig_val;
    QUIT;

    EIG_VAL
    75   100
    75  -100
    25     0

That is, the eigenvalues are 75 + 100i, 75 - 100i, and 25.

Control Statements

IF .. THEN .. ELSE
These statements work as they do in the DATA step.

DO statements
Iterative DO loops also function the same as they do in the DATA step. However, note that within the loop, matrix elements need to be referenced by their index numbers and not by the matrix name alone. The following loop replaces individual values of the matrix y if the absolute value of the corresponding element of the matrix resid exceeds a cutoff point:

    * dataset rsd is output from proc REG;
    PROC IML;
    USE rsd;
    READ all VAR {res} INTO resid;
    * assumption: rsd also holds the observed and predicted values under the names y and yp;
    READ all VAR {y} INTO y;
    READ all VAR {yp} INTO yp;
    k=.5;
    DO j = 1 to NROW(resid);
       IF (resid[j] > k) THEN y[j] = yp[j] + k ;
       IF (resid[j] < -k) THEN y[j] = yp[j] - k ;
    END;
    QUIT;

GOTO/LINK
These statements are available and work as they do in the DATA step; however, their use is discouraged.

START, FINISH
These two statements form the boundaries of modules you can write yourself, which then act like subroutines or functions.

General syntax:

    START module-name <(arguments)> ;
    < insert module statements> ;
    FINISH;

Example: In IML, write a module called chng_y that takes specified values of b and c and computes the value of y given the global value of x.

    PROC IML;
    START chng_y(b,c) global(x,y);
       y=(x+b)/c;
    FINISH;
    DO x = 1 to 3;
       b=3; c=5;
       RUN chng_y(b,c);   * The RUN statement executes the module;
       PRINT y;
    END;
    QUIT;

Here b and c are constants while x takes the values 1, 2, 3:

    y = (1+3)/5 = 0.8
    y = (2+3)/5 = 1.0
    y = (3+3)/5 = 1.2

Functions:

    Matrix inquiry: returns information about a matrix
    Scalar: work on individual elements (e.g., absolute value)
    Summary: sample statistics (e.g., mean, sum, SS, etc.)
    Matrix arithmetic: determinant, eigenvalues
    Reshape: form a block-diagonal matrix from other matrices, transpose
    Linear algebra and statistical functions: regression, estimation

Examples:

    yt = y` ;                           * yt = y transpose ;
    Beta = INV(x` * x) * (x` * y) ;     * the least squares regression equation;

Write a new SAS dataset from the contents of an IML matrix

Just as two statements are required to read a dataset, PROC IML requires two statements to write an IML matrix to a new SAS dataset. The first statement is CREATE; the second is APPEND. CREATE opens the dataset; APPEND actually inserts the matrix elements into it.

    CREATE SAS-dataset FROM matrix <[colname=variable-names rowname=variable-name-for-row-titles]>;

The APPEND statement places observations from the matrix into the SAS dataset:

    APPEND FROM matrix ;

For example:

    PROC IML;
    x={2 3, 4 5};
    CREATE test2 FROM x[colname={'a' 'b'}];
    APPEND FROM x;
    QUIT;
    PROC PRINT DATA=test2 NOOBS;
    RUN;

results in the printed output:

    a b
    2 3
    4 5

If [colname={'a' 'b'}] is not included in the CREATE statement, the variables are named col1, col2, etc. You can also store the column names in a character vector and pass that vector to colname=:

    v = {'a' 'b'};
    CREATE test2 FROM x [colname = v];

IML Applications

Calculate Hotelling's T^2 comparing means of sample data to population means

    PROC IML;
    USE calcium;
    READ all VAR {test1 test2 test3} INTO x;
    n = NROW(x);
    mu = {15, 6, 2.85};    * hypothesized values for test1, test2, test3;
    xbar = 1/n * x` * J(n,1);
    s = 1/(n-1) * x` * (I(n) - 1/n * J(n)) * x;
    t2 = n * (xbar-mu)` * INV(s) * (xbar-mu);
    PRINT n mu xbar s;
    PRINT t2 ;
    QUIT;

Other applications of PROC IML are found on the page: http://www.uoregon.edu/~robinh/100_IML.html
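
As a cross-check on the Hotelling's T^2 program above, its three matrix statements can be read as the following formulas. This is only a sketch in conventional notation, and the symbols are introduced here rather than taken from the program: X stands for the n x p data matrix read from calcium (p = 3), 1_n for a column of n ones (J(n,1) in the code), J_n for the n x n matrix of ones (J(n) in the code), and mu_0 for the hypothesized mean vector mu.

```latex
\begin{align*}
\bar{x} &= \frac{1}{n}\, X^{\top} \mathbf{1}_n
  && \text{(xbar: vector of sample means)} \\
S &= \frac{1}{n-1}\, X^{\top} \left( I_n - \frac{1}{n} J_n \right) X
  && \text{(s: sample covariance matrix)} \\
T^2 &= n\, (\bar{x} - \mu_0)^{\top} S^{-1} (\bar{x} - \mu_0)
  && \text{(t2: Hotelling's statistic)}
\end{align*}
```

Assuming the rows of X are a random sample from a p-variate normal population with mean mu_0, the statistic is usually referred to an F distribution through (n - p) / (p (n - 1)) * T^2, which follows F with p and n - p degrees of freedom.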