Section 6: PROC TABULATE and Missing Data 1. Print missing levels of factors 2. Missing values and formats 3. Produce tables with empty rows and/or columns 1. Missing levels The option printmiss entered on the TABLE statement prints all row and column labels for each portion of the table, even if these combinations do not exist for certain levels. The misstext= option tells SAS what character to print in any cell which is empty. Typically a blank (as shown here) for numerical summaries is entered. A '0' may be inserted if it is reasonable for cell counts equal to 0. DATA tst; lvl=1; a=.r; b=1; output; a=1; b=2; output; a=2; b=1; output; b=2; output; lvl=2; a=1; b=.t; output; b=2; output; a=2; b=1; output; b=2; output; PROC TABULATE DATA=tst NOseps missing; CLASS lvl a b; TABLE lvl*a, b*n=' '*f=4.0 / rts=14 printmiss misstext=' '; OPTIONS LS=75; run; ----------------------------- | | b | | |--------------| | | T | 1 | 2 | |------------+----+----+----| |lvl a | | | | |1 R | | 1| | | 1 | | | 1| | 2 | | 1| 1| |2 R | | | | | 1 | 1| | 1| | 2 | | 1| 1| ----------------------------- 2. Missing values and formats Consider this sample input dataset and its associated format: PROC FORMAT; VALUE incm 1='less than 15000' 2='15000 to 30000' 3='30000 to 45000' 4='greater than 45000' .n="no response" .r="refused for privacy" .u="unemployed" .="unknown"; RUN; DATA two; LABEL income='Income'; FORMAT income incm. ; MISSING n r u; * the letters n r u refer to specific reasons for missing data from income as entered in PROC FORMAT; INPUT id income @@; CARDS; 1 1 2 . 3 r 4 4 5 2 6 n 7 u 8 1 9 2 10 r 11 1 12 3 13 2 14 3 15 n 16 . ; The option missing on the PROC statement summarizes how many missing values are present in the data. If you have coded with a variety of the 28 possible missing data codes as observed here, you will find summaries for each type. PROC TABULATE DATA=two NOseps missing; CLASS income; TABLE income, n='Count'*f=7.0 / RTS=20; RUN; ---------------------------- | | COUNT | -------------------+-------| |Income | | |unknown | 2 | |no response | 2 | |refused to answer | 2 | |unemployed | 1 | |lt 15000 | 3 | |15000 to 30000 | 3 | |30000 to 45000 | 2 | |gt 45000 | 1 | ---------------------------- proc format; value ysn 0='No' 1='Yes'; run; PROC TABULATE DATA=tbl NOseps missing; CLASS group time1 time2; TABLE time1, group*time2*n=' '*f=4.0 / rts=10 printmiss misstext='0'; FORMAT time1 time2 ysn. ; run; In the following table no responses for time2=yes were present in the dataset for group 3, so it prints a column of 0's. ---------------------------------------- | | group | | |-----------------------------| | | 1 | 2 | 3 | | |---------+---------+---------| | | time2 | time2 | time2 | | |---------+---------+---------| | | No |Yes | No |Yes | No |Yes | |--------+----+----+----+----+----+----| |time1 | | | | | | | |No | 1| 0| 1| 0| 4| 0| |Yes | 7| 2| 4| 3| 3| 0| ---------------------------------------- 3. Produce tables with empty rows and/or columns Suppose the dataset contains only a few of the cells of the table represented, yet you want to see a table which contains all possible combinations of rows and columns; * Assume a 4 x 4 table; * generate a dataset with 16 records which include all possible combinations of row and col levels; DATA one; DO row =1 to 4; DO col =1 to 4; OUTPUT; * outputs 16 combinations of row and col; END; END; RUN; * here is the actual data collected at this time; DATA cnts; INPUT row col count; Cards; 1 2 3 3 2 2 3 1 1 1 4 4 ; PROC TABULATE DATA=cnts Noseps CLASSDATA=one ; CLASS row col; FREQ count; TABLE (row all='Total'), (col all='Total')*n=' '*f=5.0 / rts=9 misstext=' '; FREQ count; RUN; --------------------------------------- | | col | | | |-----------------------| | | | 1 | 2 | 3 | 4 |Total| |-------+-----+-----+-----+-----+-----| |row | | | | | | |1 | | 3| | 4| 7| |2 | | | | | | |3 | 1| 2| | | 3| |4 | | | | | | |Total | 1| 5| | 4| 10| ---------------------------------------