Section 8.3: Percentages of Counts

PROC TABULATE computes percentages of count data by levels of categorical variables, either across all levels of a factor or across the levels within a grouping variable. For example, if you have 10 females and 5 males, then of the 15 respondents 66.7% are female and 33.3% are male.

KEYWORDS: PCTN COLPCTN ROWPCTN PAGEPCTN REPPCTN

Denominator definitions for calculating percentages with PROC TABULATE can be very confusing. Row and column percentages are the most commonly encountered. To be interpretable as conditional probabilities, the percentages in each row or column must add to 100%.

PctN option for column and row percents

The first block of TABULATE statements demonstrates how to calculate percents for the number of observations in each cell. To calculate percentages that add to 100% across all columns within each row, specify the column variable as the denominator definition, in angle brackets after PCTN:

   PROC TABULATE DATA=indat NOSEPS;
     CLASS row col;
     TABLE row, (col all)*pctn<col all>*f=6.2 / rts=11;
   RUN;

To calculate percentages that add to 100% across all rows within each column, specify the row variable as the denominator:

   PROC TABULATE DATA=indat NOSEPS;
     CLASS row col;
     TABLE (row all), col*pctn<row all>*f=6.2 / rts=11;
   RUN;

Example

The data given below demonstrate how to compute counts and percents of count data across rows and columns.
   DATA hs;
     LABEL row='Row' column='Column';
     INPUT row column @@;
     CARDS;
   1 2 1 1 1 2 1 1 1 2 2 2 1 1 1 1 1 2 2 1 1 1 1 2 1 2 1 2 1 1
   1 2 1 2 1 1 2 2 2 1 2 2 1 2 2 3 1 1 1 3 2 2 2 1 1 1 1 3 1 3
   2 2 1 4 2 4 1 4 1 4 1 2 1 3 2 1 2 3 1 2 2 3 2 4 2 4 2 4 1 3
   ;

   PROC TABULATE DATA=hs NOSEPS;
     CLASS row column;
     TABLE (row all='Total'), (column all='Total')*n=' '*f=5.0
           / rts=10 box='Total Counts';
     TABLE row, (column all)*pctN<column all>*f=7.2
           / rts=14 box='Percents across each row for the column vars';
     TABLE (row all), (column all)*(n*f=3.0 pctN<column all>*f=7.2)
           / rts=14 box='Counts and percents across the row';
     TABLE row, (column all)*rowpctN*f=7.2
           / rts=18 box='Prob (column A | row B)';
   RUN;

Output from the first TABLE statement gives the actual counts and their row and column totals:

----------------------------------------
|Total   |        Column         |     |
|Counts  |-----------------------|     |
|        |  1  |  2  |  3  |  4  |Total|
|--------+-----+-----+-----+-----+-----|
|Row     |     |     |     |     |     |
|1       |    9|   12|    5|    3|   29|
|2       |    4|    5|    3|    4|   16|
|Total   |   13|   17|    8|    7|   45|
----------------------------------------

The percents for each column, given the row, can be interpreted as conditional probabilities:
   Prob(Column | Row) = Prob(Row and Column) / Prob(Row)

Output from the second TABLE statement shows how the percents sum to 100 across each row:

------------------------------------------------------
|Percents    |            column             |       |
|across each |-------------------------------|       |
|row for the |   1   |   2   |   3   |   4   |  All  |
|column vars |-------+-------+-------+-------+-------|
|            | PctN  | PctN  | PctN  | PctN  | PctN  |
|------------+-------+-------+-------+-------+-------|
|row         |       |       |       |       |       |
|1           |  31.03|  41.38|  17.24|  10.34| 100.00|
|2           |  25.00|  31.25|  18.75|  25.00| 100.00|
------------------------------------------------------

The third TABLE statement shows how to place the counts and the percents side by side with the pctN option; it also produces row and column totals:

--------------------------------------------------------------------------
|Counts and  |                    column                     |           |
|percents    |-----------------------------------------------|           |
|across the  |     1     |     2     |     3     |     4     |    All    |
|row         |-----------+-----------+-----------+-----------+-----------|
|            | N | PctN  | N | PctN  | N | PctN  | N | PctN  | N | PctN  |
|------------+---+-------+---+-------+---+-------+---+-------+---+-------|
|row         |   |       |   |       |   |       |   |       |   |       |
|1           |  9|  31.03| 12|  41.38|  5|  17.24|  3|  10.34| 29| 100.00|
|2           |  4|  25.00|  5|  31.25|  3|  18.75|  4|  25.00| 16| 100.00|
|All         | 13|  28.89| 17|  37.78|  8|  17.78|  7|  15.56| 45| 100.00|
--------------------------------------------------------------------------

The fourth TABLE statement introduces the RowPctN option, which works much like PctN with a denominator definition but needs fewer entries in the TABLE statement:

----------------------------------------------------------
|Prob (A | row B)|            Column             |       |
|                |-------------------------------|       |
|                |   1   |   2   |   3   |   4   |  All  |
|                |-------+-------+-------+-------+-------|
|                |RowPctN|RowPctN|RowPctN|RowPctN|RowPctN|
|----------------+-------+-------+-------+-------+-------|
|Row             |       |       |       |       |       |
|1               |  31.03|  41.38|  17.24|  10.34| 100.00|
|2               |  25.00|  31.25|  18.75|  25.00| 100.00|
----------------------------------------------------------

Even though pctN and rowpctN produce the same numerical results, note the difference between the two statements: pctN requires a denominator definition that names the column variable, whereas rowpctN needs no variable name.

Calculate Percents Down a Column:

The pctN and ColpctN options demonstrated below work in an analogous manner to pctN and RowpctN, computing percents that add to 100% down each column:

   PROC TABULATE DATA=hs NOSEPS;
     CLASS row column;
     TABLE (row all), column*pctn<row all>*f=7.2
           / rts=14 box='Conditional percents for row given col';
     TABLE (row all), column*ColpctN*f=7.2
           / rts=14 box='Conditional percents for row given col';
   RUN;

----------------------------------------------
|Conditional |            column             |
|percents    |-------------------------------|
|for row     |   1   |   2   |   3   |   4   |
|given col   |-------+-------+-------+-------|
|            | PctN  | PctN  | PctN  | PctN  |
|------------+-------+-------+-------+-------|
|row         |       |       |       |       |
|1           |  69.23|  70.59|  62.50|  42.86|
|2           |  30.77|  29.41|  37.50|  57.14|
|All         | 100.00| 100.00| 100.00| 100.00|
----------------------------------------------

ColPctN also works when the row dimension nests one class variable within another:

   DATA one;
     DO car=1 TO 3;
       DO age=1 TO 2;
         rept = CEIL(8*RANUNI(92838));
         DO i=1 TO rept;
           OUTPUT;
         END;
       END;
     END;
   RUN;

   PROC TABULATE DATA=one NOSEPS;
     CLASS car age;
     TABLE (car * (age all) all), n*f=4.0 colpctn='(%)'*f=6.1;
   RUN;

-------------------------------
|                 | N  | (%)  |
|-----------------+----+------|
|car      age     |    |      |
|1        1       |   5|  17.9|
|         2       |   1|   3.6|
|         All     |   6|  21.4|
|2        age     |    |      |
|         1       |   8|  28.6|
|         2       |   1|   3.6|
|         All     |   9|  32.1|
|3        age     |    |      |
|         1       |   6|  21.4|
|         2       |   7|  25.0|
|         All     |  13|  46.4|
|All              |  28| 100.0|
-------------------------------

The table built from the SASHELP.CLASS data set in the Multiple Response variables section below can also be structured down the columns with ColPctN:

   PROC TABULATE DATA=tmp NOSEPS;
     CLASS ag sex ht;
     TABLE (sex ht all='Total'),
           (ag all='Total')*(n*f=3.0 colpctN*f=7.2) / rts=9;
   RUN;

---------------------------------------------
|       |          ag           |           |
|       |-----------------------|           |
|       |     0     |     1     |   Total   |
|       |-----------+-----------+-----------|
|       | N |ColPctN| N |ColPctN| N |ColPctN|
|-------+---+-------+---+-------+---+-------|
|Sex    |   |       |   |       |   |       |
|F      |  3|  42.86|  6|  50.00|  9|  47.37|
|M      |  4|  57.14|  6|  50.00| 10|  52.63|
|ht     |   |       |   |       |   |       |
|0      |  6|  85.71|  4|  33.33| 10|  52.63|
|1      |  1|  14.29|  8|  66.67|  9|  47.37|
|Total  |  7| 100.00| 12| 100.00| 19| 100.00|
---------------------------------------------

Multiple Response variables

Two or more variables can be placed across the columns by listing them within parentheses:

   DATA tmp;
     SET sashelp.class;
     LABEL ag='Age';
     ht = (height > 63);
     ag = (age ge 13);
   RUN;

   PROC TABULATE DATA=tmp NOSEPS;
     CLASS ag sex ht;
     TABLE (ag all='Total'),
           (sex ht all='Total')*(n*f=3.0 rowpctN*f=7.2) / rts=9;
   RUN;

---------------------------------------------------------------------
|       |          Sex          |          HT           |           |
|       |-----------------------+-----------------------|           |
|       |     F     |     M     |     0     |     1     |   Total   |
|       |-----------+-----------+-----------+-----------+-----------|
|       | N |RowPctN| N |RowPctN| N |RowPctN| N |RowPctN| N |RowPctN|
|-------+---+-------+---+-------+---+-------+---+-------+---+-------|
|Age    |   |       |   |       |   |       |   |       |   |       |
|0      |  3|  42.86|  4|  57.14|  6|  85.71|  1|  14.29|  7| 100.00|
|1      |  6|  50.00|  6|  50.00|  4|  33.33|  8|  66.67| 12| 100.00|
|Total  |  9|  47.37| 10|  52.63| 10|  52.63|  9|  47.37| 19| 100.00|
---------------------------------------------------------------------

This method of summarizing data is useful when each of the column variables has a small number of levels; otherwise, the table can get too wide to fit on the output page without a 'Continued' message. The next section demonstrates how to avoid this situation.
Another way to compute summary tables for each variable is to place the data in vertical (univariate) form (see Section 6.5 on transposing options) and then add the new variable that identifies the original variable to the CLASS statement. The following example computes the count and percent of total for four responses from a survey:

   DATA srv;
     INPUT id q1 $ q2 $ q3 $ q4 $;
     DATALINES;
   1 A B C D
   2 E F A E
   3 C B B A
   4 B A D E
   5 E F A B
   6 A A A C
   7 F E A E
   ;

   DATA unv;
     SET srv;
     qst='Question 1'; ans=q1; OUTPUT;
     qst='Question 2'; ans=q2; OUTPUT;
     qst='Question 3'; ans=q3; OUTPUT;
     qst='Question 4'; ans=q4; OUTPUT;
     DROP q1-q4;
   RUN;

   * example is provided for pctN and rowpctN options ;
   PROC TABULATE DATA=unv NOSEPS;
     CLASS qst ans;
     TABLE (qst='Question' all),
           ans='Choices'*(n='N'*f=2.0 pctn='%'*f=5.1)
           / rts=12 misstext=' ';
     TABLE (qst=' ' all='Total'),
           (ans='Choices' all)*(n='N'*f=2.0 rowpctN='%'*f=5.1)
           / rts=12 misstext=' ';
   RUN;

The second TABLE statement prints this table:

---------------------------------------------------------------------------
|          |                       Choices                       |        |
|          |-----------------------------------------------------|        |
|          |   A    |   B    |   C    |   D    |   E    |   F    |  All   |
|          |--------+--------+--------+--------+--------+--------+--------|
|          |N |  %  |N |  %  |N |  %  |N |  %  |N |  %  |N |  %  |N |  %  |
|----------+--+-----+--+-----+--+-----+--+-----+--+-----+--+-----+--+-----|
|Question 1| 2| 28.6| 1| 14.3| 1| 14.3|  |     | 2| 28.6| 1| 14.3| 7|100.0|
|Question 2| 2| 28.6| 2| 28.6|  |     |  |     | 1| 14.3| 2| 28.6| 7|100.0|
|Question 3| 4| 57.1| 1| 14.3| 1| 14.3| 1| 14.3|  |     |  |     | 7|100.0|
|Question 4| 1| 14.3| 1| 14.3| 1| 14.3| 1| 14.3| 3| 42.9|  |     | 7|100.0|
|Total     | 9| 32.1| 5| 17.9| 3| 10.7| 2|  7.1| 6| 21.4| 3| 10.7|28|100.0|
---------------------------------------------------------------------------
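As a quick cross-check of the row percentages in the table above, the same counts can be requested from PROC FREQ. This is only a minimal sketch, assuming the vertical data set is named unv and holds the variables qst and ans as created in the DATA step above; it is not part of the original example.

```sas
* Cross-check sketch: assumes data set UNV with variables QST and ANS ;
PROC FREQ DATA=unv;
  * NOCOL and NOPERCENT suppress the column and overall percents ;
  TABLES qst*ans / NOCOL NOPERCENT;
RUN;
```

Each cell of the resulting crosstabulation lists the frequency and the row percent, which should agree with the N and % columns that rowpctN produces for the four question rows.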