Section 8.3: Percentages of Counts
PROC TABULATE computes percentages of counts for the levels of categorical variables,
either across all levels of a factor or across the levels within a grouping variable.
For example, if you have 10 females and 5 males, then of the total number of
respondents (total=15) 66.7% are female and 33.3% are male.
KEYWORDS: PCTN COLPCTN ROWPCTN PAGEPCTN REPPCTN
Denominator definitions for calculating percentages with PROC TABULATE can be very
confusing. Row and column percentages are the ones most commonly encountered. To be
interpretable as conditional probabilities, the percentages in each row (or in each
column) must add to 100%.
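The introductory example of 10 females and 5 males can be reproduced with a small sketch. The dataset name resp and the variable sex are hypothetical and are used only for illustration. With no denominator definition, pctn reports each count as a percent of the table total, so the two levels should come out near 66.7 and 33.3.

```sas
/* Hypothetical dataset resp: 10 female and 5 male respondents */
DATA resp;
  LENGTH sex $ 1;
  sex = 'F'; DO i = 1 TO 10; OUTPUT; END;
  sex = 'M'; DO i = 1 TO 5;  OUTPUT; END;
  DROP i;
RUN;

PROC TABULATE DATA=resp NOSEPS;
  CLASS sex;
  /* pctn without a denominator definition is a percent of the table total */
  TABLE (sex ALL='Total'), n*f=4.0 pctn*f=6.1 / rts=10;
RUN;
```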
PctN option for column and row percents
The first block of TABULATE statements demonstrates how to calculate percents for the
number of observations in each cell.
To calculate percentages that add to 100% across all columns within each row, specify
the column variable as the denominator definition.
PROC TABULATE DATA=indat NOseps;
CLASS row col;
TABLE row, (col all)*pctn<col all>*f=6.2 / rts=11 ;
RUN;
To calculate percentages that add to 100% across all rows within each column, specify
the row variable as the denominator:
PROC TABULATE DATA=indat NOseps;
CLASS row col;
TABLE (row all), col*pctn<row all>*f=6.2 / rts=11 ;
RUN;
Example
The data given below demonstrate how to compute counts and percents of count data
across rows and columns.
DATA hs; LABEL row='Row' col='Column';
INPUT row col @@;
CARDS;
1 2 1 1 1 2 1 1 1 2 2 2 1 1 1 1 1 2
2 1 1 1 1 2 1 2 1 2 1 1 1 2 1 2 1 1
2 2 2 1 2 2 1 2 2 3 1 1 1 3 2 2 2 1
1 1 1 3 1 3 2 2 1 4 2 4 1 4 1 4 1 2
1 3 2 1 2 3 1 2 2 3 2 4 2 4 2 4 1 3
;
PROC TABULATE DATA=hs NOseps ;
CLASS row col ;
TABLE (row all='Total'), (col all='Total')*n=' '*f=5.0
/ rts=10 box='Total Counts';
TABLE row, (col all)*pctN<col all>*f=7.2
/ rts=14 box='Percents across each row for the column vars';
TABLE (row all), (col all)*(n*f=3.0 pctN<col all>*f=7.2)
/ rts=14 box='Counts and percents across the row';
TABLE row, (col all)*rowpctN*f=7.2 / rts=18 box='Prob (column A | row B)';
RUN;
Output from the first TABLE statement gives the actual counts and their row and
column totals:
----------------------------------------
|Total | Column | |
|Counts |-----------------------| |
| | 1 | 2 | 3 | 4 |Total|
|--------+-----+-----+-----+-----+-----|
|Row | | | | | |
|1 | 9| 12| 5| 3| 29|
|2 | 4| 5| 3| 4| 16|
|Total | 13| 17| 8| 7| 45|
----------------------------------------
The percents across each row can be interpreted as conditional probabilities of the
column level, given the row:
Prob(column | row) = Prob(row and column) / Prob(row)
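As a worked check of the conditional probability, the sketch below takes the counts for row 1, column 1 from the table of total counts (9 in the cell, 29 in the row, 45 overall) and divides the joint probability by the row probability. The variable names are hypothetical. The result, 9/29, is about 31.03%.

```sas
DATA _NULL_;
  n_cell  = 9;    /* count in row 1, column 1 */
  n_row   = 29;   /* total for row 1          */
  n_total = 45;   /* grand total              */
  p_joint = n_cell / n_total;    /* Prob(row and column) */
  p_row   = n_row  / n_total;    /* Prob(row)            */
  p_cond  = p_joint / p_row;     /* Prob(column | row)   */
  pct     = 100 * p_cond;
  PUT pct= 6.2;                  /* writes the percent to the log */
RUN;
```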
Output from the second TABLE statement shows how the percents sum to 100 across each
row:
------------------------------------------------------
|Percents | column | |
|across each |-------------------------------| |
|row for the | 1 | 2 | 3 | 4 | All |
|column vars |-------+-------+-------+-------+-------|
| | PctN | PctN | PctN | PctN | PctN |
|------------+-------+-------+-------+-------+-------|
|row | | | | | |
|1 | 31.03| 41.38| 17.24| 10.34| 100.00|
|2 | 25.00| 31.25| 18.75| 25.00| 100.00|
------------------------------------------------------
The third TABLE statement shows how to place the counts and the percents side-by-side
with the pctN option; it also produces row and column totals:
--------------------------------------------------------------------------
|Counts and | column | |
|percents |-----------------------------------------------| |
|across the | 1 | 2 | 3 | 4 | All |
|row |-----------+-----------+-----------+-----------+-----------|
| | N | PctN | N | PctN | N | PctN | N | PctN | N | PctN |
|------------+---+-------+---+-------+---+-------+---+-------+---+-------|
|row | | | | | | | | | | |
|1 | 9| 31.03| 12| 41.38| 5| 17.24| 3| 10.34| 29| 100.00|
|2 | 4| 25.00| 5| 31.25| 3| 18.75| 4| 25.00| 16| 100.00|
|All | 13| 28.89| 17| 37.78| 8| 17.78| 7| 15.56| 45| 100.00|
--------------------------------------------------------------------------
The fourth TABLE statement introduces the RowPctN keyword, which gives the same row
percentages as PctN but needs less typing:
----------------------------------------------------------
|Prob (A | row B)| Column | |
| |-------------------------------| |
| | 1 | 2 | 3 | 4 | All |
| |-------+-------+-------+-------+-------|
| |RowPctN|RowPctN|RowPctN|RowPctN|RowPctN|
|----------------+-------+-------+-------+-------+-------|
|Row | | | | | |
|1 | 31.03| 41.38| 17.24| 10.34| 100.00|
|2 | 25.00| 31.25| 18.75| 25.00| 100.00|
----------------------------------------------------------
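One way to confirm that the two approaches agree is to print them side by side. The sketch below is an assumption-laden illustration: it uses a hypothetical dataset, hs_counts, that holds the eight cell counts from the table of total counts, and feeds them to PROC TABULATE through a FREQ statement instead of reading one observation per respondent.

```sas
/* Hypothetical dataset: one record per cell, with its count */
DATA hs_counts;
  INPUT row col count @@;
  DATALINES;
1 1 9  1 2 12  1 3 5  1 4 3
2 1 4  2 2 5   2 3 3  2 4 4
;
RUN;

PROC TABULATE DATA=hs_counts NOSEPS;
  CLASS row col;
  FREQ count;   /* each record stands for count observations */
  /* PctN needs the denominator definition; RowPctN does not */
  TABLE row, (col ALL)*(pctn<col ALL>='PctN'*f=7.2 rowpctn='RowPctN'*f=7.2) / rts=8;
RUN;
```

The PctN and RowPctN columns should match within each cell.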
Although pctN and rowpctN produce the same numerical results, the two statements
differ: pctN needs a denominator definition that names the column variable ("col"),
whereas rowpctN needs no denominator definition at all.
Calculate Percents Down a Column:
The pctN and ColPctN keywords demonstrated below work analogously to pctN and
RowPctN, except that they compute percents that add to 100% down each column:
PROC TABULATE DATA=hs noseps ;
CLASS row col ;
TABLE (row all), col*pctn<row all>*f=7.2
/ rts=14 box='Conditional percents for row given col';
TABLE (row all), col*ColpctN*f=7.2
/ rts=14 box='Conditional percents for row given col';
RUN;
---------------------------------------------
|Conditional | column |
|percents |-------------------------------|
|for row | 1 | 2 | 3 | 4 |
|given col |-------+-------+-------+-------|
| | PctN | PctN | PctN | PctN |
|------------+-------+-------+-------+-------|
|row | | | | |
|1 | 69.23| 70.59| 62.50| 42.86|
|2 | 30.77| 29.41| 37.50| 57.14|
|All | 100.00| 100.00| 100.00| 100.00|
----------------------------------------------
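The column percents can be cross-checked outside PROC TABULATE. The sketch below is only one possible check and rests on assumptions: it rebuilds the cell counts in a hypothetical dataset, hs_counts, and asks PROC FREQ for column percentages only. For row 1, column 1 the column percent is 9/13, about 69.23, which agrees with the table above.

```sas
/* Hypothetical dataset: one record per cell, with its count */
DATA hs_counts;
  INPUT row col count @@;
  DATALINES;
1 1 9  1 2 12  1 3 5  1 4 3
2 1 4  2 2 5   2 3 3  2 4 4
;
RUN;

PROC FREQ DATA=hs_counts;
  WEIGHT count;                       /* treat count as the cell frequency */
  TABLES row*col / NOROW NOPERCENT;   /* keep frequencies and column percents */
RUN;
```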
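ColPctN also works when class variables are nested in the row dimension. The next
example generates a random number of observations for each combination of car and age
and reports each count as a percent of the column total:
DATA one;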
DO car=1 to 3;
do age=1 to 2;
rept = ceil(8*ranuni(92838));
do i=1 to rept; output; end;
end;
end;
PROC TABULATE DATA=one NOseps ;
CLASS car age ;
TABLE (car * ( age all ) all), n*f=4.0 colpctn='(%)'*f=6.1;
RUN;
-------------------------------
| | N | (%) |
|-----------------+----+------|
|car age | | |
|1 1 | 5| 17.9|
| 2 | 1| 3.6|
| All | 6| 21.4|
|2 age | | |
| 1 | 8| 28.6|
| 2 | 1| 3.6|
| All | 9| 32.1|
|3 age | | |
| 1 | 6| 21.4|
| 2 | 7| 25.0|
| All | 13| 46.4|
|All | 28| 100.0|
-------------------------------
The table built from the class dataset under "Multiple Response variables" below
(where the tmp dataset is created) can also be structured down the columns with
ColPctN:
PROC TABULATE DATA=tmp NOseps;
CLASS ag sex ht;
TABLE (sex ht all='Total'), (ag all='Total')*(n*f=3.0 colpctN*f=7.2) / rts=9;
run;
---------------------------------------------
| | ag | |
| |-----------------------| |
| | 0 | 1 | Total |
| |-----------+-----------+-----------|
| | N |ColPctN| N |ColPctN| N |ColPctN|
|-------+---+-------+---+-------+---+-------|
|Sex | | | | | | |
|F | 3| 42.86| 6| 50.00| 9| 47.37|
|M | 4| 57.14| 6| 50.00| 10| 52.63|
|ht | | | | | | |
|0 | 6| 85.71| 4| 33.33| 10| 52.63|
|1 | 1| 14.29| 8| 66.67| 9| 47.37|
|Total | 7| 100.00| 12| 100.00| 19| 100.00|
---------------------------------------------
Multiple Response variables
Two or more variables can be placed across the columns by listing them within
parentheses:
DATA TMP; SET sashelp.class ; LABEL ag='Age';
ht = (HEIGHT > 63);
ag = (age ge 13);
RUN;
PROC TABULATE DATA=tmp NOseps;
CLASS ag sex ht;
TABLE (ag all='Total'), (sex ht all='Total')*(n*f=3.0 rowpctN*f=7.2) / rts=9;
run;
---------------------------------------------------------------------
| | Sex | HT | |
| |-----------------------+-----------------------| |
| | F | M | 0 | 1 | Total |
| |-----------+-----------+-----------+-----------+-----------|
| | N |RowPctN| N |RowPctN| N |RowPctN| N |RowPctN| N |RowPctN|
|-------+---+-------+---+-------+---+-------+---+-------+---+-------|
|Age | | | | | | | | | | |
|0 | 3| 42.86| 4| 57.14| 6| 85.71| 1| 14.29| 7| 100.00|
|1 | 6| 50.00| 6| 50.00| 4| 33.33| 8| 66.67| 12| 100.00|
|Total | 9| 47.37| 10| 52.63| 10| 52.63| 9| 47.37| 19| 100.00|
---------------------------------------------------------------------
This method of summarizing data is useful when both column variables have a small
number of levels; otherwise, the table can become too wide to fit on the output
without a 'Continued' message. The next section demonstrates how to avoid this
situation.
Another way to compute summary tables for each variable is to place the data in
vertical or univariate form (see Section 6.5 on Transposing options) and then add the
new variable that holds the variable name to the CLASS statement. The following
example computes the total and percent of total for 4 responses from a survey:
DATA srv;
INPUT id q1 $ q2 $ q3 $ q4 $;
DATALINES;
1 A B C D
2 E F A E
3 C B B A
4 B A D E
5 E F A B
6 A A A C
7 F E A E
;
DATA unv;
SET srv;
qst='Question 1'; ans=q1; output;
qst='Question 2'; ans=q2; output;
qst='Question 3'; ans=q3; output;
qst='Question 4'; ans=q4; output;
drop q1-q4;
run;
* example is provided for pctN and rowpctN options ;
PROC TABULATE DATA=unv NOSEPS;
CLASS qst ans;
TABLE (qst='Question' all), ans='Choices'*(n='N'*f=2.0 pctn='%'*f=5.1) /
rts=12 misstext=' ';
TABLE (qst=' ' all='Total'), (ans='Choices' all)*(n='N'*f=2.0 rowpctN='%'*f=5.1) /
rts=12 misstext=' ';
run;
The second TABLE statement prints this table:
---------------------------------------------------------------------------
| | Choices | |
| |-----------------------------------------------------| |
| | A | B | C | D | E | F | All |
| |--------+--------+--------+--------+--------+--------+--------|
| |N | % |N | % |N | % |N | % |N | % |N | % |N | % |
|----------+--+-----+--+-----+--+-----+--+-----+--+-----+--+-----+--+-----|
|Question 1| 2| 28.6| 1| 14.3| 1| 14.3| | | 2| 28.6| 1| 14.3| 7|100.0|
|Question 2| 2| 28.6| 2| 28.6| | | | | 1| 14.3| 2| 28.6| 7|100.0|
|Question 3| 4| 57.1| 1| 14.3| 1| 14.3| 1| 14.3| | | | | 7|100.0|
|Question 4| 1| 14.3| 1| 14.3| 1| 14.3| 1| 14.3| 3| 42.9| | | 7|100.0|
|Total | 9| 32.1| 5| 17.9| 3| 10.7| 2| 7.1| 6| 21.4| 3| 10.7|28|100.0|
---------------------------------------------------------------------------
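The reshaping step that writes one output record per question can also be expressed with an ARRAY loop, which scales better as the number of questions grows. This is a sketch under stated assumptions, not the method used above: the dataset names srv_demo and unv_demo and the loop index j are hypothetical, and the data are the first three survey records only.

```sas
/* Hypothetical subset of the survey data, in wide form */
DATA srv_demo;
  INPUT id q1 $ q2 $ q3 $ q4 $;
  DATALINES;
1 A B C D
2 E F A E
3 C B B A
;
RUN;

/* Reshape to one record per respondent and question */
DATA unv_demo;
  SET srv_demo;
  LENGTH qst $ 10 ans $ 8;
  ARRAY q{4} $ q1-q4;
  DO j = 1 TO 4;
    qst = CATX(' ', 'Question', j);   /* 'Question 1' ... 'Question 4' */
    ans = q{j};
    OUTPUT;
  END;
  DROP j q1-q4;
RUN;
```

The resulting dataset has the same qst and ans variables, so it can be passed to the same CLASS and TABLE statements.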