Section 8.8: What is the difference between the WEIGHT and FREQ statements?

Assume a variable in a SAS dataset represents a weight or frequency to be applied to the data in that record. Suppose the variable holds an integer and is named on a FREQ statement in a SAS procedure. The procedure then treats the row as if it had been entered on "nn" separate rows. For example, assume one line of data exists with y=48 and nn=3:

    id  nn   y
     1   3  48

When nn is named on a FREQ statement, SAS procedures treat this one record as if y=48 had been entered on 3 separate records:

    id   y
     1  48
     1  48
     1  48

If nn has a fractional part, such as nn=3.6, SAS truncates it to its integer part, 3. The data row is then replicated 3 times.

When the variable is named on a WEIGHT statement instead, only one record exists. That record is given a weight equal to the value of the weight variable in every calculation. Non-integer weights are acceptable in this case, as is common in sample survey files.

For means and sums, it makes no difference whether integer values are used with FREQ or with WEIGHT. Both give the same weighted sum and the same mean. For variances, however, the choice makes a big difference, as the following example shows.
    DATA one;
      INPUT wgt y;
      ijk = 1;   /* constant 1 on every record, used in the last example */
    CARDS;
    8.2 10.8750
    6.1 10.3333
    8    7.6250
    7    1.5714
    5   10.0000
    4    7.0000
    ;

    * Unweighted results: each observation counts as one record with weight 1;
    PROC TABULATE NOSEPS;
      VAR y;
      TABLE y*N*f=4.0 y*MEAN*f=9.5 y*VAR*f=8.5;
    RUN;

    -------------------------
    | y  |    y    |   y    |
    |----+---------+--------|
    | N  |  Mean   |  Var   |
    |----+---------+--------|
    |   6|  7.90078|12.02367|
    -------------------------

    * FREQ treats each record as if it appeared as many times as the integer portion of wgt, each copy with a weight of 1 (8.2 counts as 8 and 6.1 as 6);
    PROC TABULATE NOSEPS;
      VAR y;
      TABLE y*N*f=4.0 y*MEAN*f=9.5 y*VAR*f=8.5;
      FREQ wgt;
    RUN;

    -------------------------
    | y  |    y    |   y    |
    |----+---------+--------|
    | N  |  Mean   |  Var   |
    |----+---------+--------|
    |  38|  7.86841|11.14989|
    -------------------------

    * WEIGHT treats each record as appearing once in the file and gives it a weight of wgt in the calculations;
    PROC TABULATE NOSEPS;
      VAR y;
      TABLE y*N*f=4.0 y*MEAN*f=9.5 y*VAR*f=8.5;
      WEIGHT wgt;
    RUN;

    -------------------------
    | y  |    y    |   y    |
    |----+---------+--------|
    | N  |  Mean   |  Var   |
    |----+---------+--------|
    |   6|  7.89055|82.98855|
    -------------------------

In the table above, N counts only the 6 records in the file. The weighted mean does use the weights: it is the sum of wgt*y divided by the sum of wgt, which is 38.3. The variance is different. It divides the weighted sum of squared deviations by N-1 = 5, the default divisor (VARDEF=DF), and not by a count based on the weights. This is why it is much larger than the FREQ result.

To produce a count of records that reflects the weight variable, sum the variable ijk, which equals 1 on every record. With WEIGHT in effect, the sum of ijk is the total of the weights:

    PROC TABULATE NOSEPS;
      VAR y ijk;
      TABLE ijk*SUM*f=4.1 y*MEAN*f=9.5 y*VAR*f=8.5;
      WEIGHT wgt;
    RUN;

    -------------------------
    |ijk |    y    |   y    |
    |----+---------+--------|
    |Sum |  Mean   |  Var   |
    |----+---------+--------|
    |38.3|  7.89055|82.98855|
    -------------------------
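Working through the two variance calculations by hand may make the gap clearer. The sketch below assumes the procedures' default divisor (VARDEF=DF). The symbols are introduced here only for illustration: f_i is the truncated frequency floor(w_i), w_i is the value of wgt, and n = 6 is the number of records. The sums were computed by hand from the six data lines above.

```latex
\begin{aligned}
\text{FREQ:}\quad
  \bar{y}_F &= \frac{\sum_i f_i y_i}{\sum_i f_i} = \frac{298.9996}{38} \approx 7.86841, &
  s^2_F &= \frac{\sum_i f_i (y_i - \bar{y}_F)^2}{\sum_i f_i - 1} \approx \frac{412.546}{37} \approx 11.150 \\
\text{WEIGHT:}\quad
  \bar{y}_W &= \frac{\sum_i w_i y_i}{\sum_i w_i} = \frac{302.20793}{38.3} \approx 7.89055, &
  s^2_W &= \frac{\sum_i w_i (y_i - \bar{y}_W)^2}{n - 1} \approx \frac{414.943}{5} \approx 82.989
\end{aligned}
```

The two sums of squared deviations are nearly equal, about 412.5 versus 414.9. The gap therefore comes almost entirely from the divisor: 37 for FREQ versus 5 for WEIGHT. If you want a weighted variance whose divisor reflects the weights, the VARDEF= option on PROC TABULATE or PROC MEANS changes the divisor. VARDEF=WDF divides by the sum of the weights minus 1, which here gives 414.943/37.3, about 11.12. VARDEF=WEIGHT (or WGT) divides by the sum of the weights.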