Section 2.4: How the DATA Step Works

There are two conventional ways to structure a DATA step, depending on whether you are reading an external file or modifying existing SAS data set(s). The general form for each case is shown below:

    * Write a new SAS data set by reading an external text file;
    DATA <>;
       < enter declarative statements >
       INFILE <>;
       INPUT <>;
       < enter executable statements >
       OUTPUT <>;
    RUN;

    * Modify existing SAS data set(s);
    DATA <>;
       SET <>;   or   MERGE <>;   or   UPDATE <>;
       < enter declarative statements >
       < enter executable statements >
       OUTPUT <>;
    RUN;

When a SAS program is submitted, all statements in each DATA step are first checked for syntax errors (e.g., misspelled keywords, stray characters, missing semicolons). Statements in the DATA step are of two types: declarative statements, which supply information to SAS and can usually appear anywhere in the step, and executable statements, which cause an action during each iteration and whose order in the step matters.

Declarative statements such as ARRAY, ATTRIB, DROP, FORMAT, INFORMAT, KEEP, LABEL, LENGTH, RETAIN, and RENAME supply information that defines the working environment. With a few exceptions, they can be placed anywhere in the DATA step. However, putting them near the top (after the DATA and SET statements) usually aids readability, because it keeps them from being mixed in among the executable statements. Some declarative statements that refer to variables created in the DATA step, such as LENGTH, may need to appear before the first executable statement in which the variable name appears. This is because SAS fixes a variable's attributes at its first appearance in the step.

Executable statements are processed sequentially. When all statements in the DATA step have been processed for the current observation, SAS returns to the first statement and begins the next iteration.
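The two general forms above can be filled in as in the following minimal sketch. The data set names (WORK.SCORES, WORK.PASSED), the variables (NAME, SCORE, PCT) and the values are hypothetical. INFILE DATALINES stands in for an external file so that the program is self-contained; with a real file, the INFILE statement would name its path or fileref instead.

```sas
/* Hypothetical data sets and variables, for illustration only.      */

/* Form 1: build a new data set from raw text records.               */
/* INFILE DATALINES stands in for an external text file here.        */
data work.scores;
   length name $ 10;               /* declarative: attribute of NAME  */
   infile datalines dlm=',';       /* source of the raw records       */
   input name $ score;             /* executable: read one record     */
   pct = 100 * score / 50;         /* executable: derive a new value  */
   output;                         /* write the current observation   */
   datalines;
Ana,45
Ben,38
Cy,50
;
run;

/* Form 2: build a new data set from an existing SAS data set.       */
data work.passed;
   set work.scores;                /* read one observation per pass   */
   keep name pct;                  /* declarative: variables to keep  */
   if pct >= 80 then output;       /* executable: write selected rows */
run;
```

Because the second step contains an explicit OUTPUT inside an IF statement, only the observations that meet the condition are written to WORK.PASSED.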
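The following sketch shows why the placement of a declarative statement can matter. It declares the length of a character variable before that variable first appears. The data set WORK.SIZES and the variables HEIGHT and SIZE are hypothetical. If the LENGTH statement were omitted, SAS would take the length of SIZE from its first assignment ('Small', 5 characters), and 'Medium' would be stored truncated as 'Mediu'.

```sas
/* Hypothetical example: SIZE is created in the step from HEIGHT.   */
data work.sizes;
   length size $ 6;             /* placed before the first use of SIZE */
   input height;
   if height < 60 then size = 'Small';
   else size = 'Medium';        /* fits because SIZE has length 6      */
   datalines;
55
72
;
run;
```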
Unless they are named in a RETAIN statement, variables read by an INPUT statement from an external file or computed in the DATA step are set to missing at the beginning of each iteration. Variables read with a SET, MERGE, or UPDATE statement are not reset in this way.

When you modify an existing data set, the SET statement reads the next observation and assigns its values to the corresponding variables in the new data set (see Chapter 4.1: The DATA and SET Statements for examples). When you read an external file, the INPUT statement reads the next record and assigns its values to the named variables.

When the end of the DATA step is reached, the current observation is automatically written to the new data set. The exception is a step that contains an explicit OUTPUT statement: there, observations are written only when OUTPUT executes. SAS then returns to the top of the step and resets the non-retained variables to missing. The SET or INPUT statement then reads the next observation or record.
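The reset-to-missing behavior can be observed with a short sketch like the one below. The data set WORK.DEMO and the variables AMOUNT, TOTAL and NOTRET are hypothetical. The PUT _ALL_ statement at the top of the step writes the current variable values to the log at the start of each iteration, before INPUT runs. From the second iteration on, AMOUNT and NOTRET appear as missing, while TOTAL, which is named in a RETAIN statement, carries over its value from the previous iteration.

```sas
/* Hypothetical example: compare a retained and a non-retained variable. */
data work.demo;
   put 'Start of iteration: ' _all_;  /* log values before INPUT runs     */
   input amount;
   retain total 0;                    /* TOTAL is retained, starting at 0 */
   total = total + amount;            /* running total: 10, 30, 60        */
   notret = sum(notret, amount);      /* NOTRET is reset to missing each  */
                                      /* iteration, so it simply equals   */
                                      /* AMOUNT: 10, 20, 30               */
   datalines;
10
20
30
;
run;
```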
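An explicit OUTPUT statement turns off the automatic write at the end of the step, so a step can write more (or fewer) observations than it reads. In the hypothetical sketch below, each input record (an ID with two values, Q1 and Q2) is written as two observations, one per quarter. Two records therefore produce four observations in the hypothetical data set WORK.LONG.

```sas
/* Hypothetical example: turn one record into two observations.      */
data work.long;
   input id q1 q2;
   drop q1 q2;                        /* declarative: omit from output */
   quarter = 1; value = q1; output;   /* first explicit write          */
   quarter = 2; value = q2; output;   /* second explicit write         */
   /* No implicit write occurs here, because the step uses OUTPUT.   */
   datalines;
1 5 7
2 3 9
;
run;
```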