Section 2.6: How to Write a SAS Dataset

Before SAS procedures can be applied or datasets manipulated, all data must first be placed into a SAS dataset. The DATA step is designed for this purpose. Usually, data are entered into a spreadsheet such as Excel or into an ASCII text file (free or fixed format) that SAS then reads. If the number of records and/or variables is relatively small, the data can be placed directly within the SAS program following a CARDS; or DATALINES; statement at the bottom of the DATA step. Reading data into temporary datasets is an economical use of disk space and avoids the clutter of having many files.

Some important features of SAS datasets include:

1. SAS datasets can be either temporary or permanent. "Temporary" means they are stored in the 'work' directory and will disappear when you exit PC SAS for Windows (or when the batch program terminates). "Permanent" means they have been saved in SAS dataset format in a named file on the hard disk (the suffix depends on the operating system and on the version of SAS that produced it).

2. The prefix consists of up to eight letters (a...z), numbers (0,1,...,9), or the underscore (_). It identifies the dataset by name and briefly describes the contents of the file. Note that in PC SAS for Windows, datasets whose names begin with the underscore are not visible in the SAS Explorer window; however, they are still placed in the designated directory. You can treat them the same as any other dataset, even though you cannot see their names. Internally, SAS uses the underscore as the prefix for temporary datasets that are intended to be hidden from users. So don't be surprised if you can't see datasets with this initial character that your program has generated.

3. The first character of the dataset name must be a letter or the underscore, even when a permanent dataset is created (e.g., a temporary dataset called 1917s or a permanent dataset .1917s are NOT valid dataset names - more about permanent datasets and their use later).

Below are the SAS DATA step statements needed to read an external space-delimited text file called mydata.dat, located on a PC system in the directory c:\data:

    DATA new;
    INFILE "c:\data\mydata.dat";
    INPUT name $ b c;
    RUN;

In this short set of commands, the DATA statement defines a temporary SAS dataset called "new". The INFILE statement names the external file to be read, mydata.dat in the c:\data directory, which is assumed to be in free text format (space delimited). (On a Unix system, when running a program from your own user area, the notation ~/data is a sufficient abbreviation of the complete directory path.) The INPUT statement lists, in free format, the three variables the file contains: name (a character variable, assumed to be 8 characters or fewer), followed by b and c (both numeric). In the data file, the values must be separated by one or more spaces, and each must be either the actual value or one of the 28 SAS missing data codes.

Temporary SAS datasets disappear after you exit PC SAS for Windows or when a batch program finishes. With large or frequently accessed datasets, writing permanent SAS datasets (i.e., saved on the hard drive) may have several advantages. How to work with "permanent" SAS datasets will be described in Chapter 4.
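For small datasets, the in-program alternative mentioned at the start of this section can be sketched as follows. This is a minimal illustration only: the three data records are made up for the example, and the period in the second record is the standard SAS code for a missing numeric value. The variable names match the INPUT statement used above.

```sas
* Hypothetical records typed directly into the program (not from a real file);
DATA new;
  INPUT name $ b c;   * free-format (list) input: name is character, b and c are numeric;
  DATALINES;
Smith 12 3.5
Jones . 4.1
Lee 7 2.0
;
RUN;

* Print the temporary dataset to check that the values were read as intended;
PROC PRINT DATA=new;
RUN;
```

Because no INFILE statement is present, SAS reads the records that follow DATALINES; until it reaches the line containing only a semicolon. The resulting dataset "new" is temporary, just as in the external-file example.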