SAS interview questions & material

1) What SAS statements would you code to read an external raw data file into a DATA step?

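Ans : The INFILE statement identifies the external file, either directly by path or through a fileref assigned with a FILENAME statement. The INPUT statement then reads the fields of each record.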
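
Here is a minimal sketch of INFILE and INPUT together. So that the example is self-contained, it first writes a small temporary file. The fileref `raw` and the data set `people` are illustrative names and do not come from the original.

```sas
/* Assign a temporary file to the fileref RAW (illustrative name) */
filename raw temp;

/* Write two records so there is something to read */
data _null_;
   file raw;
   put 'Alice 34';
   put 'Bob 41';
run;

/* INFILE points at the external file; INPUT reads each record */
data people;
   infile raw;
   input name $ age;
run;
```

With a real file you would replace `temp` with a quoted path on the FILENAME statement, or give the path directly on the INFILE statement.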

2) How do you read the variables that you need?

Ans : With the INPUT statement. Column pointer controls such as @5 (move to column 5) and column ranges such as 12-17 let you read only the fields you need.
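
This short sketch combines pointer-controlled formatted input with column input. The data set `emp` and its fixed layout (id in columns 1-3, name in 5-9, salary in 12-17) are invented for illustration. The DATALINES records must start in column 1 for the positions to line up.

```sas
/* Hypothetical fixed-column layout: id 1-3, name 5-9, salary 12-17 */
data emp;
   input @1 id 3. @5 name $5. salary 12-17;
   datalines;
101 Alice  52000
102 Bob    61000
;
run;
```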

3) Are you familiar with special input delimiters? How are they used?

Ans : Yes. DLM= (DELIMITER=) and DSD are the INFILE statement options I have used.

DLM= names the delimiter character(s) when fields are separated by something other than a blank.

DSD (delimiter-sensitive data) makes the comma the default delimiter. It treats two consecutive delimiters as a missing value, and it ignores delimiters enclosed in quotation marks, removing the quotes from character values. Comma-separated values (CSV) files are commonly read with the DSD option.
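
The following is a small sketch of the DSD behavior described above. The data set `scores` and its values are hypothetical.

```sas
/* DSD: the comma is the default delimiter, ",," yields a missing value,
   and the comma inside the quoted "Smith, John" is not a delimiter */
data scores;
   infile datalines dsd;
   input name :$20. score1 score2;
   datalines;
"Smith, John",85,90
"Lee, Ann",,72
;
run;
```

For a pipe-delimited file, the same INFILE statement would carry `dlm='|'` alongside `dsd`.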

4) If reading a variable length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?

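Ans : Use the MISSOVER option on the INFILE statement. When a record runs out before all variables are read, SAS sets the remaining variables to missing instead of moving to the next record (the default FLOWOVER behavior).

If some lines are shorter than others and you read the last field with column or formatted input, use TRUNCOVER instead. TRUNCOVER keeps the partial value at the end of a short line, whereas MISSOVER sets it to missing.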
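
The sketch below shows the difference on a variable-length file whose second record is too short for the `$5.` informat. The fileref `shortrec` and all data set names are illustrative. A temporary file is used here because DATALINES records are padded with blanks, which would hide the effect.

```sas
/* Write a short external file; record 2 ends before column 9 */
filename shortrec temp;

data _null_;
   file shortrec;
   put '101 Smith';
   put '102 Li';
run;

/* MISSOVER: the partial value 'Li' is set to missing */
data use_missover;
   infile shortrec missover;
   input id 3. +1 name $5.;
run;

/* TRUNCOVER: the partial value 'Li' is kept */
data use_truncover;
   infile shortrec truncover;
   input id 3. +1 name $5.;
run;
```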

5) What is the difference between an informat and a format? Name three informats or formats.

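Ans : An informat tells SAS how to read raw data into a variable. A format tells SAS how to write (display) a variable's values.

      Informats and formats: COMMAw.d, DOLLARw.d, DATEw. (many formats have an informat of the same name)

      Informats: MMDDYYw., DATEw., TIMEw., PERCENTw.d

      Formats: WORDDATE18., WEEKDATEw.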
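
Here is a minimal sketch that reads values with informats and displays them with formats. The data set `sales` and its values are hypothetical.

```sas
data sales;
   /* Informats control how the raw text is read */
   input sale_date :mmddyy10. amount :comma10.;
   /* Formats control only how the stored values are displayed */
   format sale_date worddate18. amount dollar10.2;
   datalines;
03/15/2024 1,250
;
run;

proc print data=sales;
run;
```

The stored values are a SAS date value and the number 1250. The formats change only how those values are printed.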

6) Name and describe three SAS functions that you have used, if any?

Ans : LENGTH: returns the length of an argument, not counting trailing blanks (a missing character value has a length of 1).

Ex: a='my cat';

x=LENGTH(a); Result: x=6

SUBSTR: SUBSTR(arg,position,n) extracts a substring from an argument, starting at 'position' for 'n' characters, or to the end of the string if n is omitted.

Ex: a='(916)734-6241';

x=SUBSTR(a,2,3); Result: x='916'

TRIM: removes trailing blanks from a character expression.

Ex: a='my '; b='cat';

x=TRIM(a)||b; Result: x='mycat'

SUM: returns the sum of the nonmissing arguments.

Ex: x=SUM(3,5,1); Result: x=9

INT: returns the integer portion of the argument, truncating toward zero.

Ex: x=INT(-3.7); Result: x=-3
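
The functions above can be tried together in one DATA step. This sketch uses an illustrative data set name, `func_demo`. It also passes a missing argument to SUM to show that missing values are ignored.

```sas
data func_demo;
   a = 'my cat';
   len = length(a);                /* 6: trailing blanks are not counted */

   phone = '(916)734-6241';
   area = substr(phone, 2, 3);     /* '916' */

   b1 = 'my ';
   b2 = 'cat';
   joined = trim(b1) || b2;        /* 'mycat' */

   total = sum(3, ., 1);           /* 4: the missing argument is ignored */
   whole = int(-3.7);              /* -3: INT truncates toward zero */

   put len= area= joined= total= whole=;
run;
```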

7) How would you code the criteria to restrict the output to be produced?

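Ans : Use the NOPRINT option, which many procedures support (for example PROC MEANS and PROC FREQ), to suppress printed output, such as when you only need an output data set.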
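
This sketch assumes the standard SASHELP.CLASS sample data set is available. The output data set and statistic names are illustrative.

```sas
/* NOPRINT suppresses the printed report; the statistics go only to a data set */
proc means data=sashelp.class noprint;
   var height;
   output out=height_stats mean=avg_height max=max_height;
run;
```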

8) What is the purpose of the trailing @ and the @@? How would you use them?

Ans : Both are line-hold specifiers placed at the end of an INPUT statement.

A single trailing @ holds the current record for the next INPUT statement in the same iteration of the DATA step.

A double trailing @@ holds the record across iterations of the DATA step. It releases the record when the pointer moves past the end of the line or when an INPUT statement without a line-hold specifier executes.

Double trailing @@: When one line of raw data holds several observations, put @@ at the end of the INPUT statement. The line-hold specifier acts like a stop sign telling SAS, "stop, hold that line of raw data."

Trailing @: Using @ without specifying a column is like telling SAS, "stay tuned for more information, don't touch that dial." SAS holds the line of data until it reaches either the end of the current iteration of the DATA step or an INPUT statement that does not end with a trailing @.
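
Two small sketches of these specifiers follow; the data set names and values are hypothetical. The first uses @@ to read several observations from one line. The second uses a single trailing @ to read a record-type code, hold the line, and then choose a layout for the rest of the line.

```sas
/* Double trailing @@: several observations on one data line */
data pairs;
   input x y @@;
   datalines;
1 2 3 4 5 6
7 8
;
run;

/* Single trailing @: read the record type, hold the line, then pick the layout */
data rectypes;
   input type $1. @;
   if type = 'A' then input @3 name $ age;
   else input @3 age name $;
   datalines;
A Smith 34
B 41 Jones
;
run;
```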

9) Under what circumstances would you code a SELECT construct instead of IF statements?

Ans : When you have a long series of mutually exclusive conditions and the comparison is numeric, using a SELECT group is slightly more efficient than using IF-THEN or IF-THEN-ELSE statements because CPU time is reduced.

SELECT: begins a SELECT group.

When: identifies SAS statements that are executed when a particular condition is true.

Otherwise (optional): specifies a statement to be executed if no WHEN condition is met.

End: ends a SELECT group.
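
Below is a sketch of a SELECT group with mutually exclusive numeric comparisons. It assumes the standard SASHELP.CLASS sample data set. The band labels are illustrative.

```sas
data age_bands;
   set sashelp.class;
   length band $8;
   /* Each WHEN compares the same numeric expression; the first match wins */
   select (age);
      when (11, 12) band = 'Pre-teen';
      when (13, 14) band = 'Young';
      otherwise     band = 'Older';
   end;
run;
```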

10) What statement do you code to tell SAS to write to an external file? What statement do you code to write the record to the file?

Ans : The FILE statement names the external file, and the PUT statement writes each record to it.
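
This minimal sketch writes three comma-delimited records to a temporary file. It assumes the SASHELP.CLASS sample data set; the fileref `csvout` is illustrative.

```sas
filename csvout temp;

data _null_;
   set sashelp.class(obs=3);
   file csvout dlm=',';     /* FILE: where to write */
   put name age height;     /* PUT: what to write, one record per iteration */
run;
```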

11) If you do not want a DATA step to create a SAS data set, how would you code the DATA statement?

Ans : DATA _NULL_; The step still executes, for example to write a report or an external file, but no data set is created.

12) How would you include common or reuse code to be processed along with your statements?

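Ans : Use the %INCLUDE statement to bring in SAS code stored in an external file, or package the code as a SAS macro (%MACRO ... %MEND) and call it wherever it is needed.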
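
Here is a sketch of a small reusable macro. The macro `first_n` and its parameters are hypothetical, written only for illustration. The %INCLUDE line is commented out because the file name is also hypothetical.

```sas
/* Hypothetical reusable macro: copy the first n observations of a data set */
%macro first_n(in=, out=, n=5);
   data &out;
      set &in(obs=&n);
   run;
%mend first_n;

%first_n(in=sashelp.class, out=class_top3, n=3)

/* Code kept in an external file is pulled in with %INCLUDE, e.g.
   %include 'common_setup.sas';   (hypothetical file name) */
```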

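13) If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?

Ans : Use the KEEP= data set option on the input data set, so that only those variables are read, or a KEEP statement, which controls the variables written to the output data set.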
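
This sketch uses SASHELP.CLASS, which has only five variables rather than 100, but the pattern is the same. The output name `class_small` is illustrative.

```sas
/* KEEP= on the input data set: only these variables are read into the PDV */
data class_small;
   set sashelp.class(keep=name age);
run;
```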

14) Code a PROC SORT on a data set containing State, District and Country as the primary variables, along with several numeric variables.

Ans : PROC SORT DATA=have;   /* "have" stands for your data set name */

BY State District Country;

RUN;

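15) How would you delete duplicate observations?

Ans : Use PROC SORT with the NODUPRECS (alias NODUP) option. It removes an observation only when it is identical to the previous one, so sort BY all variables (BY _ALL_) to make sure every duplicate is adjacent.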

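16) How would you delete observations with duplicate keys?

Ans : Use PROC SORT with the NODUPKEY option. It keeps only the first observation for each combination of BY-variable values.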
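
The sketch below contrasts PROC SORT's NODUPRECS and NODUPKEY options on a tiny hypothetical data set. The data set `have` and its values are invented for illustration.

```sas
/* Hypothetical data: rows 1 and 2 are exact duplicates; rows 1-3 share id=1 */
data have;
   input id name $ score;
   datalines;
1 Ann 90
1 Ann 90
1 Ann 85
2 Bob 70
;
run;

/* NODUPRECS: drop observations identical to the previous one in every variable */
proc sort data=have out=no_dup_recs noduprecs;
   by _all_;
run;

/* NODUPKEY: keep only the first observation for each value of the BY variable */
proc sort data=have out=no_dup_keys nodupkey;
   by id;
run;
```

Here NO_DUP_RECS keeps 3 observations and NO_DUP_KEYS keeps 2.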


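17) How would you code a merge that will keep only the observations that have matches from both sets?

Ans : Use the IN= data set option on each data set in the MERGE statement, then a subsetting IF that requires both flags, for example: if ina and inb; (both data sets must be sorted by the BY variable).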

18) How would you code a merge that will write the matches of both to one data set and the non-matches from the left-most data set to another?

Ans : Step 1: Name the output data sets on the DATA statement.

         Step 2: Use the IN= data set option to create a flag variable for each data set in the MERGE statement.

         Step 3: Check the flags with IF statements, and OUTPUT the matches to the first data set and the left-only non-matches to the other.

Ex: data xxx nomatch; merge yyy(in=inyyy) zzz(in=inzzz); by aaa;
    if inyyy and inzzz then output xxx; else if inyyy then output nomatch; run;
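
Here is a self-contained sketch of both merges (questions 17 and 18). The input data sets `left_ds` and `right_ds`, their values, and the output names are hypothetical.

```sas
/* Hypothetical input data sets, both sorted by the BY variable ID */
data left_ds;
   input id a;
   datalines;
1 10
2 20
3 30
;
run;

data right_ds;
   input id b;
   datalines;
2 200
3 300
4 400
;
run;

/* Keep only IDs present in both data sets */
data both;
   merge left_ds(in=inleft) right_ds(in=inright);
   by id;
   if inleft and inright;
run;

/* Matches to MATCHED, left-only non-matches to LEFT_ONLY */
data matched left_only;
   merge left_ds(in=inleft) right_ds(in=inright);
   by id;
   if inleft and inright then output matched;
   else if inleft then output left_only;
run;
```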

19) What is the Program Data Vector (PDV)? What are its functions?

Ans : The PDV (program data vector) is a logical area of memory where SAS builds a data set one observation at a time. Its function is to hold the current observation.

A DATA step is processed in two phases: compilation and execution. During compilation, SAS creates an input buffer to hold a record when raw data is read from an external file, and then creates the PDV. During execution, values are read or calculated into the PDV, and at the end of each iteration its contents are written to the output data set.

The PDV also contains the automatic variables _N_ and _ERROR_, which are not written to the output data set.
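
A quick way to see the PDV is PUT _ALL_, which writes every PDV variable to the log, including _N_ and _ERROR_. The data set `pdv_demo` and its values are illustrative.

```sas
data pdv_demo;
   input x y;
   z = x + y;
   put _all_;   /* log shows x, y, z, _ERROR_ and _N_ for each iteration */
   datalines;
1 2
3 4
;
run;
```

The output data set holds only x, y and z, because the automatic variables stay in the PDV.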

20) At compile time when a SAS data set is read, what items are created?

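Ans : SAS creates:

- the input buffer (only when raw data is read from an external file)
- the program data vector (PDV), including automatic variables such as _N_ and _ERROR_
- the descriptor information for the output data set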

21) Name statements that are recognized at compile time only?

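Ans : Declarative statements such as DROP, KEEP, RENAME, RETAIN, LENGTH, FORMAT, INFORMAT, LABEL, ATTRIB, ARRAY and WHERE. They are processed when the step is compiled rather than executed for each observation. PUT is not one of them; it executes in every iteration.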

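22) Name statements that are execution only.

Ans : Executable statements such as INPUT, PUT, OUTPUT, DELETE, assignment statements, IF-THEN/ELSE and the subsetting IF, which are carried out in each iteration of the DATA step.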
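
This sketch contrasts an executable statement with a declarative one. It assumes SASHELP.CLASS; the output name `teens` is illustrative.

```sas
data teens;
   set sashelp.class;
   if age > 14;         /* executable: evaluated for every observation */
   keep name age;       /* declarative: applied at compile time, so placing it after the IF still works */
run;
```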


23) In the flow of DATA step processing, what is the first action in a typical DATA Step?

Ans : The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.

24) What is the one statement to set the criteria of data that can be coded in any step?

Ans : Options statement.

25) What is the effect of the OPTIONS statement ERRORS=1?

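Ans : ERRORS=1 limits complete data-error messages in the log to the first observation that contains a data error. After that, SAS notes that the limit set by ERRORS= was reached and prints no further messages of that type.

The ERRORS= system option sets the maximum number of observations for which complete error messages are printed. It is separate from the automatic variable _ERROR_, which is 1 when an error occurs for the current observation and 0 otherwise.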
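
In the sketch below, two records contain invalid numeric data. With ERRORS=1, only the first of them gets a full message in the log. The data set `nums` is illustrative. The last line assumes the usual default of ERRORS=20.

```sas
options errors=1;      /* full data-error messages for one observation only */

data nums;
   input x;
   datalines;
1
abc
xyz
4
;
run;

options errors=20;     /* restore the usual default */
```

Both invalid values are still set to missing; ERRORS= affects only how much is written to the log.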

26) What's the difference between VAR A1 - A4 and VAR A1 -- A4 ?

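Ans : VAR A1-A4 is a numbered range list. It refers to A1, A2, A3 and A4, selected by their numeric suffixes, whether or not they are stored next to each other.

VAR A1--A4 is a name range list. It refers to every variable from A1 through A4 in the order they are stored in the data set, which can include variables with other names.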
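
This sketch makes the difference visible with a hypothetical data set in which B is stored between A1 and A2. SUM(OF ...) accepts the same variable lists as a VAR statement.

```sas
/* Hypothetical data set: B sits between A1 and A2 in storage order */
data lists;
   input a1 b a2 a3 a4;
   datalines;
1 100 2 3 4
;
run;

data list_demo;
   set lists;
   numbered   = sum(of a1-a4);   /* a1 a2 a3 a4: 1+2+3+4 = 10 */
   positional = sum(of a1--a4);  /* a1 b a2 a3 a4 in storage order: 110 */
run;
```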

27) What do the SAS log messages "numeric values have been converted to character" mean? What are the implications?

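Ans : SAS found a numeric value where a character value was expected, for example a numeric variable passed to a character function such as SUBSTR or assigned to a character variable. It converted the value automatically using the BEST12. format.

The implications:

- the result is right-aligned in 12 characters and may carry leading blanks
- some values may lose precision
- the program relies on implicit conversion

Converting explicitly with the PUT function avoids both the note and these surprises.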
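
The sketch below shows the leading-blank pitfall. It relies on automatic conversion right-aligning the value in 12 characters; the data set name is illustrative.

```sas
data conv;
   num = 42;
   /* Implicit: NUM becomes '          42' (BEST12., right-aligned),
      so the first 4 characters are blanks; the log note is written */
   implicit = substr(num, 1, 4);
   /* Explicit: convert with PUT and strip the leading blanks */
   explicit = strip(put(num, best12.));
run;
```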

28) Why is a STOP statement needed for the POINT= option on a SET statement?

Ans : Because POINT= reads only the specified observations, SAS cannot detect an end-of-file condition as it would if the file were being read sequentially. Without a STOP statement, the DATA step would loop indefinitely.
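
This minimal sketch uses direct access with POINT=. It assumes SASHELP.CLASS; the output name `picked` is illustrative.

```sas
data picked;
   do i = 2, 5;
      set sashelp.class point=i;   /* read observations 2 and 5 directly */
      output;
   end;
   stop;                           /* required: POINT= never reaches end-of-file */
run;
```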

29) How do you control the number of observations and/or variables read or written?

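Ans : For observations, use the FIRSTOBS= and OBS= options, either as data set options or as system options. For variables, use the KEEP= and DROP= data set options or the KEEP and DROP statements.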
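
This sketch uses FIRSTOBS= and OBS= as data set options on SASHELP.CLASS. The output name is illustrative. OBS= gives the number of the last observation to read, not a count.

```sas
/* Read observations 3 through 7 (5 observations) and two variables */
data middle_rows;
   set sashelp.class(firstobs=3 obs=7 keep=name age);
run;
```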

30) Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.

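Ans : It compiles. SAS compiles each DATA step (compilation phase) and then executes it (execution phase), one step at a time, before moving on to the next step.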

31) What does the RUN statement do?

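Ans : RUN is a step boundary: it tells SAS to compile and execute the DATA or PROC step entered before it.

When a step is immediately followed by another DATA or PROC statement, that statement also acts as a step boundary, so RUN can be omitted. It is still good practice to end every step with RUN. In an interactive session, the last step needs a RUN (or QUIT, for procedures that use it) to execute.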

32) Why is SAS considered self-documenting?

Ans : Every SAS data set stores descriptor information, created at compile time. This includes:

- the date and time the data set was created
- the number of observations and variables
- each variable's name, type, length, format and label

You can view this information with PROC CONTENTS.
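
This sketch prints the stored descriptor information for SASHELP.CLASS and also saves it as a data set, one row per variable. The name `class_meta` is illustrative.

```sas
proc contents data=sashelp.class out=class_meta;
run;
```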

33) What is the difference between functions and PROCs that calculate the same simple descriptive statistics?

Ans : Functions such as MEAN and SUM are used in a DATA step and operate across variables within a single observation. Procedures such as PROC MEANS operate down a column, across observations, and can write their results to a new data set.
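
This sketch contrasts the two directions of calculation. The data set `exams` and its scores are hypothetical.

```sas
data exams;
   input id t1 t2 t3;
   row_mean = mean(t1, t2, t3);   /* function: across variables in one observation */
   datalines;
1 80 90 100
2 60 70 80
;
run;

/* Procedure: down the T1 column, across all observations */
proc means data=exams noprint;
   var t1;
   output out=t1_stats mean=t1_mean;
run;
```

ROW_MEAN is 90 and 70 for the two rows, while T1_MEAN is 70.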

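34) What is a method for assigning first.VAR and last.VAR to the BY group variable on unsorted data?

Ans : Use the NOTSORTED option on the BY statement. FIRST.VAR and LAST.VAR are then set for each run of consecutive observations that share the same value, not for each distinct value across the whole data set.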
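
This sketch runs BY NOTSORTED on hypothetical unsorted data. It totals each run of consecutive observations with the same REGION value.

```sas
data visits;
   input region $ count;
   datalines;
East 5
East 3
West 2
East 4
;
run;

data region_runs;
   set visits;
   by region notsorted;
   if first.region then run_total = 0;
   run_total + count;               /* sum statement: value is retained */
   if last.region then output;
run;
```

The result has three rows (East 8, West 2, East 4), because the two East runs are separate groups.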

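35) How do you debug and test your SAS programs?

Ans : First, check the log for ERROR and WARNING messages, and in some cases NOTE messages (for example, notes about uninitialized variables, numeric-to-character conversion, or missing values being generated). You can also write variable values to the log with PUT or PUTLOG statements, or use the DATA step debugger.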
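
This sketch writes selected PDV values to the log for the first few iterations. It assumes SASHELP.CLASS; the variable `ratio` is just an example calculation.

```sas
data debug_demo;
   set sashelp.class;
   ratio = weight / height;
   /* Write key values to the log for the first three iterations only */
   if _n_ <= 3 then putlog _n_= name= height= weight= ratio=;
run;
```

In an interactive session, adding /DEBUG to the DATA statement (data debug_demo / debug;) starts the DATA step debugger instead.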

36) What other SAS features do you use for error trapping and data validation?

Ans : Check the log. For data validation, use procedures such as PROC FREQ and PROC MEANS, and sometimes PROC PRINT, to see what the data looks like.
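
This sketch runs two quick validation passes on SASHELP.CLASS: frequency counts to spot unexpected categories, and counts and ranges to spot missing or out-of-range numbers. The output name `sex_counts` is illustrative.

```sas
/* Frequency counts, including missing values, to spot unexpected categories */
proc freq data=sashelp.class;
   tables sex / missing out=sex_counts;
run;

/* Counts, missing counts and ranges to spot out-of-range numeric values */
proc means data=sashelp.class n nmiss min max;
   var age height weight;
run;
```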
