June 5, 2007

 

To:             University of Oregon University Senate

From:        Joint Senate-Academic Affairs Committee on Student Evaluations

Subject:     Final Report

 

Introduction

In winter 2007, the University Senate and the Office of Academic Affairs charged this committee with re-examining how the university assesses academic excellence in teaching through course evaluations (see http://www.uoregon.edu/~uosenate/dirsen067/JSAADCTE13Feb07.html). This process was first described by CAS Associate Dean Priscilla Southwell at the February 3, 2007 meeting of the University Senate. This report is the culmination of that process, and we anticipate that members of the Senate will be prepared to discuss this matter during the 2007-08 academic year. This final version was circulated after the Academic Deans were able to review it at the end of May 2007.

 

Course evaluations serve many different purposes in this university. Students use this information in choosing classes; faculty are provided with feedback that helps them increase their teaching effectiveness; and administrators rely, in part, on these evaluations for the purposes of merit evaluation and promotion and tenure. Over the past several months, we became aware that certain questions are therefore more important to some of these constituencies than to others. Our recommendations attempt to provide more useful information to all of these varied groups within the university community.  

 

            Other considerations were more structural in nature, but crucial nonetheless. As discussed later in this report, our current student evaluations are not always in compliance with University Senate legislation. Moreover, the results are difficult to interpret and not readily accessible. Our recommendations therefore attempt to meet multiple goals: legislative, scientific, and practical.

 

            The recommendations provided below are but a first step in a crucial change in the method by which we evaluate teaching quality. The committee has engaged in vigorous debate and discussion and has consulted with various stakeholders. In the end, we reached consensus on these recommendations, but we urge you to continue this discussion with members of your department or program.

COMPOSITION OF COMMITTEE AND CONSULTANTS

Committee:

Priscilla Southwell, Chair (CAS Dean’s Office and Political Science)

Bertram Malle (Psychology)

Deb Bauer (LCB)

Regina Psaki (Romance Languages and Literatures)

Michael Filippelli (ASUO student)

Gordon Sayre, ex officio, (English)

 

Consulted Faculty and Staff:

Charles Martinez, Vice-Provost for Institutional Equity and Diversity

Georgeanne Cooper, Teaching Effectiveness Program

Patricia Gwartney, former director, Oregon Survey Research Lab

Kate Wagle, Department of Art

Christian Cherry, Department of Dance

Herb Chereck, University Registrar

 

RECOMMENDATIONS

 

I. REQUIRED QUESTIONS 

            We have expanded the number of required questions to ten. The main rationale was the desire to provide questions for all of the relevant constituencies (instructors, administrators, and students) and to make course evaluations more comparable across units and schools. We also determined that two of the currently required questions (#3 and #4, added by Senate legislation in 1999) were ambiguous composites of multiple questions that needed to be broken up.

Furthermore, we concluded that the current two basic instructor and course questions were too global, inviting subjective assessments of liking, attractiveness, personality characteristics, and so on. The recommended formulation, “quality of teaching,” focuses the evaluation on the instructor’s most pertinent, relatively more objective characteristics (see #1 below). The course question (#2) was formulated in a parallel manner. The committee also examined the questions used by our AAU peer institutions and discovered that most of them rely on such simple, short questions.

We added several more student-oriented questions to the set of required questions. Although many departments and programs already include such questions, the results are not available online unless they are included in the required set of questions.

Finally, all items are formulated as true questions (not as statements), and the response options follow best practices in survey research. (See Sections V and VI for details.)

 

1. What was the quality of the instructor’s teaching?

Exceptional | Very good | Good | Fair | Poor | Unsatisfactory | Decline to answer

2. What was the quality of this course? 

Exceptional | Very good | Good | Fair | Poor | Unsatisfactory | Decline to answer

 

3. How organized was this course (e.g., syllabus, schedule)?

Very organized | Somewhat organized | Not very organized | Not at all organized | Decline to answer

4. How efficient was the instructor’s use of class time?

Very efficient | Somewhat efficient | Not very efficient | Not at all efficient | Decline to answer

 

5. How available was the instructor for communication outside of class (e.g., during posted office hours)?

Very available | Somewhat available | Not very available | Not at all available | Decline to answer

 

6. How clear were the guidelines for grading students in this course?

Very clear | Fairly clear | Somewhat unclear | Unclear | Decline to answer

 

7. How much did you learn in this course?

A great deal | A considerable amount | A little | Not much | Decline to answer

 

8. How often did you attend class?

Always | Often | Sometimes | Rarely or never | Does not apply | Decline to answer
[Either no-response option may apply here; for example, in an on-line course the concept of class attendance does not apply.]

 

9. How many hours per week did you spend on this course, other than time in class?

____ hours per week | Decline to answer

 

10. What grade do you expect in this course?

A | B | C | D | F | P | NP | Decline to answer

 

 

II. Suggested Wording for Qualitative Questions

a.    Please comment on the instructor’s strengths and areas for possible improvement.
[Textbox]

b.    Please comment on the course’s strengths and areas for possible improvement.
[Textbox]

 

 

III. Optional Questions (See Appendix A for Commonly Used Questions)

 

In order to increase the completion rate, we recommend that the total number of quantitative questions be limited to 20. We also recommend that all such questions follow the recommended format: a question with four or six possible responses. See Section V (“Question Formulation”) for discussion of this format.

Note: Some departments and programs currently include a question related to the instructor’s demonstration of respect or fairness toward students. The committee discussed at length the desirability of including this type of question within the required set; in the end, we recommend that each unit discuss for itself whether to include such a question or questions.

 

IV. Recommended Introduction and Order of Questions

 

a.    Before the questions are presented, there should be a preamble that highlights the importance of teaching evaluations: “Please take your time in completing the questions below.  Your responses are important both for students, as a guide to choosing their courses, and for the instructor, to improve teaching effectiveness and as a basis for tenure, promotion, and merit evaluations.”

b.    Next, three questions assess student demographics (sex, year in school, ethnicity), all with a “Decline to Answer” option.

c.    Finally, all other questions follow in this recommended order: first, the required quantitative questions; second, the qualitative questions; third, the set of optional questions chosen by the unit.

 

V. Question Formulation

In our existing course evaluations, some items were questions and some were statements. Statements were always formulated positively and therefore carried a presumption of positivity (e.g., “The course was intellectually stimulating”). As a result, responses were likely pulled toward the positive end of the scale.

Certain current questions also violate basic rules of survey methodology: some ask two things at once (e.g., “Do you believe that the class time was well organized and efficiently used?”), and some are ambiguous (e.g., “Did the instructor encourage communication outside of class time?” Communication between whom?).

Certain existing items also include phrasing designed to invite a relative judgment (e.g., “In comparison with other UO courses of this size and level…”). There was much discussion about such phrasing. One perspective was that relative judgments are desirable because they may provide a fairer assessment (e.g., large lecture courses vs. seminars). Another perspective was that, even if relative judgments are desirable, there is little reason to believe that the phrase in question reliably triggers the correct relative standard. There may be confusion over what exactly the terms “size” and “level” mean, and some students may not have had UO courses of this particular size and level. Additionally, if we truly assume that students succeed in making a relative judgment, then the average ratings of all courses must be assumed to be directly comparable. A score in a large lecture course in chemistry (e.g., 7.0) must then be directly compared to a score in an upper-division seminar on relationships (e.g., 9.0), because students, by assumption, have already corrected for any differences in size and level. Most instructors would resist this interpretation, which suggests that there is little faith in the success of the relative phrasing of questions.

            Exploratory structured interviews with five current students (see Appendix B) suggested three conclusions. First, the “in comparison” preamble appears to be ignored for three of the four currently required questions. Second, for the one question in which it is heeded (the overall course evaluation question), only the size comparison is taken into account. Third, that size comparison seems to be made automatically, as a result of the term-long experience in the given kind of course (e.g., large lecture, small seminar), making the comparison phrasing superfluous.

            Our recommended strategy is therefore to let students make their judgments in the most intuitive and direct way; users of course evaluations can then perform proper statistical comparisons afterwards. Ratings in upper-division seminars will be compared to ratings in other such seminars; ratings in required large lecture courses (with a known definition of such courses) will be compared to ratings in other such courses; and so on.
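
To illustrate what such an after-the-fact, within-category comparison could look like, here is a minimal sketch in Python. The course identifiers, category labels, and ratings are invented for illustration only; the report does not prescribe any particular implementation.

    from statistics import mean

    # Hypothetical records: (course, category, mean rating on a 0-5 scale).
    ratings = [
        ("CH 221",   "large lecture",          3.1),
        ("PHYS 201", "large lecture",          2.8),
        ("ENG 407",  "upper-division seminar", 4.6),
        ("PSY 410",  "upper-division seminar", 4.3),
    ]

    def category_mean(category):
        # Average rating across all courses of the same kind.
        return mean(r for _, c, r in ratings if c == category)

    # Compare each course only to courses of the same category.
    for course, category, r in ratings:
        print(f"{course}: {r:.2f} vs. {category} mean of {category_mean(category):.2f}")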

 

VI. Response Options

Response options in existing evaluation materials suffered from three problems. First, they were inflexible, using the same labels (Exceptionally good, Good, …, Unsatisfactory) for all questions, even when those labels did not fit the specific question. Second, the number of response options was five, whereas good survey methodology suggests an even number (typically four or six) in order to avoid having respondents easily choose the noncommittal middle category. Third, the five response options were transformed into number scores of 10, 8, 6, 4, and 2, which is mathematically awkward and confusing (e.g., the scale may be read as a 10-point scale, yet its midpoint is 6).

 

The newly proposed response options do not suffer from any of these problems. 

 

First, labels are tailored to the question asked. For example, “What was the quality of the instructor’s teaching?” has the response options Exceptional | Very good | Good | Fair | Poor | Unsatisfactory, while “How often did you attend class?” has the options Always | Often | Sometimes | Rarely or never.

            Second, all questions have either four or six response options, with four being the default. Where fine distinctions seem possible and desirable, six options are offered.

            Third, the four or six response options should be transformed into the numbers 3, 2, 1, 0 or 5, 4, 3, 2, 1, 0, respectively, using zero as the reference point and lowest possible score.

            An additional recommendation is to provide a no-response option for cases in which students either lack the information to answer a question or prefer not to answer it. At the same time, this no-response option should not be too attractive (or too inviting as an alternative to thinking about the question). Thus, the formulations “Decline to Answer” and “Does Not Apply” are recommended.
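
As an illustration of the proposed scoring, here is a minimal sketch in Python (ours, not part of any adopted system). It maps the six labels of question 1 to the numbers 5 through 0 and excludes no-response options from the mean rather than scoring them.

    from statistics import mean

    # Proposed mapping of the six response labels to the scores 5..0.
    SCORES = {
        "Exceptional": 5, "Very good": 4, "Good": 3,
        "Fair": 2, "Poor": 1, "Unsatisfactory": 0,
    }
    # No-response options are dropped, not scored.
    NO_RESPONSE = {"Decline to answer", "Does not apply"}

    def question_mean(responses):
        # Mean score over scorable responses only.
        scorable = [SCORES[r] for r in responses if r not in NO_RESPONSE]
        return mean(scorable) if scorable else None

    # Hypothetical responses to question 1; the "Decline to answer"
    # response does not affect the mean: (5 + 3 + 2) / 3 = 3.33.
    print(question_mean(["Exceptional", "Good", "Fair", "Decline to answer"]))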

VII. Availability of Data

Besides the means (and percentages) for required questions that are currently posted on the University Registrar’s website, we propose to a future implementation committee that the following data be delivered to departments and instructors, at least for each of the required questions (a brief computational sketch follows the list below). We also recommend that these data be accessible to students while they register via Duck Web, ideally within one month after grades are due.

 

Grand mean (or percentages for categorical questions) for UO (on an annual basis)

Mean (or percentages for categorical questions) for School (e.g., CAS)

Mean (or percentages for categorical questions) for Department (on an annual basis)

Mean (or percentages for categorical questions) of each course over the past n years

Mean (or percentages for categorical questions) of each instructor for this course

Third-week enrollment for this class

Response rate (number of student responses / third-week enrollment)
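
The following minimal sketch in Python (with invented numbers and a hypothetical record layout) shows how two of these items, the course mean and the response rate, would be computed:

    from statistics import mean

    # Hypothetical data for one course offering.
    scores = [5, 4, 4, 3, 5, 2]       # scored responses to one required question
    third_week_enrollment = 10        # from the Registrar

    course_mean = mean(scores)                           # 3.83
    response_rate = len(scores) / third_week_enrollment  # 0.60

    print(f"Course mean: {course_mean:.2f}")
    print(f"Response rate: {response_rate:.0%}")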

 

VIII. Additional Issues

            Although our committee was asked to consider whether student course evaluations should include some assessment of the instructor’s research, we agreed that any such evaluation should be conducted separately.

            Similarly, we did not want to incorporate into course evaluations students’ assessments of other university academic requirements, such as the effectiveness of general education or multicultural courses. However, we viewed positively the idea of an optional link from the course evaluation page to a separate on-line survey of such aspects of academic life.

 


Appendix A: Common Characteristics of Current Evaluation Forms

 

Overview of Current University of Oregon Numerical Course Evaluation Forms:

 

 

  1. Number of questions ranges from 7 (English, Geology) to 43 (LCB)

 

 

  2. Not all forms actually include the four required questions mandated by Senate legislation in 1999. They are missing from forms used in Education and Chemistry, and possibly others.

 

  3. Not all forms clearly distinguish between questions that pertain to the qualities of the instructor and questions that pertain to other aspects of the course, such as the textbook or laboratory handbooks. However, some units have devised particular forms to be distributed to students in labs or discussion sections, with questions targeted to those settings and to the GTFs or lab technicians who teach or run them.

 

  4. Many forms include additional questions that do not use the common 5-point scale. Examples are questions about major, college affiliation, or the number of hours spent studying each week, or:

 

            YOUR performance in this class

            What is your primary reason for taking this course?

A - required for major  B - university graduation requirement  C - personal interest  D - career interest

 

  5. Many units ask a series of questions about students' effort and performance, such as expected grade (A B C D F), grade at midterm, or:

 

At the final, I will have completed: A - all assigned material  B - 3/4  C - 1/2  D - 1/3

How often did you attend class?: A - every class  B - 3/4  C - 1/2  D - less than 1/2

 

  6. Some units include questions related to respect, tolerance, or classroom atmosphere. Examples:

 

The teacher treated students with respect (several units)

The classroom atmosphere invited students to express their ideas openly and seriously (English, College Writing)

The instructor treats students with respect. (East Asian Langs. and Lits., German)

The INSTRUCTOR'S allowance of the expression of other viewpoints. (Art)

(in "Evaluate the Instructor's Teaching" section): Objective and fair in considering student viewpoint.

 

 

Appendix B: Structured Interviews with Five Current Students

Below are notes and memory protocols from structured interviews with five students (juniors and seniors in psychology, with double majors in other fields) about the currently required course evaluation questions. The goal of these interviews was to establish students’ interpretations of these questions. In particular, we were interested in students’ reading of the introductory clause (“In comparison with other UO courses of this size and level…”) and in the clarity of the questions about organization and class time use.

For each student interviewed, the initial instruction was: “I am going to show you a few questions that you are familiar with. Just read them and tell me what you normally think when you answer those questions, what goes through your head when you answer them. Let’s start with the first question. Tell me what you think when you read this.” Students then received a sheet of paper with the four questions printed on it.

 

1.  “In comparison with other UO courses of this size and level, how do you evaluate this course?”

Student A.  I think of classes I took of the same size. (What did you do when you were a freshman?) I compared the course to those I had in high school. [Never mentions level.]

Student B.  I don’t normally take the first part of the sentence into consideration. With a bad course, you know it. I may visualize the size of the course, it’s like having associations, it reminds you of other classes like it…, but you already know that during the term. (What did you do when you were a freshman?) I guess I just evaluated it directly (And how about level?) I never look at that.

Student C.  I think of the type of class it was: smaller and interactive, or a larger lecture course. (How about level?) You mean, how hard it is? (No, like 300, 400.) Oh. I never think of that.

Student D.  I kind of think of the room size, how many students are in the course, maybe numeric level, level of students… But it’s all pretty quick.  Yeah, I guess room size is what I think of.

Student E.  In general, how the course worked for me. You know, quizzes, exams, how effective it was, what I learned.  (Do you ever think of the size or level of the course?) No, it’s easy to just say what it was like.

Interpretation: At best, the introductory clause encourages students to think about class size (not about the ambiguous “level”).  However, it appears that students cannot help but think of class size throughout the term, so this clause may not achieve anything that a straightforward question about the course wouldn’t achieve.

 

2. “In comparison with other UO courses of this size and level, how do you evaluate this instructor?”

A.  I don’t really think of anything special. (You just think of the instructor?) Yeah.

B.  (And how about the second question?) ah,… just the instructor. How I felt about the instructor.  But you compare people over time (you gain a frame of reference from all your classes?) Yeah.

C.  If it’s a large class you can tell if they can handle it; I think of the same classes as in the previous question. You know, whether they make eye contact, whether it’s easy to communicate with them… (that makes for a more positive evaluation?) Yeah.

D.  I base it pretty much entirely on the instructor.  Don’t take size into account.

E.  When someone is a good lecturer, fair grader, when they are available. And I guess personal things: whether they are interesting, boring. (Do you ever think of size or level?) No.

Interpretation: There is little evidence that the introductory clause encourages size- and level-specific comparisons with other instructors. (Also note that, to do so, the clause would have to be written as “In comparison with instructors of other UO courses of this size and level,…”.) More so than with evaluations of the course as a whole, students appear to express their appreciation (or lack thereof) of the instructor’s teaching.

 

3. “In comparison with other UO courses of this size and level, do you believe that class time was well organized and efficiently used throughout the course?”

A. N/A

B.  How the lectures were organized, whether the slides were helpful, whether the lecture drags on to fill the last 20 minutes or cram everything in at the end.

C.  You know, the unorganized professor, whether they are prepared, whether the power point stuff is working, whether they finish the slides or have all kinds of side stories. Or whether they tell you “Don’t remember any of what I just told you.” 

D.  Whether they are well-prepared, have all their notes, waste time with paperwork at the beginning, and also whether there is too much question & answer during the lecture, you know, whether they get side-tracked.

E.  I think of the syllabus, whether they actually go through it. You know, hold to “the contract.” Whether they go through with the plans. (And efficient?) Whether they use class time well, whether there’s enough time to get the material in.

Interpretation: The comparison clause does not seem to have any particular impact.  Students take the question to refer primarily to the organization of class time.  A split into class time and overall course organization appears prudent.

 

4. “In comparison with other UO courses of this size and level, how well did the instructor encourage communication outside of class time?”

A. N/A

B. I think of office hours, Blackboard.

C. What they say during class about office hours, how many office hours they have, whether they are available.

D. How often they remind you about office hours.  And, ‘if you have questions, here is the email.’  Also the TA, whether they repeat that you can contact them. And available meeting times, and when you can’t make the times, whether they are accommodating about other times.

E. I think of office hours, whether they encourage study groups, yeah… mainly office hours. 

Interpretation: The comparison clause does not seem to have any particular impact. Communication is clearly interpreted as communication between student and instructor. A slight reformulation may sharpen this focus (e.g., “How available was the instructor for communication outside of class (e.g., during posted office hours)?”).

 

