David Velleman at Left2Right responds to a recent New York Times article (sorry, now gone) describing five students at U. Arizona; his post has stirred up a firestorm of comments.
Responding to the Times' estimate that 20% of college students are "disengaged"—drinking their way through easy classes on their parents' dime until they get the Magic Piece of Paper—David writes:
The Times article rightly points out that students would be more engaged if more were demanded of them. But here is where matters get complicated. Sensitive to complaints about the quality of teaching, universities require professors to be evaluated by their students at the end of every course, and these evaluations now play a role in tenure, promotion, and merit pay. But the evaluations are just consumer-satisfaction questionnaires, which generally reveal how much the students liked the course but not how much they learned. And professors suspect, with some justification, that giving low grades harms their evaluations.

Now, I am not making excuses for teachers who expect too little or grade too leniently. There is plenty of the blame to go around here, as they say, and some of it surely belongs with the professors. My point is that measures designed to bring accountability to education can sometimes backfire. If consumer-satisfaction questionnaires encourage professors to be lenient, and leniency encourages students to be disengaged, and disengaged students discourage professors from investing time and effort in their teaching, then "accountability" hasn't benefited anyone. (If universities really want to improve teaching, they will have to develop better methods of evaluating instruction. But that's a topic for another day.)
David's characterization of student course evaluations is right on the money (so to speak). Here are the top two questions on the forms that UIUC students fill out every semester:
- Rate the instructor's overall teaching effectiveness.
- Rate the overall quality of this course.
In each case, the answer is an integer between 1 (abysmal) and 5 (perfect), chosen by completely filling in the appropriate oval with a number 2 pencil (and not a pen). There are lots of other questions on the form, some with numerical answers and others free-form, but only these two standardized questions are collated by the university for later reporting to tenure, promotion, and awards committees.
In my opinion, student evaluations are terrible metrics for measuring actual teaching quality. Lazy students give lower evaluations to instructors who expect them to actually work or who don't artificially inflate their grades. Students confuse enthusiasm and authority with effective teaching. Students give lower scores if they've been exposed to disagreeable facts (or, as they are ever more commonly known, theories). Students give higher scores to tall male instructors of their own race than to short female "ethnic" instructors. Students can be directly manipulated into giving better evaluations.
My scores are usually quite good.
A few years ago, psychologists Nalini Ambady and Robert Rosenthal exposed undergraduates to silent 10-second video clips of various Harvard instructors, and then asked them how accepting, confident, enthusiastic, professional, warm, etc. they thought the instructors were. Ambady and Rosenthal found strong and statistically significant correlations [pdf] between these snap judgments and the instructors' teaching evaluations.
I don't mean to suggest that student expectations are not a factor in teaching effectiveness. Students definitely learn more when they expect to learn more. These expectations are sometimes revealed in rather disturbing ways. The Chronicle of Higher Education recently reported the following experiment in an article discussing instructors with foreign accents:
In 1988 Donald L. Rubin, a professor of education and speech communication at Georgia,... gathered American undergraduates inside a classroom and then played a taped lecture for them over high-fidelity speakers. The lecture -- an introduction to the Mahabharata, say, or a discourse on the growing scarcity of helium -- was delivered in the voice of a man from central Ohio.

While the undergraduates sat and listened, they faced an image projected onto the classroom wall in front of them: Half the time, it was a photograph of an American man ("John Smith from Portland"), standing at a chalkboard and staring back at them. For the other half of the testing groups, the slide projected before them was that of an Asian man ("Li Wenshu from Beijing"), standing at the same chalkboard. The two figures were dressed, posed, and groomed as similarly as possible.
Now for the interesting part: When the students were asked to fill in missing words from a printed transcript of the central Ohioan's taped speech, they made 20 percent more errors when staring at the Asian man's image than they did when staring at the picture of "John Smith."
I also don't mean to suggest that student feedback is not valuable for improving one's teaching. Even though the administration at my university seems to ignore them, most instructors I know take students' free-form comments very seriously. Students are full of excellent teaching ideas. (Unfortunately, they're also full of the opposite.)
What I object to is the reduction of student opinion about teaching effectiveness to a single number, and then using that number as an objective measure of actual teaching effectiveness. Yes, the two are correlated, but they're not the same thing.*
Do I have an alternative suggestion? Well, sort of, at least for required low-level classes, where effective teaching is arguably most important. Here's my modest proposal: To measure effectiveness at teaching course X, look at student performance in courses for which X is a formal prerequisite. To find the most effective calculus teachers, look at how students perform in differential equations classes. All else being equal, if the students who had calculus from Professor A generally do worse at diffyQs than students who took calculus from Professor B, then Professor B is a more effective instructor. All the data is there in the registrar's computer system; why not put it together?
(Because that would require work.) Sssh!
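For concreteness, here's a minimal sketch of what putting that data together might look like, assuming the registrar can export per-student records with student, course, instructor, and grade columns. Every file name, column name, and course name below is invented purely for illustration.

```python
# Rough sketch only: assumes hypothetical registrar records with columns
# student_id, course, instructor, and grade_points.  All names invented.
import pandas as pd

def downstream_scores(records: pd.DataFrame, prereq: str, followup: str) -> pd.Series:
    """Average grade in the follow-up course, grouped by prerequisite instructor."""
    took_prereq = records.loc[records["course"] == prereq, ["student_id", "instructor"]]
    took_followup = records.loc[records["course"] == followup, ["student_id", "grade_points"]]
    # Join on student: who taught each student the prerequisite, and how
    # did that student later do in the follow-up course?
    merged = took_prereq.merge(took_followup, on="student_id")
    return merged.groupby("instructor")["grade_points"].mean().sort_values(ascending=False)

if __name__ == "__main__":
    records = pd.read_csv("registrar_records.csv")  # hypothetical export
    print(downstream_scores(records, prereq="Calculus I", followup="Differential Equations"))
```

All the sketch computes is the average follow-up-course grade earned by each prerequisite instructor's former students; anything fancier, like controlling for who taught the follow-up course, is left as an exercise for the registrar.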
Perhaps we can even generalize to higher-level courses: To evaluate instruction within a given department, poll its alumni for salary information and/or grad/medical/law/business school placement and correlate with the instructors they had. Instructors whose students make more money or get into better schools are arguably more effective teachers. (Some might make this argument at the level of departments, but that would be stupid.)
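Again purely as a sketch, with no pretense of controlling for anything: assuming a hypothetical alumni survey table with student_id and log_salary columns, the "correlate with the instructors they had" step could be as crude as a regression with one dummy variable per instructor.

```python
# Sketch only: joins a hypothetical alumni survey (student_id, log_salary)
# against the same hypothetical registrar records as above, then regresses
# salary on instructor dummies.  It controls for nothing else.
import pandas as pd
import statsmodels.formula.api as smf

def instructor_salary_model(records: pd.DataFrame, survey: pd.DataFrame, course: str):
    taught_by = records.loc[records["course"] == course, ["student_id", "instructor"]]
    merged = taught_by.merge(survey, on="student_id")
    # One dummy per instructor; each coefficient compares that instructor's
    # alumni to the (arbitrary) baseline instructor's alumni.
    return smf.ols("log_salary ~ C(instructor)", data=merged).fit()
```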
Cold Spring Shops points to David's post and alludes to a beautiful quotation by William Arrowsmith:
At present the universities are as uncongenial to teaching as the Mojave Desert is to a clutch of Druid priests. If you want to restore a Druid priesthood you cannot do it by offering prizes for Druid-of-the-year. If you want Druids, you must grow forests.
* I'm aware of the irony of this objection. We professors reduce our opinions of student mastery to numbers—you know, grades—knowing full well that those numbers will be misinterpreted as measures of actual student mastery. On the other hand, that's our job. Professors are hired and promoted, at least in part, at least in principle, for the accuracy of our opinions in the subjects we teach.
Actually measuring what you suggest would probably make a fantastic math ed PhD if we could somehow get around the privacy issues.
Posted by: Kim | May 02, 2005 at 11:30 AM
At my "Premier Catholic Teaching University" which I attended as an undergrad in the engineering department, the professors were evaluated annually by their peers, usually by the department chair. Moreover, the department was reviewed by the ABET (Accreditation Board for Engineering and Technology, Inc) in order to keep with certain standards of topics for teaching. ABET reviewed student's work and course syllabus. Maybe it's just a silly hoop they jumped through, but I will say that the department's worst teachers exceeded the ability of some of the best I've seen here in the midwest.
Posted by: femaleCSGradStudent | May 02, 2005 at 02:42 PM
I agree with your criticism of the current evaluation system.
Not that I have a better idea, but I think one problem with your suggestion is that "performance in diffyQs" and "salary figures/placement in grad school" are things which, while being affected a lot by the quality of teaching, are also affected a great deal by other factors. Thus to get useful information from these statistics, one will have to factor out the effects of those other factors. That sounds difficult in the short run. You could average over several years and get statistically significant correlations, but it would take a while.
E.g. the dot com bubble and the following bust had a much bigger impact on salaries than any teaching. Performance in diffyQs would depend a lot on the person teaching diffyQs. There can be significant differences in student quality from year to year and there certainly are significant differences in student motivation from year to year.
Thus while these effectiveness measures will be great for estimating how good a teacher someone was over a career, the extraneous noise is probably reasonably large over a period of a few years.
Posted by: K | May 02, 2005 at 04:27 PM
Your modest proposal is definitely a step in the right direction. If nothing else, it's something to measure on top of the standard stuff that student evaluations measure so imperfectly - and it can't hurt to provide an additional metric.
I see one wee glitch that would certainly be possible to correct for: one way that I could ensure that, say, my precalculus students kicked the asses of other instructors' precalc students once they got into calculus would be to fail three quarters of my class. Then only the top 25%, or less, would even get INTO calc, and of course their grades would be higher, on average, than those of the 50% of Teacher X's precalc class that went on to calculus.
I'd also like to see a study that correlates students' mastery of prerequisite material with the evals they give their instructors. I know, for instance, that a lot of my precalc students HATED me because I didn't take the time to reteach them grade six level math. My calc students, on the other hand, seemed to like me a lot - largely, I presume, because their abilities were more in sync with what I expected of them.
Posted by: Moebius Stripper | May 02, 2005 at 05:49 PM
Students can be directly manipulated into giving better evaluations.
Here's another frivolous manipulation method: give students so-so grades (arbitrarily) at the beginning of the semester, then steadily increase the grades towards the end. Give evaluations out before the final exam. Give the final exam, and grade by mastery at last.
Another modest proposal which is much more likely not to be used: evaluation by peers, that is, your fellow teachers. That would totally suck.
Posted by: Mitch | May 03, 2005 at 07:52 AM
Somewhere around 1975 there was a lot of talk similar to this and many were passing around a study [would that I had the reference!] of a single large calculus class taught by a single prof. who gave all the lectures, wrote the exam, and was in charge of the grading. There were also many small sections taught by a number of different T.A.'s. Someone carried out exactly the (controlled) experiment you suggested and correlated the course grade with the students' evaluations of their arbitrarily assigned T.A.'s.
As best as I remember, the correlation was inverse, in that the students who down-rated their T.A.'s got the best grades. The conclusion was that students don't like T.A.'s who make them feel uncomfortable because they make it clear when the student doesn't understand something, but those are the very T.A.'s students learn the most from.
But your point about how one does in the next course is crucial, if very difficult to measure. At least in science and math, the major significance of most of what you learn is that it is the prerequisite for the next course, or, more accurately, the prerequisite for a deeper understanding of the subject rather than an end in itself. (Sometimes I'm not even sure this is a convergent process....)
Posted by: SusanJ | May 03, 2005 at 09:11 PM