Many professors dread anonymous student evaluations of teaching (SETs). For too many of them, whether female, on contract, a member of an equity-seeking group or, worst of all, all of the above, less-than-stellar reviews can mean the end of their employment or a serious roadblock on their career path.
This is why an arbitration award in a dispute between Ryerson University and the Ryerson Faculty Association rendered at the end of June has generated so much interest both in Canada and internationally. The decision confirms what the body of research has concluded for years: student evaluations are not a barometer with which to judge a professor’s teaching effectiveness and therefore should not be used for employment-related decisions such as promotion and tenure.
“The award is significant and we hope it will have a ripple effect — universities should not be using student opinion surveys for tenure and promotion decisions,” said CAUT executive director David Robinson.
The faculty association and Ryerson administrators had been at odds in bargaining over the issue since 2003. Facing an impasse, the two sides agreed to the creation of a joint committee and an ongoing pilot project to address concerns over the faculty course surveys. Unable to resolve the matter, the faculty association filed a first grievance in 2009, then a second in 2015. This remained a bone of contention after the 2015–2016 bargaining round. The matter went to an unsuccessful mediation and proceeded to a hearing in Toronto last April in front of arbitrator William Kaplan.
Ryerson University argued that “notwithstanding any of the identified problems with the FCS tool, it provided relevant information” about faculty assessment. The university noted that the questionnaires “could not be determinative of tenure and promotion decisions, but they identified trends and concerns — they raised flags — requiring further investigation.”
This argument didn’t convince Kaplan. In his decision, he found that the evidence presented by the faculty association clearly established the “serious and inherent limitations” of student surveys. That evidence included expert testimony and peer-reviewed publications showing that numerous factors, especially personal characteristics such as the professor’s race, gender, accent, age and even perceived “attractiveness,” skew the results.
“The expert evidence led at the hearing persuasively demonstrates that the most meaningful aspects of teaching performance and effectiveness cannot be assessed by SETs,” Kaplan writes. “Insofar as assessing teaching effectiveness is concerned — especially in the context of tenure and promotion — SETs are imperfect at best and downright biased and unreliable at worst.”
Research casting doubt on the reliability of student questionnaires has been growing over the last few years. A study published earlier this year in PS: Political Science & Politics found that identical online introductory political science courses administered by a female instructor versus a male instructor were rated very differently.
Researchers Kristina Mitchell and Jonathan Martin discovered not only that students evaluate their professors differently depending on whether they are women or men, but that the male instructor received statistically significantly higher evaluations than the female instructor. The researchers also found that the language students used in official open-ended course evaluations and in anonymous online commentary suggests gender bias, with women more likely than men to be viewed as having less experience and education, or as being less accomplished.
An analysis published last year by Bob Uttl (Mount Royal University), Carmela A. White (UBC-Okanagan) and Daniela Wong Gonzalez (University of Windsor) in Studies in Educational Evaluation found that students do not learn more from professors with higher student evaluation scores. In other words, student opinion ratings and student learning are unrelated.
Two reports commissioned by the Ontario Confederation of University Faculty Associations, submitted as evidence by the Ryerson Faculty Association during the arbitration, also concluded that student questionnaires are unreliable. Their authors — Richard Freishtat, former director of the Center for Teaching and Learning at the University of California, Berkeley, and Berkeley statistician Philip B. Stark — highlighted major flaws in the methodology of student surveys, along with ethical concerns around confidentiality and informed consent and human rights issues such as bias related to race, gender, ethnicity, accent, age and “attractiveness.”
“I think the evidence is clear about how flawed student questionnaires are,” says Sophie Quigley, a computer science professor at Ryerson University who was the faculty association’s grievance officer in 2009. “Some instructors may find some of the responses to some of the questions interesting from a formative point of view. After all, most of us are interested in student feedback on the courses that we work so hard at designing and delivering. If student surveys were used solely for formative purposes, to be viewed by the instructor only, there might be a limited place for them in academia. But to actually make tenure decisions based on these surveys is a huge overreach and it’s highly inappropriate to have our colleagues’ careers depend on this extremely flawed instrument.”
University administrations are starting to pay attention. In May, the University of Southern California announced it would stop using student evaluations of teaching in promotion decisions and use the peer-review model instead. The university will still use student evaluations to “provide feedback about students’ learning experiences and to give context, but not as a primary measure of teaching effectiveness during faculty review processes given their vulnerability to implicit bias and lack of validity as a teaching measure,” wrote the vice-provost for academic and faculty affairs in a memorandum to the academic senate and faculty council chairs.
The University of Oregon is also planning to stop using numerical ratings from student course surveys in tenure and promotion reviews. The institution is working on a peer review framework. “The goal is to increase equity and transparency in teaching evaluation for merit, contract renewal, promotion and tenure, while simultaneously providing tools for continual course improvement,” according to the university’s website.
In his decision, Kaplan noted that student opinion surveys may still have value since they are the main source of information from students about their educational experience. “SET results have a role to play in providing data about many things such as the instructor’s ability to clearly communicate, missed classes made up, assignments promptly returned, the student’s enjoyment and experience of the class, and its difficulty or ease, not to mention overall engagement.” But he cautioned that the data should be “carefully contextualized” and that the “strengths and weaknesses of the SET need to be fully understood.”