Blue Suede Shoes

Closing the Teach For America Blogging Gap
Mar 12 2013

The Evaluation Questions…

My former life and current life overlapped a few weekends ago at the LEARN Conference. The first breakout panel I attended, on teacher evaluation systems, resonated with me, especially since I had been affected by the new Teacher Effectiveness Initiative during my time in Memphis. A panelist brought up the need to study the evaluation systems and their validity and consistency. If a teacher is rated highly effective one year and not effective the next, there’s probably something of concern within the evaluation system. I didn’t experience such a startling change, but I did find myself reflecting on how much a rating could fluctuate and whether it really reflected teacher impact and student performance.

Anecdotally: in my first year, I was, admittedly, a frustrated failure in terms of my own perception of classroom time. Somehow, I was rated Level 5, and I swear I was nowhere near savvy enough to be teaching to the test beyond deconstructing the standards and the published test and practice-test questions. Am I proud of my students’ growth? Hell yeah. Does my Level 5 rating go on my resume? Obviously. But if anyone were to ask about it and how I got there, I would have a difficult time coming up with any kind of real answer or reproducible strategy. “I did what I thought was right” sounds like a ridiculous way to describe how to teach.

Lest anyone think this is some kind of humblebrag, the story continues and my point about data finally appears below; plus, just read some posts from my first year. I didn’t feel good about what I had accomplished, and even after the scores came in, I didn’t feel like I had lived up to what I imagined a true Level 5 teacher would be.

My second year, almost everything was better. My students were more invested, my lessons were better planned, and I felt more confident about what I was doing in my classroom. I felt a lot of internal pressure to keep my Level 5 rating, to prove it hadn’t been a fluke, but at the same time, I recognized the challenges of doing so. My first year, I had had my algebra students for double block periods, over an hour and a half each day. In year 2, algebra plus, as those classes had been called, had been scrapped, and I had my students for half as much time as before. It worried me a little, but I also realized that a lot less time was being wasted in my classroom in year 2 because I was a better classroom manager and had been able to develop stronger relationships with my students.

Going into test season, I felt more confident than I had in my first year. Coming out of it, with students telling me the test was easy, I felt pretty good about being Level 5 again, but of course, it meant more that, regardless of rating, I knew I had done better for my students.

When the scores were released, I found myself stumped by the results. A higher percentage of my students had scored proficient or advanced compared to my first year. The average score for my students was higher than it had been the year before. And yet, I was only a Level 3 this time. Of course I was disappointed. What gives? I thought. By my own feeling and my own calculations, I was a better teacher, but not only had I not kept my rating, I had dropped two levels.

Upon analysis of individual students’ scores, I realized that one student who had been predicted to do relatively well had received the lowest score possible on the end-of-course test. I was stunned. I knew from class that this student, at the very least, knew how to factor – and was very good at factoring – which would have earned him some points. I didn’t know what to do beyond contacting the student, explaining the situation, and asking him to check with the test coordinator at the school about it when school started again. I suppose I could have inquired about test discrepancies or errors with the Department of Education, but in the grand scheme of things, it didn’t seem that it would make much of a difference. Maybe he really just bombed that day. And I didn’t expect that it would be high on anyone’s priority list to look into one test score that might have affected the rating level of a teacher who was already out of the system. As far as impact on the student, well, it would probably just lower his predicted score for the algebra 2 test, which wouldn’t reflect what I knew he was capable of, but if he had indeed earned that low algebra 1 score, what more could be said?

I calculated the numbers again, both without that one student and with him having at least gotten in the ballpark of his predicted score. In both cases, I would more than likely have been Level 4 or Level 5. It’s striking how much one student can skew the data. It’s worth noting that my second year, I had only one algebra class of about 20 students. Since geometry is not a tested subject, my other 100+ students didn’t factor into my value-added data. In my first year, I had had approximately 40 students across two algebra classes (and 50-60 in two geometry classes). I probably should have recognized sooner that this calculation system was imperfect, and I should have been more humble after my first year, when I accepted my rating for what it was, thinking, If I’m a hot mess, and I’m a Level 5, what is a Level 3 teacher like? Or a Level 1?, instead of thinking critically about what the numbers were actually reflecting.
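For the curious, here is a minimal sketch of that back-of-the-envelope recalculation. The scores, the predicted values, and the class size are all made up for illustration; this is not the state’s value-added formula, just the average-actual-versus-average-predicted comparison I describe here and in the comments below.

# Rough illustration (Python) of the recalculation described above.
# All numbers are hypothetical; this is NOT the official value-added formula,
# just a comparison of average actual scores to average predicted scores.

def average(scores):
    return sum(scores) / len(scores)

# Hypothetical predicted and actual end-of-course scores for ~20 students.
predicted = [720, 715, 730, 700, 745, 710, 725, 735, 705, 740,
             715, 720, 730, 710, 725, 700, 735, 745, 720, 715]
actual    = [730, 720, 735, 705, 750, 715, 730, 740, 710, 745,
             720, 725, 735, 715, 730, 705, 740, 750, 725, 200]  # last value: the outlier

# 1) As reported: outlier included.
gap_with_outlier = average(actual) - average(predicted)

# 2) Outlier removed entirely.
gap_without = average(actual[:-1]) - average(predicted[:-1])

# 3) Outlier replaced with his predicted score ("in the ballpark").
actual_ballpark = actual[:-1] + [predicted[-1]]
gap_ballpark = average(actual_ballpark) - average(predicted)

print(f"with outlier:         {gap_with_outlier:+.1f}")
print(f"without outlier:      {gap_without:+.1f}")
print(f"outlier at predicted: {gap_ballpark:+.1f}")

In this toy example, the single bottomed-out score drags the class average down by roughly 25 points, which is the kind of swing that, in my own numbers, turned “well above predicted” into “about even with predicted.”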

A lot has been said about teacher evaluations and how unfair they can be. I agree, especially for those who teach untested subjects, but I certainly don’t know enough to suggest better models. Is value-added better than an absolute standard? How much does a rating truly reflect what’s going on in the classroom? How much should student perceptions count? Principal and instructional coach observations? Admittedly, I haven’t looked into the research behind evaluation models; is any school system/district doing evaluations well and consistently? Is it too early to tell with so many new models being implemented?

Because teacher attrition fascinates me, and is hopefully what I will be able to study for my master’s research, I also wonder what role, if any, evaluations play in a teacher’s decision to leave. I want to say the rating itself might not be a key lever, but from my own experience, the kind of support I received and the feedback I got on observations (sometimes none…), ostensibly to improve my evaluation data, did not always make me feel like I was valued as a teacher or a person. A couple of short Forbes articles that Teach For America friends/co-workers and I shared over the past year or so reflected our own feelings about our school and our work, even if they weren’t explicitly about educators. Teaching is, obviously, not about an ego boost, and what I have observed is that it is, contrary to popular belief, a decidedly unselfish profession, but those are no reasons not to value teachers. Which brings me to one final question: does teacher evaluation, as it is, and as it could be, value teachers?

4 Responses

  1. meghank

    As a fellow Memphis teacher, thank you for thoughtfully reflecting on these issues.

    I believe that the real purpose of the Teacher Effectiveness Initiative (although of course it is never stated) is to drive experienced teachers out of the profession. What other purpose could evaluating teachers by an UNSTABLE measure serve?

    As a side note, I would like access to the formula you used to “calculate the numbers.” I didn’t think that formula was publicly available?

    • I calculated the averages of the students’ actual scores (with and without the one student, as I mentioned, and with the one student’s predicted score substituted in) and compared those values with the average of their predicted scores and the percentiles those corresponded to in 2012. It’s probably imperfect, but without the one student, or with his scoring something higher than the lowest score possible, the student average was many percentile points higher than the predicted percentile. (With the student included, as it was, there was only a marginal difference between actual and predicted, and thus it demonstrated standard growth, which corresponds to Level 3.)

      I’ve made a slight edit to my post, since I don’t know the exact cutoff for Level 3 to Level 4 to Level 5.

  2. meghank

    As far as “student perceptions” go, imagine giving a two-hour-long, multiple-choice survey to kindergartners who cannot read the answer choices (Yes, No, Maybe), much less the questions. Imagine the bubble they pick being incredibly important to the teacher you work with (they have you giving the survey to someone else’s students), despite the fact that when you say, “Color in the bubble by the answer you choose,” they (hear “color” and) say, “All right, y’all, get out the crayons!” (True story.)

    At the end of that day, I had an incredible headache, it is true, but mostly I reflected on the ludicrous waste of the children’s time. After that, the children (and their teacher) were in no state to do any learning for the rest of the day. Is this really the best use of children’s time?

    (We’re set to give the next two-hour-long survey within the next ten days or so. By the edict of the Gates Foundation, in all of its infinite wisdom.)

    • Wow! I thought the one for high school was bad, and it was “only” a waste of one period. I do remember that last year the students had to do it twice because the data from the first one was thrown out due to a lack of modified surveys that time (I think for ELLs or students with special needs, but I can’t remember exactly).
