Particularly in various comment threads, there's a very basic division on the nature of the problem of teacher evaluation. On one hand, you have people who regard it as a singularly difficult, essentially unsolved technical problem of pivotal importance. On the other hand, you have people who regard teacher evaluation as a moderately difficult management issue, not that much different from personnel evaluation in other complex fields. In particular, these folks can point both to personal experience ("Wasn't there broad agreement in your high school on who the best and worst teachers were?") and to the many excellent schools (and other enterprises) in the world that don't require extraordinary quantitative measures to evaluate their staff.
The difference, which mostly goes unspoken, is that the people who see this as a uniquely hard problem want to compare and rank teachers across a wide range of schools. That is an unsolved problem, even when measures of success are limited to test scores:
The idea behind value-added measurements is that they look instead at how much growth students make in a year. Teachers are rewarded not when their students score highest, but when the students’ performance gains exceed the average gains made by similar students.
So while the ratings were explicitly designed to compare teachers who work with similar students, they cannot compare teachers who don’t. “This is just a difficult question that we still don’t know how to answer — this question of how to compare teachers who are in very different kinds of schools,” said Douglas Staiger, a Dartmouth College economist.
He added, “There are a lot of issues that I disagree with critics of value-added. But this is a real issue that it’s not clear how best to handle.”
I don't think that problem needs to be solved, or that it is worth the time, expense, and distraction to try to solve it.