William Tozier at Notional Slurry has a lovely proposal for ranking students given their grades on multiple tasks, each testing a different skill.
Suppose for the sake of my pretty pictures that you’ve given your students two big graded assignments: a 40-page research paper that they had four weeks to complete, which was graded on comprehensiveness, insight, and clarity [but given one grade — oops]; and a final exam which they took in a cold winter lecture hall in an unfamiliar part of campus, which focused for two harried hours on their mastery of domain knowledge. Of course these two [or four] tasks demand quite different skills, and occurred on different days, and under different stress levels and social conditions. But you’re a grader, see, and you make grades. It’s what you’re for. So here are their grades:

December 22, and grades are due. And your grant proposal is overdue! Shit. Who gets an A, and who gets an F? You know the answer, of course: You add the points up, and look at quantiles of some sort. Duh.
But hey, some instructors like to game this a bit. They sense that a point on the big paper is worth more than a point on the final, so they weight that project a bit more heavily. The resulting final score is, say (2*g1 + g2). But wait! In the physiology and organic chemistry classes I suffered through as a wee child, the “words” part was entirely disregarded, and the point of the class was how much you recalled. I’d say that final grade was more like (g1 + 9*g2).
Hmmm. Oh, man, you’re never gonna get that proposal done at this rate. So different students apply different levels of ability to the tasks, which are themselves noticeably different (look at the variance). And different instructors apply different affine combinations of these multiple grades to come up with a single score? And don’t forget that the University is averaging those scores, and handing out millions of dollars.
(0.67g1 + 0.33g2) and (0.11g1 + 0.89g2) give very different rank-orderings. Let’s look at how the students in our example class fare over the range of all possible such affine combinations.
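A quick sketch in Python, with invented grades (not the numbers from the table above), showing how two different affine weightings reorder the same class:

```python
# Hypothetical (paper, final) scores out of 100 -- made up for illustration.
grades = {
    "Alice": (95, 60),
    "Bob":   (70, 90),
    "Carol": (85, 85),
    "Dave":  (60, 98),
}

def rank_order(w):
    """Rank students by the affine combination (1-w)*paper + w*final."""
    return sorted(grades,
                  key=lambda name: (1 - w) * grades[name][0] + w * grades[name][1],
                  reverse=True)

print(rank_order(0.33))   # the weighting (0.67*g1 + 0.33*g2)
print(rank_order(0.89))   # the weighting (0.11*g1 + 0.89*g2)
```

With these numbers, the first weighting ranks Carol first and Dave last; the second ranks Dave first and Alice last.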
Any computational geometer worth his weight in paperclips will instantly recognize the relationship between the first and second figure. Tozier is simply applying a projective duality map, transforming each grade point (a,b) in the first figure (the primal plane) to the line y=a(1-x)+bx in the second figure (the dual plane). We can also think of the duality map as going the other way, mapping any line y=αx+β in the first figure to the point (β, α+β) in the dual plane. This duality map preserves incidences: if a point p lies on a line L in the primal plane, the dual line p* passes through the dual point L* in the dual plane.
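Written out as code, the map is just a little bookkeeping; here is a minimal sketch of the point-to-line and line-to-point directions described above:

```python
# The grade point (a, b) becomes the line y = a*(1-x) + b*x, i.e. the student's
# weighted score as a function of the weight x placed on the second grade.
# The reverse map sends the line y = slope*x + intercept to the point
# (intercept, slope + intercept).

def point_to_line(a, b):
    """Dual line of the grade point (a, b), returned as (slope, intercept)."""
    return (b - a, a)                      # y = a*(1-x) + b*x = (b-a)*x + a

def line_to_point(slope, intercept):
    """Dual point of the line y = slope*x + intercept."""
    return (intercept, slope + intercept)

# Applying one map and then the other gets you back where you started.
assert line_to_point(*point_to_line(80, 65)) == (80, 65)
```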
The most common duality map used by computational geometers, which maps the point (a,b) to the line y=ax-b, has some additional useful properties. First, it's an involution: the dual of the dual of a point is the original point. Second, it preserves vertical distances: if a point P is distance d above a line L, then the dual point L* is distance d above the dual line P*. (Some people argue that this is actually reversing the vertical distance. Whatever. In both pictures, the point is above the line.)
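Both properties are easy to check numerically. Here is a small sketch, with an arbitrary point and line, that verifies the involution and the vertical-distance claim:

```python
def dual_of_point(a, b):
    """Dual of the point (a, b): the line y = a*x - b, returned as (slope, offset)."""
    return (a, -b)

def dual_of_line(slope, offset):
    """Dual of the line y = slope*x + offset: the point (slope, -offset)."""
    return (slope, -offset)

p = (3.0, 8.0)          # an arbitrary point
L = (1.0, 4.0)          # the line y = x + 4, as (slope, offset)

# Involution: dualizing twice returns the original point.
assert dual_of_line(*dual_of_point(*p)) == p

# Vertical distance: p sits d above L, and L* sits d above p*.
d_primal = p[1] - (L[0] * p[0] + L[1])                  # 8 - 7 = 1
Lstar = dual_of_line(*L)                                # the point (1, -4)
pstar = dual_of_point(*p)                               # the line y = 3x - 8
d_dual = Lstar[1] - (pstar[0] * Lstar[0] + pstar[1])    # -4 - (-5) = 1
assert d_primal == d_dual
```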
Why is this useful? Because it lets us take advantage of the Fundamental Theorem of Projective Geometry:
For any statement about points and lines (respectively), there is an equivalent statement about lines and points (respectively), which can be obtained by purely mechanical translation.
Any duality map gives us an explicit translation. This is an incredibly powerful observation, but it's quite easy to prove if we remember that projective geometry is really just linear algebra one dimension higher. In other words, we can rephrase the Fundamental Theorem like this:
For any theorem about vectors and covectors (respectively), there is an equivalent theorem about covectors and vectors (respectively), which can be obtained by purely mechanical translation.
David Hilbert, the father of modern axiomatic geometry, would probably object to the linear algebra analogy. He preferred to think of geometry as a bunch of undefined things, called "points" and "lines" purely for the sake of intuition, which obey several axioms. Theorems are just logical consequences of those axioms, or "necessary truths". Taking this viewpoint, Hilbert would have been just as happy writing the theorem like this:
For any theorem about tables and chairs (respectively), there is an equivalent theorem about thunderstorms and dancing bears (respectively), which can be obtained by purely mechanical translation.
In other words, call them whatever you like. As long as they obey the same axioms, it's the same stuff.
Anyway, this theorem implies that for purposes of deriving and analyzing algorithms, we can choose whichever representation is most convenient—points, lines, vectors, linear functionals, or dancing bears—without changing the actual logic of the algorithm. You can think of the pair of numbers (a,b) as a pair of grades, as a point in the plane, as the coefficients of a line, as the first two coordinates of a 3-dimensional vector (a,b,1), or as the height and weight of a dancing bear. A single algorithm can be interpreted as acting on any of these objects. In fact, it's often quite useful to switch from one interpretation to another in the middle of the algorithm. I do not mean that the algorithm actually does something to make this switch happen; rather, the user of the algorithm interprets exactly the same data in two different ways at two different stages of execution.
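A toy illustration of that last point (not any particular algorithm from this post): one run of one algorithm, a plain sort of pairs, read two ways.

```python
# Under the standard duality (a, b) -> y = a*x - b, sorting pairs by their
# first coordinate is simultaneously "sort the points from left to right"
# and "sort the dual lines by slope".  The data and the algorithm never
# change; only the reader's interpretation does.

pairs = [(3, 2), (1, 5), (2, 4)]
pairs.sort()

print("points, left to right:", pairs)
print("dual lines, by slope: ", [f"y = {a}x - {b}" for a, b in pairs])
```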
Alas, having drawn us into the wonderful world of duality, Tozier switches geometric gears, suggesting that we rank students based on the number of other students whose grades strictly dominate theirs. Under this scheme, students whose grades are not strictly dominated by anyone else's have the highest rank, and lower ranks are assigned recursively to everyone else. Computing ranks by this scheme is equivalent to computing the staircase layers of the original points. Here are the first three staircase layers of Tozier's grades; the outermost staircase shows the students with the highest ranks.
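If you'd rather see the peeling as code than as a picture, here is a minimal sketch with invented grades. This is the obvious quadratic approach, not the faster algorithm hinted at below.

```python
def dominates(p, q):
    """(a, b) strictly dominates (c, d) iff a > c and b > d."""
    return p[0] > q[0] and p[1] > q[1]

def staircase_layers(points):
    """Repeatedly peel off the undominated points; returns layers, highest rank first."""
    remaining = list(points)
    layers = []
    while remaining:
        layer = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers

grades = [(95, 60), (70, 90), (85, 85), (60, 98), (50, 55), (88, 40)]   # made up
for rank, layer in enumerate(staircase_layers(grades), start=1):
    print(f"rank {rank}: {layer}")
```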
Given a set of n points in the plane (in general position, amen), it is not hard to compute the staircase layers in O(n log n) time. I won't spoil the fun by actually telling you the algorithm, but I'll give you a big hint. (Always beware of hints from professors. They're trying to teach you something!) Let's define a partial order over the points: (a,b) dominates (c,d) if and only if a>c and b>d. A chain is a subset of the points that can be ordered by domination, where each point dominates all its predecessors. An antichain is a subset of points in which no point dominates any other. Each staircase layer is an antichain. The height of the set is the length of the longest chain. Dilworth's theorem—or strictly speaking, its dual—implies that the number of staircase layers is equal to the length of the longest chain!
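Here's a small numerical sanity check of that hint, using the simple quadratic longest-chain recurrence rather than the O(n log n) algorithm you're being nudged toward:

```python
def longest_chain_length(points):
    """Length of the longest sequence of points, each strictly dominating the previous one."""
    pts = sorted(points)                  # potential dominators appear later
    best = [1] * len(pts)                 # best[i] = longest chain ending at pts[i]
    for i in range(len(pts)):
        for j in range(i):
            if pts[i][0] > pts[j][0] and pts[i][1] > pts[j][1]:
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0)

# The made-up grades from the previous sketch fall into 2 staircase layers,
# and sure enough their longest chain has length 2, e.g. (50, 55) -> (85, 85).
grades = [(95, 60), (70, 90), (85, 85), (60, 98), (50, 55), (88, 40)]
print(longest_chain_length(grades))       # 2
```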
Tozier's grading scheme is awfully generous. A student who aced the first task could blow off the second completely, confident of a high grade at the end of the course. If I were going to do something like this, I'd be tempted to do it the other way around: rank the students based on how many other students they strictly dominate. Under this scheme, students whose grades do not strictly dominate anyone else's have the lowest rank, and higher ranks are assigned recursively to everyone else. Under this entirely symmetrical—dare I say dual—system, a student who bombed the first task has almost no chance of passing the course, no matter how well they did on the second. It's the same staircase peeling, just run from the lower left instead of the upper right.
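A sketch of that symmetric scheme, again with invented grades: negate both coordinates and run the same "nobody dominates you" peel, so the first layer out is exactly the set of students who dominate nobody.

```python
def layers_from_lower_left(points):
    """Repeatedly peel off the points that dominate nobody; returns layers, lowest rank first."""
    remaining = [(-a, -b) for a, b in points]          # flip both axes
    layers = []
    while remaining:
        layer = [p for p in remaining
                 if not any(q[0] > p[0] and q[1] > p[1] for q in remaining)]
        layers.append(sorted((-a, -b) for a, b in layer))   # flip back for readability
        remaining = [p for p in remaining if p not in layer]
    return layers

grades = [(95, 60), (70, 90), (85, 85), (60, 98), (50, 55), (88, 40)]   # made up
print(layers_from_lower_left(grades))
```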
After all, we shouldn't have to choose whether proof-writing is more or less important than programming; whether problem solving is more or less important than comprehension; whether logic is more or less important than rhetoric; whether teaching is more or less important than research. We should expect our students to be good at everything!
</irony>
"Alas", indeed.
As it happens, I hedged a bit when I suggested that students' grades on disparate tasks will be un- or anticorrelated. I'm sure there are plenty of cases in which the traditional ranking is very close to the Pareto ranking.
That said, you also missed a great pedagogic hook I left dangling: the dangers of "sphere hardening" (at least that's what we called it in machine learning classes), which I suppose is something about the distribution of density in high-dimensional blobs. _Viz._, in one dimension, there's one best score and the rest tend to be spread evenly below it; in two dimensions, there will be substantially more nondominated points, and more of them close to the surface; in three, even more "bests" and tighter clumping.... At some (early) point in the progression the likelihood is that almost everybody is nondominated or close to it.
Thus: Do more work, and you might all get As.
Besides -- I left the instructor the out of choosing the tasks, didn't I? ;)
You may find something of related, if meager, geometric interest at
Fragments & Exercises
[http://fragments-and-exercises.blogspot.com/2005/02/before-you-die-of-tetanus.html]
I'm hoping to post another entry weekly for... well, I tend to write one every two or three days, and have been doing so for 20 years, so a very long time.
Posted by: Bill Tozier | February 15, 2005 at 06:14 AM
Bloody blogger
http://fragments-and-exercises.blogspot.com/2005/02/before-you-die-of-tetanus.html
Posted by: Bill Tozier | February 15, 2005 at 09:08 AM