The teacher’s context window
Every teacher knows the moment. The last exam is graded, the results are submitted, and you sit back with a head full of observations. Question three was poorly worded. The students who attended the lab sessions performed noticeably better on problem five. You should restructure the lecture on superposition because half the class used an approach that suggests they did not understand when it applies.
These observations are vivid. They are specific. They are the raw material for making next year’s course better.
And they will be gone within a month.
The overwrite
When one course ends, the next one is already starting. The Finite Elements for Engineers exam lands in March, and the dynamics course begins the same month. There is no gap. Different students, different material, different concerns. The cognitive space that held a term’s worth of grading patterns is immediately occupied by which textbook chapter to assign, how to restructure the computer lab, whether the new TA has enough background.
The finite element insights do not disappear all at once. They fade. By June, you remember the general shape: “question three was a problem.” By September, you remember that you wanted to change something but not exactly what. By January, when it is time to prepare the next round, you are working from the rubric and your lecture notes, not from the observations you had ten months earlier.
This is not a failure of diligence. It is what happens when ten months of competing demands sit between the observation and the opportunity to act on it. We have, in effect, a context window, and several courses stacked back to back are a lot to ask it to hold.
The promise we make to ourselves
The standard response is familiar to every academic: “I will write it down.”
Some do. A paragraph in a notebook, a few bullet points in a file, an email to yourself with the subject line “changes for next year.” These notes are better than nothing, but they share a weakness: they are written at the worst possible moment.
After a week of grading, you are tired of looking at exams. The notes reflect whatever frustrations are still fresh. “Question three needs rewording” makes the list. “Students who applied the principle of virtual work were more successful than those who used direct equilibrium, possibly because the lecture example used virtual work and they mimicked the method without understanding the alternative” does not. It is too nuanced, and you do not have the patience to articulate it at that point. You want to close the laptop and not think about equilibrium for a while.
The notes capture what was wrong. They rarely capture what was right, or why. They are a to-do list, not an analysis.
What gets lost
The subtler insights are the ones that matter most for course improvement, and they are the first to disappear.
A grade distribution tells you that 30 percent of students failed question four. It does not tell you that most of them applied the correct method but lost their way at the same step, which suggests the lecture covered the concept but not the technique for implementing it. It does not tell you that the top-performing students disproportionately used a specific approach, which suggests that approach should be taught explicitly rather than left as one option among many.
These patterns are visible during grading, when you are reading paper after paper and the repetitions become unmistakable. They are invisible afterwards, because they live in the examiner’s working memory, not in any permanent record. The grades go into the system. The patterns go home with the examiner and slowly evaporate.
The generational problem
Courses outlive their creators. A professor retires or moves to a different university, and a new teacher inherits the course. What do they inherit? The syllabus, the textbook list, the previous year’s exam, and perhaps a few sentences of handover notes.
The accumulated judgment of a decade of teaching is gone. Which questions reliably discriminate between students who understand and those who have memorised. Which problem formulations cause confusion that looks like incompetence but is actually ambiguity. Which topics need more lecture time than the syllabus allocates. All of this walks out the door with the person.
The new teacher starts from scratch. They will rediscover the same pitfalls, make the same adjustments, arrive at the same conclusions their predecessor reached years ago. Academic knowledge transfer is curiously asymmetric: we publish every research finding, but teaching insights stay locked in individual heads until those heads move on.
What happens when grading leaves a trail
When grading produces detailed annotations, as I discuss in the context of feedback quality, something useful falls out as a side effect. The observations that would normally evaporate are captured because the grading process itself recorded them. No separate analysis step required. No post-mortem written at midnight when you would rather be doing anything else.
The examiner who returns to the finite element course in January can review what actually happened last March. Not a faded memory, not a to-do list written in a hurry, but a record of what each question produced: which error types appeared, where the cohort performed well, how this year compared to the year before.
The interesting part is trend detection. If the same type of error appears in a third of submissions three years running, that is not a question problem. That is a curriculum problem. The lecture on that topic is not landing, and it has not been landing for three years. The examiner who sees this can make a structural change. The examiner who relies on memory sees each year’s errors as isolated events, because the context from previous years has been overwritten.
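For readers who think in code, the rule in the previous paragraph fits in a few lines of Python. This is a minimal sketch that assumes the grading annotations have already been reduced to per-year counts: how many submissions there were, and how many showed each error type. The data layout, the function name `persistent_errors`, the reading of "a third" as "at least a third", and the example numbers are all illustrative assumptions. They do not describe any particular grading tool or real cohort.

```python
from collections.abc import Mapping

# Hypothetical layout: year -> (number of submissions, {error type: submissions showing it})
YearRecord = tuple[int, Mapping[str, int]]


def persistent_errors(
    history: Mapping[int, YearRecord],
    threshold: float = 1 / 3,
    min_years: int = 3,
) -> list[str]:
    """Return error types seen in at least `threshold` of submissions
    in every one of the most recent `min_years` consecutive years."""
    recent_years = sorted(history)[-min_years:]
    # Require enough history, and require the recent years to be consecutive.
    if len(recent_years) < min_years or recent_years[-1] - recent_years[0] != min_years - 1:
        return []

    flagged = None
    for year in recent_years:
        submissions, error_counts = history[year]
        # Error types that crossed the threshold in this particular year.
        frequent = {
            error
            for error, count in error_counts.items()
            if submissions > 0 and count / submissions >= threshold
        }
        # Keep only the error types that crossed it in every year so far.
        flagged = frequent if flagged is None else flagged & frequent
    return sorted(flagged)


# Invented numbers, for illustration only.
history = {
    2022: (120, {"lost the way at the same step": 44, "misread the load case": 10}),
    2023: (110, {"lost the way at the same step": 41, "misread the load case": 30}),
    2024: (130, {"lost the way at the same step": 47, "misread the load case": 12}),
}
print(persistent_errors(history))  # ['lost the way at the same step']
```

The intersection across years separates the two cases the paragraph distinguishes. An error that spikes in a single year, like the misread load case in 2023, drops out. An error that stays above the threshold year after year survives, and it is that error that points to the lecture rather than to the question.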
When improvement actually happens
Course improvement, in practice, happens in one of two unsatisfying ways. Either in a rushed post-mortem when the examiner is tired and the next course is pressing, or in a vague pre-term review when the examiner has time but has lost the specifics. The first sacrifices quality of thought. The second sacrifices quality of information.
Persistent grading data lets you move the analysis to the moment when it is most useful: before the next course round begins, with the specifics intact. Before writing the exam, you see which questions worked and which did not. Before revising the lecture plan, you see which concepts the cohort actually struggled with, not which concepts you vaguely remember being a problem.
The new teacher inheriting the course gets the same advantage. Instead of three years of rediscovering what their predecessor already knew, they read it.
The observations were always valuable. They just never had anywhere to live between one course round and the next. Now they do.