The quiet rise in grade challenges, and what it asks of us

I have noticed a pattern over the past few years that other academics will recognise. The number of students who formally challenge a mark after results are published has grown by a factor of three to four, and the increase has been sustained across cohorts. This is not a modest rise. The trend is not specific to a single course. It shows up across the assessments I am involved in, and colleagues describe the same thing in their own teaching.

I want to look at why this is happening, what it costs us, and what it points to.

A challenge with no friction

The mechanism is now electronic. A student logs into the portal, opens the assessment, clicks a button, types a short paragraph, submits. The total effort is a few minutes. The procedural cost has collapsed.

The risk has collapsed with it. The original mark stands as a floor. The reviewed mark is either the same or higher. There is no scenario in which a challenge leaves the student worse off than before.

In the absence of any downside, and with the entry barrier reduced to a single form, the rational behaviour for any student who has the slightest doubt about a mark is to submit a challenge. It would be irrational not to. The phenomenon we are observing is simply what happens when a process that was previously gated by social cost (going to the examiner in person) and procedural effort (writing a letter, finding the right office) becomes a one-click action with a guaranteed non-negative outcome.
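The reasoning above can be put as a small expected-value calculation. The sketch below is illustrative only. The function names, the probability of an uplift, the size of the uplift, and the two filing costs are assumed values chosen for the example, not measurements from any course. It shows that when the original mark is a floor, the expected gain can never be negative, so the decision reduces to whether that gain exceeds the cost of filing.

```python
def expected_net_gain(p_uplift: float, uplift_marks: float, filing_cost: float) -> float:
    """Expected net benefit of filing a challenge, in mark-equivalents.

    Assumes the original mark is a floor: the review either leaves the
    mark unchanged (gain 0) or raises it by `uplift_marks`.
    """
    if not 0.0 <= p_uplift <= 1.0:
        raise ValueError("p_uplift must be a probability")
    if uplift_marks < 0 or filing_cost < 0:
        raise ValueError("uplift_marks and filing_cost must be non-negative")
    return p_uplift * uplift_marks - filing_cost


def should_challenge(p_uplift: float, uplift_marks: float, filing_cost: float) -> bool:
    """A student maximising expected marks files whenever the net gain is positive."""
    return expected_net_gain(p_uplift, uplift_marks, filing_cost) > 0


# Hypothetical numbers: a 5% chance of gaining 2 marks.
P_UPLIFT = 0.05
UPLIFT_MARKS = 2.0

# Assumed costs: an in-person visit "costs" 1 mark-equivalent of effort
# and social discomfort, a one-click form costs 0.01.
in_person = should_challenge(P_UPLIFT, UPLIFT_MARKS, filing_cost=1.0)
one_click = should_challenge(P_UPLIFT, UPLIFT_MARKS, filing_cost=0.01)
print(in_person, one_click)  # prints: False True
```

With the same slim chance of an uplift, the decision flips purely because the filing cost has fallen towards zero. Nothing about the student or the marking has changed.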

AI as a drafting partner

A second factor has compounded the procedural shift. Writing a coherent appeal used to require some effort. The student had to articulate a specific objection, frame it against the rubric, and produce a paragraph that would not embarrass them in front of the examiner. That writing cost has now also collapsed. A student pastes the question, their answer, and the examiner’s annotation into a general-purpose language model and asks it to draft an appeal. The model produces a confident, fluent paragraph in seconds.

The language model does not have the rubric. It does not have the worked solution that the examiner used as reference. It does not have the marking principles that govern partial credit on this particular question, in this particular course, under the conventions of this particular department. What it does have is a strong prior toward defending the student, because that is the role it has been asked to play. It composes an appeal that reads well. It frequently misinterprets what was actually marked, because it is reasoning about a judgement whose justification it cannot see.

The examiner therefore receives a higher volume of appeals, and a substantial fraction of them are arguments against a grading decision that the appeal itself has misunderstood. The student is not at fault here. They have used the tool that was available, in the way the tool encourages. The structural issue lies upstream of both the student and the tool. The explanation that would have anchored the appeal in something concrete was never written down on the paper, so neither the student nor the assistant they consulted had anything to engage with.

The size of the shift

A three- to fourfold increase is large enough that it changes the nature of the post-exam workload, not just the volume. It is not a slow drift. It is a regime change in how students relate to the result.

I am not going to name courses here. The pattern is general, and it has been general for long enough that I am confident the cause is structural rather than local.

What it actually costs

For each challenge, the examiner has to reread the paper, reread the rubric, and draft a written response that explains the original judgement in enough detail to satisfy a sceptical reader. In many institutions the response also has to satisfy a second reviewer, which means it must stand up as a piece of formal reasoning and not as an offhand note.

Two consequences follow.

First, the examiner is now writing the explanation that should have been in the original annotation. After the fact. Under time pressure. In a defensive register, because the conversation is no longer pedagogical. It is adversarial.

Second, the examiner is pulled into a dispute posture with the student. The relationship that should be between the student, their own work, and the rubric has been replaced by a relationship between the student and the examiner, mediated by an administrative procedure. That is corrosive to the teaching relationship, and it is exhausting to be in repeatedly. A teacher who has just defended fifteen marks in writing is not in the best frame of mind for the next lecture.

Where the conclusion points

The remedy is not to reintroduce friction into the challenge process. That would be a regression in transparency, and it is not what students or institutions want. Easy challenges are, on the merits, a good thing. The problem is not that students are using the procedure. The problem is that the procedure is being used to obtain explanations that should have been delivered with the original mark.

The remedy is to push the explanation upstream. Every annotation on the corrected exam should carry enough reasoning that the student, reading it cold, understands what was marked wrong, why it was wrong, and what the rubric expected. An annotation that says “incorrect” invites a challenge. An annotation that says “the boundary condition was applied at the wrong end, which is why the sign of the resulting deflection is inverted” does not, at least not the same kind of challenge.
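One way to picture the three elements named above (what was marked wrong, why it was wrong, and what the rubric expected) is as a small record attached to each deduction. The sketch below is a hypothetical structure, not a description of any existing marking tool. The class name, the field names, and the rubric wording in the second example are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Annotation:
    """One deduction on a corrected paper (hypothetical structure)."""
    what_was_wrong: str = ""
    why_it_was_wrong: str = ""
    rubric_expected: str = ""


def missing_parts(annotation: Annotation) -> List[str]:
    """Return the names of the fields the examiner left empty."""
    return [
        f.name
        for f in fields(annotation)
        if not getattr(annotation, f.name).strip()
    ]


# The bare annotation: the student learns only that something was wrong.
bare = Annotation(what_was_wrong="incorrect")

# The fuller annotation from the text, with an assumed rubric line added.
full = Annotation(
    what_was_wrong="the boundary condition was applied at the wrong end",
    why_it_was_wrong="the sign of the resulting deflection is inverted",
    rubric_expected="boundary condition applied at the correct end (assumed wording)",
)

print(missing_parts(bare))  # prints: ['why_it_was_wrong', 'rubric_expected']
print(missing_parts(full))  # prints: []
```

An annotation with missing parts is the kind that tends to come back as a challenge, because the challenge form is the only place left to ask for them.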

This kind of dense feedback is exactly what is hard to produce manually under time pressure. A human examiner facing 250 papers cannot afford to write a paragraph next to every check mark. The economics do not work. I have argued elsewhere that this is the missing half of exam feedback, and the same logic applies in reverse for criticism. A correction without a reason is structurally weaker than a correction with one.

When an AI assistant carries the cost of writing the per-step explanation, the economics change. The examiner reviews and corrects rather than producing from scratch. The annotation that explains a deduction in a sentence becomes feasible, not aspirational.
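The change in economics can be made concrete with rough arithmetic. In the sketch below, only the figure of 250 papers comes from the text. The number of deductions per paper and the minutes needed to write or to review one annotation are assumed values, chosen to show the shape of the calculation, not to report a measurement.

```python
def annotation_hours(papers: int, deductions_per_paper: int, minutes_each: float) -> float:
    """Total examiner time, in hours, spent on per-deduction annotations."""
    return papers * deductions_per_paper * minutes_each / 60


PAPERS = 250                 # from the text
DEDUCTIONS_PER_PAPER = 6     # assumption
MINUTES_TO_WRITE = 2.0       # assumption: drafting one explanation by hand
MINUTES_TO_REVIEW = 0.5      # assumption: checking and correcting a drafted one

hours_writing = annotation_hours(PAPERS, DEDUCTIONS_PER_PAPER, MINUTES_TO_WRITE)
hours_reviewing = annotation_hours(PAPERS, DEDUCTIONS_PER_PAPER, MINUTES_TO_REVIEW)
print(hours_writing, hours_reviewing)  # prints: 50.0 12.5
```

Under these assumptions, writing every explanation by hand adds more than a working week to the marking, while reviewing drafted explanations adds a day and a half. The exact figures will differ from course to course. The point is the ratio between producing and reviewing.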

There is a secondary effect here, given the role AI now plays on the student’s side as well. If the reasoning behind the mark is written next to the place it was applied, the assistant that the student consults has something concrete to read. The appeal it drafts can engage with what was actually marked, instead of inventing a guess at the examiner’s intent. Both sides of the conversation then refer to the same documented judgement, which is the only condition under which the exchange can be substantive.

What the trend is really telling us

Three to four times the volume of challenges is a measurement. It tells us that the explanation we used to give informally, in person, or implicitly through the weight of a red pen, is no longer being given anywhere. The challenge form has become the medium of that conversation, after the fact and in adversarial form.

Better annotations in the corrected work shift the conversation back to where it belongs. The student reads the explanation when they read the mark. If they still disagree, the challenge is a substantive question about a documented judgement, not a request to be told what the judgement was in the first place. That is a healthier exchange for everyone involved.