The psychology of grading, and why a tireless companion helps

Grading requires concentration. Not the kind where you listen to a lecture while checking email, but the sustained, evaluative kind where you hold a rubric in your head, read a student’s work, and make a judgment that will affect their grade. It is cognitively expensive, and it does not tolerate interruptions.

This is a problem, because an academic’s working day is one of the most fragmented schedules imaginable.

The fragmented day

A typical day involves teaching, supervision meetings, committee work, email, research discussions, administrative tasks, and the occasional moment to think. Few of these activities last more than an hour or two before something else demands attention. The schedule is not designed for deep work. It is designed for availability.

Grading does not fit into this structure. You cannot meaningfully grade three exams between meetings, then pick up where you left off after a two-hour teaching block. Every interruption costs recalibration time: re-reading the rubric, re-establishing the standard, finding your place in the stack. The startup cost is high enough that most academics do not even try to grade during the workday.

So it drifts. Into evenings. Into weekends. Into the stretches of uninterrupted time that were supposed to be for research, or family, or rest. This is not a scheduling problem. It is a structural incompatibility between what grading demands and what academic life provides.

If you have graded exams, you know how frustrating this is.

The baseline problem

Even when you find uninterrupted time, there is another difficulty. When you start grading a new exam, you do not have a clear baseline.

The rubric tells you what to check. It does not tell you how strictly to apply each check. How much partial credit for a correct method with an arithmetic error? How many marks for a free-body diagram that is missing one force but otherwise correct? What about a student who uses an unconventional but valid approach?

You develop the baseline as you grade. The first ten papers are slow, because you are deciding not just whether each answer is correct, but what your standard actually is. By paper twenty, you have a mental model. By paper fifty, the standard feels stable.

But it is not. It shifts. Research on grading consistency shows that evaluative standards drift over sustained marking sessions. The threshold for “acceptable” partial credit at 9 PM is not the same as it was at 2 PM. You become either more lenient (because you are tired and want to finish) or more strict (because the last five papers were weak and recalibrated your sense of “average”). Neither drift is conscious. Both are measurable.

The self-awareness trap

Here is the part that does not appear in the research literature, but that every examiner knows.

We are aware of these weaknesses. We know we drift. We know fatigue makes us less reliable. We know that our mood, the time of day, and the quality of the previous paper all influence the current judgment. This self-awareness should help, but it often makes things worse.

It creates a specific kind of anxiety: the fear that the grades you are about to release are not fair. Not because you did not try, but because you know, from direct experience, that the process is imperfect. You have marked the same paper twice on different days and arrived at different marks. You have caught yourself being too generous on a weak paper because the previous three were worse. You have found yourself being harsh on a decent paper because you had just graded an excellent one.

The response to this anxiety varies. Some examiners become overly cautious, re-reading and second-guessing until the process takes twice as long. Others develop a tendency to be generous, reasoning that if the standard is uncertain, it is better to err in the student’s favour. Neither response produces better grades. Both are coping mechanisms for a process that asks for precision while providing conditions that undermine it.

The tireless companion

This is where AI-assisted grading changes things, and not in the way I expected.

I expected the main benefit to be speed. It is not. The main benefit is that the AI does not drift.

When you have a grading companion that applies the same rubric with identical rigour to every submission, something shifts psychologically. The baseline problem disappears. Paper number 250 is assessed against exactly the same standard as paper number 1. Not approximately. Exactly. The AI does not get tired. It does not get frustrated by a sequence of weak papers. It does not start cutting corners at 10 PM because it wants to be done.

This consistency frees the examiner to do something more interesting than mechanical checking: to exercise judgment. Instead of spending cognitive energy on “am I still applying the rubric the same way I was two hours ago?”, you spend it on the cases where judgment actually matters. The ambiguous answers. The creative approaches. The edge cases that a rubric cannot fully capture.

The anxiety about fairness decreases too. Not because the AI is perfect (it is not) but because the baseline is stable. Your corrections are additive: you are refining a consistent first pass, not trying to maintain consistency yourself across hundreds of papers while fighting fatigue. The reviewing role is less exhausting than the grading role.

The domain expertise surprise

There is another effect I did not anticipate.

AI models are extraordinarily capable at many tasks. But in your specific domain, the one you have spent twenty years teaching and researching, they are not better than you. They are not even close.

This sounds like a limitation. It is actually what makes the process rewarding.

When you grade with an AI companion, you constantly notice things the AI missed. A student’s unusual notation that the AI flagged as an error but is actually correct in the context of the problem. A derivation that arrives at the right answer through a method that the rubric did not anticipate. A subtle conceptual misunderstanding that the AI marked as correct because the final numerical result happened to be right.

You catch these things because you are overfitted to your domain, and in this context, being overfitted is exactly right. You have the intuition that comes from years of teaching the same material, seeing thousands of student approaches, and knowing where the common misconceptions hide. The AI does not have this. It has broad competence. You have deep, domain-specific expertise.

Grading with AI makes you aware of this expertise in a way that grading alone does not. When you mark papers by yourself, your knowledge is invisible to you; it is just “grading.” When you review an AI’s proposals and catch errors that a very capable model missed, your expertise becomes visible. You see where your judgment adds value. That is satisfying in a way that marking paper number 200 at 11 PM never is.

It would never replace teaching

Some colleagues worry that AI grading tools are the first step toward removing academics from assessment entirely. I do not share this concern, and I think the worry comes from imagining AI marking in isolation rather than experiencing it.

Grading is not separate from teaching. The patterns you notice while marking are teaching insights: which concepts students struggle with, which problem formulations cause confusion, where the course materials need improvement. AEMS surfaces these patterns through deep analytics, showing you exactly where the cohort struggled and by how much. But analytics are a starting point, not an answer. It is the teacher who decides what to do with that information: which lecture to restructure, which exercise to replace, which concept to revisit with a different approach. Designing course activities that actually address the problems you have uncovered requires the same domain expertise that makes your corrections valuable.

AI-assisted marking does not remove the examiner from the process. It removes the mechanical repetition and the consistency burden, and replaces them with a review process that is, unexpectedly, more intellectually engaging than marking from scratch. You spend less time on the parts that drain you and more on the parts that remind you why you teach.

The evenings and weekends are still there. You just get to spend them differently.