Integrating with Canvas LMS: lessons from building a five-step grading wizard
One of the earliest design decisions in AEMS was that it should integrate with Canvas, not replace it. Examiners at KTH and most Swedish universities already use Canvas to manage courses, distribute assignments, collect submissions, and publish grades. Any tool that requires examiners to work outside that ecosystem creates friction that limits adoption.
This sounds straightforward in principle. In practice, Canvas integration was the most complex engineering effort in the entire project.
Why Canvas, specifically
Canvas by Instructure is the dominant learning management system in Swedish higher education. KTH, Stockholm University, Uppsala University, and many others use it as their primary platform for course administration. When a student submits an exam, it goes to Canvas. When a grade is published, it comes from Canvas. The grade book, the submission history, and the communication with students all live there.
A grading tool that does not connect to Canvas is a grading tool that requires manual data transfer at both ends: downloading submissions, then uploading grades. For 300 submissions, this is not merely inconvenient. It is a process that introduces transcription errors and discourages adoption.
The five-step wizard
The Canvas grading workflow in AEMS follows five steps:
1. Connect. The examiner provides their Canvas API token and institution URL. AEMS validates the connection and stores the credentials securely (per-user, encrypted).
2. Select course. AEMS fetches the examiner’s active courses from Canvas and presents them in a list. The examiner selects the relevant course.
3. Select assignment. Within the course, the examiner selects the assignment to grade, configures the AI provider and rubric, and sets grading parameters (vision model, marking model, annotation preferences).
4. Review. Before grading begins, the examiner can preview submissions alongside the reference solution. This step exists because examiners consistently told us they wanted to see what the AI would be working with before committing to a batch run.
5. Grade. The batch grading runs. Each submission is processed through the vision extraction and rubric application pipeline. Results appear as annotated PDFs with proposed marks. The examiner reviews, adjusts, and publishes grades back to Canvas.
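The connect step can be sketched in a few lines. The endpoint (`GET /api/v1/users/self`) is the real Canvas API call for checking a token; the surrounding function and the injected `http_get` callable are illustrative, not the actual AEMS code.

```python
# Validate a Canvas API token by fetching the authenticated user's profile.
# `validate_connection` is a hypothetical name; the endpoint is real Canvas.
from urllib.parse import urljoin

def validate_connection(base_url, token, http_get):
    """Return the user's profile dict on success, or None if the token is bad.

    `http_get(url, headers)` is an injected HTTP callable (e.g. a thin
    wrapper around requests.get) so the check is easy to test offline.
    """
    url = urljoin(base_url, "/api/v1/users/self")
    status, body = http_get(url, headers={"Authorization": f"Bearer {token}"})
    if status == 200:
        return body          # e.g. {"id": 42, "name": "Examiner"}
    if status == 401:
        return None          # invalid or revoked token
    raise RuntimeError(f"Unexpected Canvas response: {status}")
```

Injecting the HTTP call keeps the credential check testable without a live Canvas instance.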
Each step was straightforward to describe and difficult to implement.
API constraints and workarounds
The Canvas REST API is well-documented but has several practical limitations that affected the design.
Rate limiting. Canvas enforces a limit of approximately 3,000 API requests per hour per user. For a grading workflow that fetches submissions, downloads attachments, and posts grades, this limit is surprisingly easy to reach. AEMS batches requests in groups of 50, uses exponential backoff on rate limit responses, and caches API responses to avoid redundant calls.
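The batching-plus-backoff strategy can be sketched as below. This is a minimal illustration, assuming Canvas signals throttling with HTTP 403 (as its documentation describes); `fetch_one` and `sleep` are injected stand-ins so the retry logic is testable offline.

```python
# Exponential backoff on rate-limit responses, with requests grouped in
# batches of 50 as described above. All function names are illustrative.
import time

def fetch_with_backoff(fetch_one, item, sleep=time.sleep, max_retries=5):
    delay = 1.0
    for _ in range(max_retries):
        status, body = fetch_one(item)
        if status != 403:                 # not throttled: return immediately
            return status, body
        sleep(delay)                      # throttled: wait, then retry
        delay *= 2                        # 1s, 2s, 4s, ...
    raise RuntimeError(f"Rate limited after {max_retries} retries")

def fetch_in_batches(items, fetch_one, batch_size=50, sleep=time.sleep):
    results = []
    for i in range(0, len(items), batch_size):
        for item in items[i:i + batch_size]:
            results.append(fetch_with_backoff(fetch_one, item, sleep=sleep))
    return results
```

In the real system a response cache sits in front of `fetch_one`, so repeated wizard navigation does not burn quota on identical calls.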
Submission formats. Canvas supports multiple submission types: file uploads, online text, URLs, and media recordings. AEMS focuses on file uploads (specifically PDFs), which is the standard format for scanned exam submissions in STEM courses. However, students occasionally submit images, Word documents, or other formats. The system converts supported formats to PDF before processing and flags unsupported formats for manual handling.
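The triage logic amounts to a three-way classification. A minimal sketch, in which the convertible-extension set and function name are assumptions rather than the actual AEMS code:

```python
# Classify a submission file: pass PDFs through, convert what we can,
# flag the rest for manual handling by the examiner.
CONVERTIBLE = {".docx", ".png", ".jpg", ".jpeg"}   # assumed convertible set

def triage_submission(filename):
    """Return 'ready', 'convert', or 'manual' for a submission filename."""
    name = filename.lower()
    if name.endswith(".pdf"):
        return "ready"                  # already in the target format
    if any(name.endswith(ext) for ext in CONVERTIBLE):
        return "convert"                # run through the PDF converter
    return "manual"                     # flag for the examiner
```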
Grade posting. Publishing grades to Canvas requires writing to the submission’s grade field via the API. This is a destructive operation: it overwrites whatever grade was previously recorded. AEMS includes a “training mode” that simulates the publish step without actually writing to Canvas. This allows examiners to verify the entire workflow before any student-visible changes are made.
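The publish-with-guard pattern can be sketched as follows. The endpoint and the `submission[posted_grade]` field are the real Canvas Submissions API; the surrounding function and injected `http_put` callable are illustrative. In training mode the write is skipped and the would-be payload is returned for preview.

```python
# Publish a grade to Canvas, or simulate the write when training_mode is on.
def publish_grade(http_put, base_url, course_id, assignment_id, user_id,
                  grade, training_mode=False):
    url = (f"{base_url}/api/v1/courses/{course_id}/assignments/"
           f"{assignment_id}/submissions/{user_id}")
    payload = {"submission[posted_grade]": grade}
    if training_mode:
        # Destructive write bypassed: return what WOULD have been sent.
        return {"dry_run": True, "url": url, "payload": payload}
    status, body = http_put(url, data=payload)
    if status != 200:
        raise RuntimeError(f"Grade post failed with HTTP {status}")
    return body
```

Because the guard sits at the single point where the destructive write happens, every other part of the workflow runs identically in both modes.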
SpeedGrader compatibility. Many examiners use Canvas SpeedGrader for manual review. AEMS-generated annotations must be compatible with SpeedGrader’s rendering, which imposes constraints on how PDF annotations are structured and attached to submissions.
The state management problem
The most underestimated engineering challenge was state management across the five steps. Each step depends on the results of the previous steps, and examiners do not always proceed linearly. They go back to change the rubric after previewing submissions. They restart the grading after adjusting a single check. They close the browser mid-workflow and return the next day.
The workflow state (selected course, assignment, rubric configuration, grading progress, individual submission results) must persist across sessions, survive interruptions, and remain consistent when the examiner navigates between steps non-linearly.
The initial implementation stored this state in the Flask session. This worked for simple cases but failed when the state grew large (rubric configurations with dozens of checks, grading results for hundreds of submissions) or when the examiner switched devices. The current implementation stores workflow state in the database, keyed by assignment ID, with version tracking to detect and resolve conflicts.
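The version-tracked store amounts to optimistic concurrency control: a save succeeds only if the caller read the latest version. A minimal sketch, with an in-memory dict standing in for the real database table:

```python
# Versioned workflow-state store keyed by assignment ID. A stale write
# (e.g. the same examiner on a second device) raises instead of silently
# overwriting newer state.
class WorkflowStore:
    def __init__(self):
        self._rows = {}          # assignment_id -> (version, state)

    def load(self, assignment_id):
        """Return (version, state); version 0 means no state saved yet."""
        return self._rows.get(assignment_id, (0, {}))

    def save(self, assignment_id, state, expected_version):
        current, _ = self._rows.get(assignment_id, (0, {}))
        if current != expected_version:
            raise ValueError(
                f"Conflict: have v{current}, caller expected v{expected_version}")
        self._rows[assignment_id] = (current + 1, state)
        return current + 1
```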
What examiners actually do
The design of the wizard was informed by observing how examiners actually use Canvas for grading, rather than how the documentation suggests they should use it.
Examiners grade in batches, not all at once. A typical pattern is to grade 30 to 50 submissions in a session, then stop and continue later. The workflow must support partial completion and resumption without data loss.
Examiners compare submissions. When uncertain about a mark, examiners frequently compare the current submission against others they have already graded. The review step in AEMS supports this by providing access to previously graded submissions alongside the current one.
Examiners change their minds. After grading 100 submissions, an examiner might realise that a rubric check is too strict or too lenient. The ability to adjust a check and re-run only the affected grading (without repeating vision extraction) is not a convenience feature. It is a requirement for practical use.
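The re-run-without-re-extraction requirement boils down to caching the expensive vision step per submission. A sketch under assumed names (`extract`, `apply_rubric`, and the cache structure are all illustrative):

```python
# Re-grade a batch: vision extraction runs at most once per submission,
# so changing a rubric check re-runs only the cheap rubric application.
def regrade(submissions, extract, apply_rubric, cache):
    results = {}
    for sub_id, pdf in submissions.items():
        if sub_id not in cache:           # expensive vision step, once
            cache[sub_id] = extract(pdf)
        results[sub_id] = apply_rubric(cache[sub_id])  # cheap, re-runnable
    return results
```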
Examiners work late. A non-trivial proportion of grading happens in the evening, often under time pressure before a reporting deadline. Error messages must be clear, recovery from failures must be automatic, and the system must never lose work. This last point drove the decision to save grading state after every individual submission, not at the end of a batch.
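The per-submission save discipline can be sketched as a checkpointing loop: a crash mid-batch loses at most the submission in flight, and a later session resumes from the saved progress. `grade_one` and `save_state` are injected stand-ins for the real grading and persistence calls.

```python
# Grade a batch with a checkpoint after EVERY submission, resuming from
# any progress recorded in a previous session.
def grade_batch(submission_ids, grade_one, save_state, done=None):
    done = dict(done or {})               # resume from prior progress
    for sub_id in submission_ids:
        if sub_id in done:
            continue                      # graded in an earlier session
        done[sub_id] = grade_one(sub_id)
        save_state(dict(done))            # persist before moving on
    return done
```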
The 198KB file
A confession about the engineering. The Canvas workflow module grew organically over several months of iteration. By the time it was feature-complete, the main workflow file was 198KB and contained over 5,000 lines of Python. It handled routing, form processing, API calls, state management, error handling, and progress tracking in a single file.
This was technical debt that I am still paying down. The current refactoring has extracted the workflow into 34 modular files organised by concern (grading, persistence, progress tracking, OCR, internationalisation). The legacy file still exists alongside the new structure during the migration. It serves as a reminder that “ship it and refactor later” is a valid strategy, but “later” always arrives.
Training mode
One feature that proved unexpectedly valuable is training mode. When enabled, the entire grading workflow runs exactly as it would in production, but the final step (publishing grades to Canvas) is simulated rather than executed. The examiner sees a complete preview of what would be posted, including annotated PDFs, proposed marks, and feedback text.
Training mode serves two purposes. First, it allows new users to learn the system without risk. Second, it allows experienced users to validate a rubric against a full batch of submissions before committing to the results. Several examiners have reported that their first training mode run revealed rubric problems they had not anticipated, which they corrected before the real grading pass.
The implementation cost was minimal (a single flag that bypasses the Canvas API write call), but the value for user confidence was substantial.