
Local-first architecture for student data

Artem Kulachenko · 7 min read
privacy · architecture · gdpr · technical · security

When I first showed the AEMS prototype to colleagues, the initial reaction was consistent. The tool looked useful. The accuracy was promising. And then they asked: “Where does the student data go?”

This question, repeated across every conversation with examiners, department heads, and IT security officers, shaped the fundamental architecture of the system. AEMS was not designed as a cloud service that later added a local option. It was designed as a local-first tool that can optionally connect to hosted infrastructure when the institution requires it.

The distinction matters more than it might appear.

What local-first means in practice

In a local-first architecture, the primary copy of the data lives on the user’s machine. That does not automatically mean every workflow is fully offline or that every byte never leaves the device. In the current AEMS Personal plan, local-first primarily means local source-file control rather than a zero-network workflow.

For AEMS Personal today, this means:

  • Source exam PDFs can be stored on the examiner’s computer via the paired AEMS Agent.
  • Vision extraction in Personal currently uses Ollama, either locally or via Ollama Cloud.
  • The hosted AEMS app handles sign-in, billing, subscription enforcement, and grading orchestration.
  • Grading jobs can fetch the required PDF from the paired local agent when the examiner starts grading in the hosted app.
  • Local control still matters because the primary source files can remain on the examiner’s machine instead of living permanently in shared hosted storage.

The Personal tier is therefore no longer a zero-account desktop-only mode. It is a hosted workflow with a local file bridge.
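As a rough illustration of that file bridge, the sketch below simulates the idea in a single process: a grading job asks the agent for one PDF on demand, works on the bytes in memory, and writes nothing to hosted storage. The names LocalAgent, run_grading_job, and demo are hypothetical and are not taken from the AEMS codebase. The real agent talks to the hosted app over the network, which this sketch leaves out.

```python
import hashlib
import tempfile
from pathlib import Path


class LocalAgent:
    """Hypothetical stand-in for the paired agent: serves PDFs from one local folder."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()

    def fetch(self, filename: str) -> bytes:
        # Resolve the requested path and refuse anything outside the agent folder.
        path = (self.root / filename).resolve()
        if not path.is_relative_to(self.root):  # requires Python 3.9+
            raise PermissionError(f"{filename!r} is outside the agent folder")
        return path.read_bytes()


def run_grading_job(agent: LocalAgent, filename: str) -> dict:
    """Hypothetical hosted-side job: pull the PDF on demand, keep it in memory only."""
    pdf_bytes = agent.fetch(filename)
    # A real job would hand pdf_bytes to extraction and grading at this point.
    return {
        "filename": filename,
        "size_bytes": len(pdf_bytes),
        "sha256": hashlib.sha256(pdf_bytes).hexdigest(),
    }


def demo() -> dict:
    # The primary copy lives in the agent folder; the job only sees transient bytes.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "exam-001.pdf").write_bytes(b"%PDF-1.7 placeholder")
        return run_grading_job(LocalAgent(root), "exam-001.pdf")
```

The path check is the part worth noticing: an agent that exposes local files to a hosted service should only ever serve files from the folder the examiner has explicitly paired.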

Why this matters for universities

European universities operate under GDPR, which imposes specific obligations on the processing of personal data. Student exam submissions are personal data. The exam content, the student’s identity, and the assigned grade are all subject to data protection requirements.

When a university adopts a cloud-based grading tool, it acts as a data controller that has engaged a data processor. This triggers a series of legal requirements: a signed Data Processing Agreement (Article 28), a record of processing activities (Article 30), a lawful basis for the processing (Article 6), and, if the processing occurs outside the EU/EEA, additional safeguards under Chapter V of the GDPR.

When an examiner uses a fully local tool with their own API key or local model, the data processing relationship can be primarily between the examiner and the model provider. AEMS Personal today is narrower than that. The paired agent keeps the primary PDF copy on the examiner’s machine, but the hosted AEMS app still participates in account workflow and hosted grading can transiently process PDFs. The privacy gain is reduced central storage and tighter source-file control, not total vendor invisibility.

This is not a loophole. It is a design principle. The simplest way to reduce exposure is to avoid persistent central storage when it is not required.

The three deployment tiers

Not every use case is satisfied by local-only deployment. Departments that want centralised management, shared rubrics, and collaborative grading need a hosted component. Institutions that want full administrative control need an on-premises installation.

AEMS supports three deployment models, each with a different privacy profile:

Personal (local-first storage). The examiner uses a hosted AEMS account together with the open-source local agent. Source PDFs can remain on the examiner’s machine, while billing, authentication, and grading orchestration run through AEMS services. This tier is suitable for individual examiners who want local source-file control without running the full platform themselves.

Department (EU-hosted). AEMS provides a hosted service in EU data centres. Exam submissions are uploaded, processed, and stored under a defined retention schedule. A Data Processing Agreement governs the relationship. The university is the data controller; AEMS is the data processor. This tier is suitable for departments that want shared workflows, centralised rubric management, and administrative oversight.

Institutional (on-premises). AEMS is deployed inside the university’s own infrastructure. The university controls all data residency, network access, and AI provider configuration. This tier is suitable for institutions with strict data governance requirements or existing on-premises AI infrastructure.

The three tiers are not separate products. They are deployment configurations of the same codebase. An examiner who starts with the Personal tier and later moves to a Department deployment uses the same interface, the same rubric format, and the same grading pipeline. The difference is where the data lives and who is responsible for it.
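One way to picture "same codebase, different configuration" is a small lookup from tier to deployment profile. The sketch below is an assumption about how such a table could look, not the actual AEMS configuration format. Tier, DeploymentProfile, PROFILES, and profile_for are hypothetical names, and the field values only restate the tier descriptions above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PERSONAL = "personal"
    DEPARTMENT = "department"
    INSTITUTIONAL = "institutional"


@dataclass(frozen=True)
class DeploymentProfile:
    source_pdf_location: str  # where the primary copy of an exam PDF lives
    operated_by: str          # who runs the services that touch the data
    shared_rubrics: bool      # whether rubrics are managed centrally


# Hypothetical table; values paraphrase the tier descriptions in this post.
PROFILES = {
    Tier.PERSONAL: DeploymentProfile(
        "examiner's machine, via the paired agent",
        "AEMS hosted services plus the local agent",
        False,
    ),
    Tier.DEPARTMENT: DeploymentProfile(
        "EU data centres, under a retention schedule",
        "AEMS as data processor",
        True,
    ),
    Tier.INSTITUTIONAL: DeploymentProfile(
        "university infrastructure",
        "the university",
        True,
    ),
}


def profile_for(tier_name: str) -> DeploymentProfile:
    # Tier(...) raises ValueError for an unknown tier name.
    return PROFILES[Tier(tier_name)]
```

Under this assumption the grading pipeline never branches on the tier itself. It only reads the profile, which is why moving between tiers would not change the interface or the rubric format.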

Architectural consequences

Choosing local-first as the default had consequences that propagated through the entire system design.

Accounts still exist in the base tier. The current Personal deployment uses hosted account onboarding, password management, and subscription checks. The local-first property now applies to source-file storage, not to the entire account model.

Configuration as files, not database records. Rubrics, workflow state, and memory are stored as YAML and JSON files in a local configuration directory. This makes the system portable (the examiner can copy the configuration directory to another machine), inspectable (all state is human-readable), and version-controllable (the configuration directory can be a Git repository).
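To make the file-based approach concrete, here is a minimal sketch of saving and loading a rubric as JSON in a configuration directory. The field names (id, criteria, name, max_points), the Criterion class, and the file layout are assumptions for illustration, not the actual AEMS rubric schema. JSON is used because it is in the Python standard library; a YAML variant would need a third-party parser.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Criterion:
    name: str
    max_points: int


def save_rubric(config_dir: Path, rubric_id: str, criteria: list[Criterion]) -> Path:
    """Write one rubric to <config_dir>/<rubric_id>.json and return the path."""
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / f"{rubric_id}.json"
    payload = {"id": rubric_id, "criteria": [asdict(c) for c in criteria]}
    # indent and sort_keys keep the file human-readable and its Git diffs stable.
    path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return path


def load_rubric(path: Path) -> list[Criterion]:
    """Read a rubric file back into Criterion objects."""
    payload = json.loads(path.read_text(encoding="utf-8"))
    return [Criterion(**item) for item in payload["criteria"]]
```

Because the state is plain text, copying the directory to another machine or committing it to a Git repository needs no export step.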

Provider-agnostic AI integration. Because the local deployment connects directly to the examiner’s chosen AI provider, AEMS must support multiple providers with a uniform interface. The provider abstraction layer supports Anthropic, OpenAI, Google, and Ollama, with a factory pattern that allows adding new providers without modifying the core grading logic.
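The factory idea can be sketched in a few lines. This is an illustration of the general pattern, not the AEMS implementation: GradingProvider, register, create_provider, and grade are hypothetical names, and EchoProvider is an offline stand-in so the example runs without API keys or network access.

```python
from abc import ABC, abstractmethod
from typing import Callable


class GradingProvider(ABC):
    """Uniform interface that the core grading logic depends on."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...


# Maps a provider name to a zero-argument constructor.
_REGISTRY: dict[str, Callable[[], GradingProvider]] = {}


def register(name: str):
    """Class decorator that adds a provider to the registry under the given name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("echo")
class EchoProvider(GradingProvider):
    """Offline stand-in; a real provider would call a model API here."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def create_provider(name: str) -> GradingProvider:
    factory = _REGISTRY.get(name)
    if factory is None:
        raise ValueError(f"unknown provider {name!r}; known: {sorted(_REGISTRY)}")
    return factory()


def grade(provider: GradingProvider, answer: str) -> str:
    # Core logic sees only the interface, never a concrete provider class.
    return provider.complete(f"Grade this answer: {answer}")
```

Adding a provider in this sketch means writing one new class with a register decorator; grade and create_provider stay unchanged.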

Offline capability is narrower than before. Because the account workflow is hosted, sign-in, subscription enforcement, and grading orchestration all require connectivity to AEMS services. Local review and local-model inference remain valuable, but the old “offline after activation” model no longer describes the current Personal architecture.

The trust model

The local-first architecture embodies a specific trust model: the examiner is trusted with their own students’ data. This may seem obvious, but it is worth stating explicitly, because many EdTech tools operate on a different trust model in which the vendor mediates access to student data and the examiner accesses it through the vendor’s platform.

In the AEMS model, the examiner already has the exam PDFs. They were already going to mark them manually. AEMS does not introduce new data access. It provides a tool that operates on data the examiner already possesses, using AI services the examiner already has access to.

This trust model aligns with how universities actually work. Examiners are responsible for their students’ grades. They have legitimate access to exam submissions. A tool that helps them process those submissions more efficiently does not need to centralise the data to provide value.

What we gave up

Local-first is not free. The architecture introduces limitations that a centralised system would not have.

No cross-examiner analytics. In a local deployment, each examiner’s grading data is isolated. There is no aggregate view of grading patterns across a department, no automated detection of calibration drift between examiners, and no centralised reporting. These features exist in the Department and Institutional tiers but are architecturally impossible in the Personal tier.

No automatic updates to shared rubrics. When one examiner improves a rubric, the improvement stays on their machine. Sharing rubrics between examiners requires manual export and import. Again, the hosted tiers address this, but the local tier does not.

Limited support infrastructure. When something goes wrong in a local deployment, AEMS support cannot access the examiner’s system to diagnose the problem. Diagnostic information must be collected and shared manually. This is a conscious trade-off: the same isolation that protects student data also limits the vendor’s ability to provide support.

These are real limitations. They are also the limitations that many university data protection officers prefer, because in the Institutional on-premises tier the vendor genuinely cannot access student data, even in a support scenario.