Validity through Accessibility
Kelly Webb-Davies · Lead Education AI Consultant · University of Oxford AI Competency Centre
Written assessments have long used academic language as a proxy for thinking. But what happens when that proxy breaks?
We want to assess thinking — the ability to understand, reason, argue, evaluate, and communicate. But we can't view thinking directly. We must infer it from language, and we've historically required that language to be "prestige academic English."
Generative AI has broken the link between thinking and writing. When a machine can produce polished text, a submitted essay no longer provides dependable evidence that the student did the thinking behind it. This is fundamentally a language problem, not a cheating problem.
AI detection tools are unreliable and discriminatory — they disproportionately flag non-native speakers. Honour declarations can't verify actual authorship. The "messy middle" of permitted-vs-unpermitted AI use is impossible to police. We need a structural solution, not a surveillance one.
VFWA separates ideation (thinking) from expression (writing) by splitting every assessment into two connected stages. Together, they create a closed loop of verifiable evidence.
Stage 1: Students demonstrate their understanding in a secure, observed environment using an unseen prompt — in whatever language or modality works best for them.
Stage 2: Students refine their Stage 1 ideas into a formal written product at home, with unrestricted access to AI tools. The focus shifts to how they transform their ideas and justify their choices.
Stage 1 happens in a supervised, secure environment — like an exam hall or observed classroom session. Students respond to a previously unseen prompt that ensures they are thinking in real time, not reproducing pre-prepared AI output.
Crucially, this stage is ungraded. By removing marks from the initial expression, students feel safe to be creative, imperfect, and genuine — creating what researchers call epistemic safety.
- **Observed setting:** Students complete Stage 1 under supervision — in a classroom, exam hall, or proctored online setting — ensuring the work is authentically theirs.
- **Unseen anchor:** The stimulus is revealed only at the start of the session. This prevents pre-prepared generative AI output and ensures real-time reasoning.
- **Choice of modality:** Students choose how to express their ideas — handwritten notes, argument maps, voice recordings, or any combination — using whichever language or modality best captures their thinking.
- **Ungraded:** Stage 1 is not marked. This removes the pressure to perform in "perfect" academic English and encourages authentic, unfiltered thinking.
We don't assess the language — we assess the thinking. Stage 1 gives us a window into what the student truly knows, before any tool has the chance to polish or replace it.
In Stage 2, students take their Stage 1 "anchor" home and develop it into a formal written product — an essay, report, memo, or any discipline-appropriate artefact.
AI tools are not just permitted — they are encouraged. Students can use generative AI for language refinement, structural editing, accessibility support, and cognitive scaffolding. What matters is their evaluative judgement: how they select, adapt, and justify the transformations they make.
- **Unsupervised setting:** Students complete this stage in their own time and space, with access to all their usual tools, resources, and support networks.
- **AI unrestricted:** Rather than banning AI tools, VFWA legitimises their use for refinement — just as we legitimise spellcheckers, grammar tools, and translation software.
- **Evaluative judgement:** The grade focuses on how the student transforms their raw ideas into a polished product — the decisions they make, the sources they integrate, and the justifications they provide.
- **Evidentiary link:** The final submission must be traceable back to Stage 1. This traceability is what makes the whole model verifiable and robust.
The question is no longer "Did the student write this?" — it becomes "Does the student own the thinking behind this, and can they justify the choices they made?"
A conditional viva is not punitive. It is activated only when the evidentiary link between the Stage 1 anchor and the final Stage 2 submission is weak or unclear.
The viva gives the student a chance to verbally demonstrate their understanding and explain the transformations they made — it is an opportunity, not an accusation.
💡 The viva replaces unreliable AI-detection tools with a human, dialogic process. It centres understanding rather than surveillance.
Every feature of VFWA exists for a reason. This table maps each design choice to its underlying rationale — so you can explain the why to colleagues and students.
| Feature | Stage | What It Does | Why It Matters |
|---|---|---|---|
| Unseen Anchor | 1 | Stimulus is revealed only at the start of the session | Prevents pre-prepared AI output; ensures real-time reasoning is captured |
| Observed Setting | 1 | Supervised environment (classroom, exam hall, or proctored online) | Establishes authorship and authenticity without relying on AI detection |
| Ungraded | 1 | Stage 1 carries no marks | Creates epistemic safety — students focus on thinking without fear of linguistic penalty |
| Choice of Modality | 1 | Students may speak, write, draw, or mix modalities | Removes language as a barrier; embraces neurodivergent and multilingual expression |
| AI Unrestricted | 2 | All digital tools, including generative AI, are permitted | Eliminates policing; reflects real-world professional practices; supports accessibility |
| Evaluative Judgement | 2 | Assessment focuses on transformation decisions and justifications | Shifts assessment from product reproduction to critical thinking skills |
| Evidentiary Link | 1 → 2 | Stage 2 must be traceable to Stage 1 ideas | Creates a "closed loop" of verifiable evidence; replaces detection with design |
| Conditional Viva | Safeguard | Oral dialogue triggered by weak link between stages | Provides a human, non-punitive verification process; replaces AI detection |
The model is adaptable to any discipline. Here are three operational examples showing how Stage 1 and Stage 2 connect in practice.
Law
Stage 1: Students receive an unseen fact scenario and write a handwritten analysis identifying the key legal issues, relevant rights, and initial arguments. They may use bullet points, diagrams, or structured outlines.
Stage 2: Students expand their analysis into a 1,200-word legal memorandum. They use AI to refine legal language, check citation formatting, and improve structural coherence — while maintaining the arguments anchored in Stage 1.
Sciences
Stage 1: Students receive an unseen dataset and record their initial interpretation: what the data shows, potential explanations, and any anomalies. They may annotate graphs or record a voice explanation.
Stage 2: Students develop a formal results-and-discussion section, integrating literature and refining scientific English. AI tools help with language accuracy and referencing — but the interpretation must stem from Stage 1.
Literature
Stage 1: Students read an unseen passage and produce an initial close reading — annotating literary techniques, themes, and their personal response. They may speak their analysis aloud or handwrite notes.
Stage 2: Students craft a formal critical essay building on their Stage 1 observations. They use AI for language polishing and secondary-source integration, while their original analytical angles anchor the final argument.
VFWA reframes accessibility as a condition of validity. When barriers prevent students from demonstrating what they truly know, the assessment itself becomes invalid — not the student.
For many neurodivergent students — including those with ADHD and dyslexia — speaking keeps up with the speed of thought, while writing slows it down and blocks ideation. VFWA lets students choose the modality that captures their thinking most faithfully.
Students who think in multiple languages shouldn't be penalised for "wrong" English. Stage 1 invites translanguaging — using all linguistic resources — so that great ideas aren't lost behind a language barrier.
By legitimising AI tools in Stage 2, VFWA provides built-in cognitive support — spell-checking, grammar assistance, structural scaffolding — that traditionally required expensive accommodations or formal diagnoses.