AI explainer

Why AI tutors hallucinate (and how RAG fixes it)

· 8 min read · By Riley Edds

You ask ChatGPT a question about your econ class. It gives you a confident, well-written answer — that contradicts the slide deck your professor showed yesterday.

This is hallucination. Most students experience it at least once, get burned, and either stop trusting AI or learn to verify everything. The verification habit is good. But it's worth understanding why AI does this, because the answer changes which AI tools you should trust for school.

It's not lying. It's predicting.

Here's the thing nobody quite explains in normal language: a large language model (LLM) doesn't actually have a concept of "true" or "false." It has a concept of likely.

When you ask ChatGPT "what was the holding in International Shoe Co. v. Washington?", it doesn't look up the case. It predicts what tokens (roughly, words or word fragments) most likely come next given everything in its training data and the structure of your question. The training data includes thousands of legal opinions, casebooks, and law student summaries — so for famous cases, the prediction is usually right. For obscure cases, or for cases where competing summaries disagree, the prediction is sometimes confident-sounding nonsense.

The reason it sounds confident is that the model was trained on text written by humans, and humans tend to write confidently. "The holding in International Shoe was..." is a much more probable opening than "I'm not sure, but I think the holding might be..." So even when the model is uncertain, it talks like it isn't.

The short version: LLMs are trained to complete text plausibly, not to retrieve facts accurately. Hallucination isn't a bug — it's the same mechanism that makes the model good at writing essays, just applied to a question where the right answer requires a real-world lookup.
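To make the prediction mechanic concrete, here's a toy bigram model. This is a sketch, not how real LLMs work (they use neural networks over subword tokens), but the objective is the same: emit the statistically likely continuation, with no notion of true or false anywhere in the machinery.

```python
from collections import Counter, defaultdict

# Toy "language model": count which word follows which in a tiny corpus,
# then always emit the most frequent continuation. The corpus is made up
# for illustration.
corpus = (
    "the holding in international shoe was that minimum contacts suffice "
    "the holding in the case was clear "
    "the holding in international shoe was widely cited"
).split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word: str) -> str:
    # Pick the single most likely next word — "likely", not "correct".
    return follows[word].most_common(1)[0][0]
```

Ask this model what follows "holding" and it confidently answers "in", because that's what the corpus makes probable. Scale the same idea up by billions of parameters and you get fluent essays, and the occasional fluent fabrication.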

What kinds of hallucinations matter

Not every hallucination is equally bad. The dangerous ones for students: fabricated citations and case names, invented numbers and dates, and answers that track the general internet consensus rather than your course's framing.

The last one is the trap that hits most college students. The general internet consensus and your professor's framing aren't always the same. ChatGPT optimizes for the first; your exam grades you on the second.

RAG: making the AI look stuff up first

So how do you fix it? You don't make the model "smarter" — you change what it's allowed to answer from.

The technique is called retrieval-augmented generation, or RAG. It's a long name for a pretty simple idea:

  1. Before answering your question, the system retrieves relevant passages from a known source — your textbook, your lecture notes, a verified database, whatever.
  2. It feeds those passages to the LLM along with your question.
  3. The LLM is instructed to answer using only the retrieved passages, not its general training.

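The three steps can be sketched in a few lines of Python. This is a toy under loud assumptions: the retriever is a keyword-overlap scorer standing in for real vector search, and `llm` is whatever model call you plug in — none of this is any particular product's actual implementation.

```python
from typing import Callable

def retrieve(question: str, passages: list[str], min_overlap: int = 2) -> list[str]:
    """Step 1: find passages that share enough words with the question.
    Real systems use embedding similarity; word overlap is a stand-in."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(p.lower().split())), p) for p in passages]
    return [p for score, p in sorted(scored, reverse=True) if score >= min_overlap]

def answer(question: str, passages: list[str], llm: Callable[[str], str]) -> str:
    context = retrieve(question, passages)
    if not context:  # guardrail: nothing relevant was retrieved, so refuse
        return "I don't know - that isn't covered in the provided notes."
    # Steps 2-3: hand the retrieved passages plus the question to the model,
    # with an instruction to answer from the context only.
    prompt = (
        "Answer using ONLY the context below. If it doesn't contain the "
        "answer, say you don't know.\n\n"
        "Context:\n" + "\n".join(context) +
        "\n\nQuestion: " + question
    )
    return llm(prompt)
```

The refusal branch is the important design choice: when retrieval comes back empty, the model never gets a chance to fill in from memory.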
The result: the model can still write a coherent answer (that's what LLMs are good at), but the facts come from a source you control. If the source is good, the answer is good. If the source is missing the relevant info, a well-built RAG system will say "I don't know" instead of making something up.

This is why "ChatGPT but with my notes" tools are a different category from raw ChatGPT. The model's general training is doing the linguistic work; the retrieval step is doing the factual work.

What RAG doesn't fix

Three important caveats:

RAG can still hallucinate if the retrieval misses the right passage and the model fills in from memory. Good RAG systems have guardrails for this — they check if the retrieved context actually addresses the question, and refuse to answer if not — but the failure mode exists.

RAG can amplify bad sources. If your professor's slide is wrong about something, RAG will faithfully repeat that wrong thing. This is mostly a feature for college (you want answers that match what's on the exam, even if the exam is wrong), but worth knowing.

RAG doesn't make the LLM understand. The retrieved passage and the answer can superficially match without the model actually grasping the underlying concept. You'll see this when you ask a follow-up question that requires real reasoning — sometimes the answer falls apart.

How to spot a hallucination

Practical guide for any AI tool:

  1. Check specifics — names, dates, holdings, statistics — against your actual notes.
  2. Be extra skeptical when the topic is obscure. Confident fluency is not evidence.
  3. Ask where the claim comes from, and confirm the source actually exists.
  4. Ask a follow-up question that requires real reasoning. Hallucinated answers tend to fall apart under pressure.

Why grounded tutors are better for college specifically

For school, you want the AI's facts to come from your specific course material, not the general internet. Three reasons:

  1. Your exam tests your professor's framing, not the textbook average.
  2. Course-specific context matters. "The relevant precedent" depends on which cases you've actually read this semester.
  3. Verification is faster. An answer linked to your slide deck takes one tap to check; an answer ChatGPT generated requires you to remember whether it matches what's in your notes.

This is the principle behind ClassMinds — the AI tutor is grounded in your uploaded class material, not a generic textbook. When you ask "explain this concept," the answer pulls from the lectures and slides you actually have. It's not a magical fix — RAG can still mess up — but the failure modes are much narrower than raw ChatGPT.

The takeaway

Hallucination isn't going away. It's structural, not a bug to be patched. Future LLMs will hallucinate less (newer models already do less of it than GPT-3.5 did), but the failure mode will keep existing. The right adaptation isn't avoiding AI; it's choosing tools that ground their answers in sources you trust, and verifying what they tell you.

The good news: the verification habit you build using AI in college transfers to the workforce. Every job that uses LLMs (most of them, soon) needs the same skill: trust but verify.


An AI tutor that's grounded in your real notes

ClassMinds uses RAG to keep the AI inside your actual class material. Free during beta.

Try the iOS beta →