AI explainer

Why AI tutors hallucinate (and how RAG fixes it)

· 8 min read · By Riley Edds

You ask ChatGPT a question about your econ class. It gives you a confident, well-written answer — that contradicts the slide deck your professor showed yesterday.

This is hallucination. Most students experience it at least once, get burned, and either stop trusting AI or learn to verify everything. The verification habit is good. But it's worth understanding why AI does this, because the answer changes which AI tools you should trust for school.

It's not lying. It's predicting.

Here's the thing nobody quite explains in normal language: a large language model (LLM) doesn't actually have a concept of "true" or "false." It has a concept of likely.

When you ask ChatGPT "what was the holding in International Shoe Co. v. Washington?", it doesn't look up the case. It predicts what tokens (roughly, words or word fragments) most likely come next given everything in its training data and the structure of your question. The training data includes thousands of legal opinions, casebooks, and law student summaries — so for famous cases, the prediction is usually right. For obscure cases, or for cases where competing summaries disagree, the prediction is sometimes confident-sounding nonsense.

The reason it sounds confident is that the model was trained on text written by humans, and humans tend to write confidently. "The holding in International Shoe was..." is a much more probable opening than "I'm not sure, but I think the holding might be..." So even when the model is uncertain, it talks like it isn't.

The short version: LLMs are trained to complete text plausibly, not to retrieve facts accurately. Hallucination isn't a bug — it's the same mechanism that makes the model good at writing essays, just applied to a question where the right answer requires a real-world lookup.
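To make the prediction mechanic concrete, here's a toy bigram model. This is a sketch, not how real LLMs work (they use neural networks over subword tokens), but the objective is the same: emit the statistically likely continuation, with no notion of true or false anywhere in the machinery.

```python
from collections import Counter, defaultdict

# Toy "language model": count which word follows which in a tiny corpus,
# then always emit the most frequent continuation. The corpus is made up
# for illustration.
corpus = (
    "the holding in international shoe was that minimum contacts suffice "
    "the holding in the case was clear "
    "the holding in international shoe was widely cited"
).split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word: str) -> str:
    # Pick the single most likely next word — "likely", not "correct".
    return follows[word].most_common(1)[0][0]
```

Ask this model what follows "holding" and it confidently answers "in", because that's what the corpus makes probable. Scale the same idea up by billions of parameters and you get fluent essays, and the occasional fluent fabrication.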

What kinds of hallucinations matter

Not every hallucination is equally bad. The dangerous ones for students: fabricated citations and case names, invented numbers and dates, and answers that track the general internet consensus rather than your course's framing.

The last one is the trap that hits most college students. The general internet consensus and your professor's framing aren't always the same. ChatGPT optimizes for the first; your exam grades you on the second.

RAG: making the AI look stuff up first

So how do you fix it? You don't make the model "smarter" — you change what it's allowed to answer from.

The technique is called retrieval-augmented generation, or RAG. It's a long name for a pretty simple idea:

  1. Before answering your question, the system retrieves relevant passages from a known source — your textbook, your lecture notes, a verified database, whatever.
  2. It feeds those passages to the LLM along with your question.
  3. The LLM is instructed to answer using only the retrieved passages, not its general training.

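The three steps can be sketched in a few lines of Python. This is a toy under loud assumptions: the retriever is a keyword-overlap scorer standing in for real vector search, and `llm` is whatever model call you plug in — none of this is any particular product's actual implementation.

```python
from typing import Callable

def retrieve(question: str, passages: list[str], min_overlap: int = 2) -> list[str]:
    """Step 1: find passages that share enough words with the question.
    Real systems use embedding similarity; word overlap is a stand-in."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(p.lower().split())), p) for p in passages]
    return [p for score, p in sorted(scored, reverse=True) if score >= min_overlap]

def answer(question: str, passages: list[str], llm: Callable[[str], str]) -> str:
    context = retrieve(question, passages)
    if not context:  # guardrail: nothing relevant was retrieved, so refuse
        return "I don't know - that isn't covered in the provided notes."
    # Steps 2-3: hand the retrieved passages plus the question to the model,
    # with an instruction to answer from the context only.
    prompt = (
        "Answer using ONLY the context below. If it doesn't contain the "
        "answer, say you don't know.\n\n"
        "Context:\n" + "\n".join(context) +
        "\n\nQuestion: " + question
    )
    return llm(prompt)
```

The refusal branch is the important design choice: when retrieval comes back empty, the model never gets a chance to fill in from memory.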
The result: the model can still write a coherent answer (that's what LLMs are good at), but the facts come from a source you control. If the source is good, the answer is good. If the source is missing the relevant info, a well-built RAG system will say "I don't know" instead of making something up.

This is why "ChatGPT but with my notes" tools are a different category from raw ChatGPT. The model's general training is doing the linguistic work; the retrieval step is doing the factual work.

What RAG doesn't fix

Three important caveats:

RAG can still hallucinate if the retrieval misses the right passage and the model fills in from memory. Good RAG systems have guardrails for this — they check if the retrieved context actually addresses the question, and refuse to answer if not — but the failure mode exists.

RAG can amplify bad sources. If your professor's slide is wrong about something, RAG will faithfully repeat that wrong thing. This is mostly a feature for college (you want answers that match what's on the exam, even if the exam is wrong), but worth knowing.

RAG doesn't make the LLM understand. The retrieved passage and the answer can superficially match without the model actually grasping the underlying concept. You'll see this when you ask a follow-up question that requires real reasoning — sometimes the answer falls apart.

How to spot a hallucination

Practical guide for any AI tool:

  1. Check specifics — names, dates, holdings, statistics — against your actual notes.
  2. Be extra skeptical when the topic is obscure. Confident fluency is not evidence.
  3. Ask where the claim comes from, and confirm the source actually exists.
  4. Ask a follow-up question that requires real reasoning. Hallucinated answers tend to fall apart under pressure.

Why grounded tutors are better for college specifically

For school, you want the AI's facts to come from your specific course material, not the general internet. Three reasons:

  1. Your exam tests your professor's framing, not the textbook average.
  2. Course-specific context matters. "The relevant precedent" depends on which cases you've actually read this semester.
  3. Verification is faster. An answer linked to your slide deck takes one tap to check; an answer ChatGPT generated requires you to remember whether it matches what's in your notes.

This is the principle behind ClassMinds — the AI tutor is grounded in your uploaded class material, not a generic textbook. When you ask "explain this concept," the answer pulls from the lectures and slides you actually have. It's not a magical fix — RAG can still mess up — but the failure modes are much narrower than raw ChatGPT.

The takeaway

Hallucination isn't going away. It's structural, not a bug to be patched. Future LLMs will hallucinate less (newer models already do less of it than GPT-3.5 did), but the failure mode will keep existing. The right adaptation isn't avoiding AI; it's choosing tools that ground their answers in sources you trust, and verifying what they tell you.

The good news: the verification habit you build using AI in college transfers to the workforce. Every job that uses LLMs (most of them, soon) needs the same skill: trust but verify.


An AI tutor that's grounded in your real notes

ClassMinds uses RAG to keep the AI inside your actual class material. Free during beta.

Try the iOS beta →