AI Hallucinations Guide: Causes and Fixes

An AI hallucination is a confident, fluent answer a model produces that is factually wrong, made up, or ungrounded in any source the user can verify. That’s the short version. If you’ve ever watched ChatGPT cite a paper that doesn’t exist, watched a lawyer submit fake case law, or watched a customer support bot invent a refund policy, you’ve seen one.

This AI hallucinations guide walks through what’s actually happening inside large language models in 2026, why the problem hasn’t gone away even as models have gotten dramatically better, and the concrete techniques — from prompt design to RAG architecture to evals — that meaningfully reduce hallucinations in real systems.

I’ve spent the last year working with teams shipping LLM features into production, and the single biggest lesson is this: hallucinations aren’t a bug you patch once. They’re a structural property of how these models generate text. You don’t eliminate them. You bound them, you detect them, and you design around them.

Let’s get into it.

What Are AI Hallucinations, Really?

A hallucination is any model output that asserts something false as if it were true. The Cambridge Dictionary added an AI-specific definition in 2023: “a response generated by AI that contains false or misleading information presented as fact” (Wikipedia, “Hallucination (artificial intelligence)”). The metaphor borrows from human psychology, though a lot of researchers find the term unfair to the models — and unfair to humans — because a model isn’t perceiving anything at all. It’s sampling the next most probable token.

The way I think about it: the model is always pattern-completing, and sometimes the pattern that looks most plausible has no grounding in reality.

Researchers typically split hallucinations along two axes:

  • Closed-domain vs. open-domain. A closed-domain hallucination happens when the model is given a source document (say, for summarization) and invents facts not in that document. An open-domain hallucination is when the model is answering from its own parametric memory and gets it wrong.
  • Intrinsic vs. extrinsic. An intrinsic hallucination contradicts the source it’s supposed to be using. An extrinsic hallucination adds information that can’t be verified from the source one way or the other.

The reason these distinctions matter is that they imply different fixes. Closed-domain intrinsic errors are usually a RAG or prompt problem. Open-domain extrinsic errors are usually a model selection or knowledge cutoff problem.

Why Do AI Models Hallucinate? The 7 Root Causes

Hallucinations don’t have a single root cause. They emerge from the interaction of several mechanisms, and understanding which one is biting you in a given failure is what separates engineers who quietly ship reliable LLM features from engineers who keep adding prompt band-aids.

  1. Training data noise and contradictions. Models are trained on huge scrapes of the public web, which contains plenty of misinformation, outdated facts, and contradictory sources. The model learns the distribution, including its errors.

  2. The next-token objective itself. Pre-training rewards models for always producing a next token, even when they have no idea what’s right. As OpenAI researchers put it in a widely cited 2025 paper, the training and evaluation of LLMs rewards guessing over acknowledging uncertainty. The model is literally incentivized to bluff.

  3. Sampling temperature and decoding. Higher temperature (the knob that controls randomness) increases creativity but also increases hallucination. A greedy decoding strategy at temperature 0 is more deterministic, but it can also confidently lock onto a wrong answer if the top probability is wrong.

  4. Attention limitations. Transformer attention has a fixed budget per layer. When relevant information is buried in a long context, the model can fail to attend to it and fill in from prior probabilities instead.

  5. Long context, sparse signal. As context windows have grown past 100K tokens, models have improved, but they still do measurably worse on facts scattered across very long documents than on facts presented up front. A 2024 study from Li et al. (“Retrieval Augmented Generation or Long-Context LLMs?”, arXiv:2407.16833) showed that long-context LLMs beat RAG on average performance, but RAG still wins on cost and on certain query types.

  6. Knowledge cutoff. A model trained on data up to early 2025 cannot know what happened in late 2025 or 2026. If it answers anyway — and it will — that’s a hallucination by definition.

  7. Parametric memory vs. retrieval. A model’s “memory” is compressed into billions of numerical weights. That compression is lossy. Even on facts the model has clearly seen, retrieval is fuzzy and conflation is common (the classic example: confusing two people with the same first name).

Fine-tuning helps shape behavior but doesn’t fundamentally fix any of these. A fine-tuned model is still doing next-token prediction over the same data distribution.

A few of these causes are worth lingering on, because they show up in the same way across every major lab’s interpretability work. Anthropic’s “biology of a large language model” research, for instance, describes internal circuits that decide whether the model will answer at all — and hallucinations happen when those circuits are inhibited in the wrong direction, such as when the model recognizes a name but lacks real information about the person (Anthropic interpretability research, summarized in VentureBeat, March 2025). The implication is that hallucination is, to some degree, a side effect of how the model “decides” to speak. You can nudge the decision threshold, but you can’t eliminate the decision itself.

Hallucination Types vs. Fixes (Comparison Table)

Not every hallucination gets fixed the same way. The table below maps each failure mode to the technique that actually moves the needle for it.

Hallucination typeTypical causeBest primary fixSecondary fix
Open-domain, extrinsicKnowledge cutoff, parametric memoryRAG with fresh corpusAdd “I don’t know” system prompt
Closed-domain, intrinsicModel ignores retrieved contextBetter chunking + citationsConstrained decoding
Numerical / factual recallCompression in weightsTool use / calculatorsSelf-consistency sampling
Long-context, scattered factsAttention dilutionRAG with re-rankingSummarize-then-answer
Confident wrong citationPlausibility biasForce citation with groundedness checkLower temperature to 0
Made-up function call / API argPattern completion over schemaStructured outputs / JSON schemaSchema-constrained decoding
Reasoning chain errorStep compoundingChain-of-thought + self-consistencyVerifier model

The pattern: for any hallucination, the fix is to give the model less freedom or to verify its output. That’s the whole game.

A quick note on RAG specifically, because it’s the most common intervention. RAG reduces hallucination by changing the question from “what does the model know?” to “what does this document say?” — and documents can be wrong, but they’re auditable. In a well-built RAG system, a hallucination usually points to a retrieval failure (the wrong chunk was surfaced) or a chunking failure (the right information got split across chunks), not to the model. A hybrid approach is sometimes better: a 2024 study from Li et al. (arXiv:2407.16833) found that long-context LLMs beat RAG on average when given enough context, but a routing method that sends some queries to RAG and others to long-context can match the best of both at much lower cost.

The 2026 State of Hallucinations: Vectara Leaderboard

The most useful public benchmark in this space is the Vectara Hallucination Leaderboard, last updated May 11, 2026 (github.com/vectara/hallucination-leaderboard). It uses Vectara’s Hughes Hallucination Evaluation Model (HHEM-2.3) to score how often each model introduces a factual error when summarizing a passage it was given.

Selected numbers from the May 2026 update:

ModelHallucination rateFactual consistency rate
antgroup/finix_s1_32b1.8%98.2%
openai/gpt-5.4-nano (Mar 2026)3.1%96.9%
google/gemini-2.5-flash-lite3.3%96.7%
microsoft/Phi-43.7%96.3%
meta-llama/Llama-3.3-70B-Instruct-Turbo4.1%95.9%
openai/gpt-4.1 (Apr 2025)5.6%94.4%
openai/gpt-4o (Aug 2024)9.6%90.4%
anthropic/claude-sonnet-4 (May 2025)10.3%89.7%
anthropic/claude-opus-4 (May 2025)12.0%88.0%
mistralai/mistral-medium-250822.7%77.3%
openai/o3-pro23.3%76.7%
microsoft/Phi-4-mini-instruct23.5%76.5%

Callout: The single most important takeaway from the Vectara leaderboard is that bigger, more expensive, more “reasoning”-heavy models are not necessarily less hallucinated. o3-pro, one of OpenAI’s most capable reasoning models, hallucinates on roughly 1 in 4 summaries — about the same as Phi-4-mini, a tiny open-weight model. In contrast, the small specialized Antgroup Finix model hallucinates only 1.8% of the time. For factual summarization tasks, fit-to-purpose beats raw capability. (Source: Vectara HHEM-2.3, May 11, 2026.)

If you’re picking a model for a RAG pipeline today, the leaderboard is a more honest starting point than MMLU scores.

10 Proven Techniques to Reduce Hallucinations (Ranked)

These aren’t in order of how clever they are. They’re in order of how much impact they tend to have in production. The first three usually account for the bulk of your win.

  1. Retrieval-augmented generation (RAG). Ground the model’s answer in a small, relevant set of documents you provide at query time. This converts most open-domain hallucination into closed-domain verification.

  2. Force citations and verify them. Don’t just have the model answer — have it cite. Then run a groundedness check (often with a second LLM call or NLI model) that confirms each claim is actually supported by the cited source. The Vectara HHEM model itself is an example of this approach.

  3. Set temperature to 0 for factual work. Sampling diversity is great for brainstorming. For facts, you usually want the most probable token. Most model APIs default to higher temperatures, so you have to set this explicitly.

  4. Constrained decoding and structured outputs. Force the model to emit JSON or a function call that matches a schema. Tools like OpenAI’s Structured Outputs, Anthropic’s tool use, and Outlines dramatically reduce the space of invalid outputs.

  5. Self-consistency sampling. Sample the same answer multiple times with temperature > 0 and take the majority answer. The original Self-Consistency paper (arXiv:2203.11171) showed big gains on arithmetic and reasoning tasks. It costs more, but it works.

  6. Chain-of-thought prompting. Asking the model to reason step-by-step before answering reduces the kinds of jump-to-conclusion errors that often produce hallucinations. Pair it with self-consistency for the biggest lift.

  7. Tool use and calculators. If a fact is a number, a date, or a database lookup, hand it to a tool instead of asking the model to “know” it. The model can call the tool; the tool returns the truth.

  8. Tight, well-written system prompts. A clear system prompt that says “If you don’t know, say you don’t know” measurably reduces hallucination, especially in newer models that have been trained to comply with such instructions. Anthropic describes their character-training approach in detail in “Claude’s Character”.

  9. Domain-specific fine-tuning or prompt tuning. If you operate in a narrow domain, fine-tuning on high-quality domain data sharpens the model’s priors. It won’t eliminate hallucination, but it tightens the distribution of plausible answers.

  10. Evals + human review in the loop. Build a test set, score every model and prompt change against it, and keep humans reviewing a sample of production outputs. This is the only way hallucination rates actually go down over time rather than just being talked about.

A few more notes on the techniques above that are worth calling out:

  • Self-consistency is one of the most underused techniques in production. The original paper (arXiv:2203.11171) showed gains of up to +17.9% on GSM8K arithmetic. It costs N times as much, but for high-stakes answers the trade is often worth it.
  • Structured outputs are quietly one of the biggest wins of the last two years. The first time you use OpenAI’s Structured Outputs or Anthropic’s tool-use to constrain a model to a specific JSON schema, the difference is startling: the model suddenly stops inventing fields, misformatting, or hallucinating function arguments.
  • Tool use is what makes agents viable at all. The classic 2023 failure mode — a model confidently computing 7 à 8 as 54 — disappears the moment you let it call a calculator. The same logic applies to databases, search engines, calendars, and code interpreters.

How to Write Prompts That Reduce Hallucination

Prompt engineering isn’t magic, but a few habits move the needle. In my own work, the highest-leverage prompt patterns are:

  • Give the model a “no” path. Explicitly say “If the answer isn’t in the provided context, reply with ‘I don’t have that information’.” Models trained with RLHF have learned to take this instruction seriously, especially in 2026.
  • Separate the system prompt from the user prompt. System prompts are sticky; user prompts are not. The persona, refusal policy, and grounding rules belong in the system prompt.
  • Show, don’t just tell. A few in-context examples of well-grounded answers outperform paragraphs of instructions. Three to five is usually the sweet spot.
  • Anchor on the source. When summarizing, paste the source into the prompt and say “Use only information from the document below.” That’s literally the prompt Vectara uses on their leaderboard, and it works.
  • Ask for structure. Asking for “a short answer, then a list of supporting quotes with citations” forces the model to commit to verifiable claims.
  • Tighten the temperature knob. Defaults are usually 0.7 or 1.0. For factual work, set it to 0.

There’s also a meta-pattern: prompts that demand verifiable outputs outperform prompts that ask for fluent outputs. If you measure yourself on whether the model produced the right answer, the model will optimize for that.

How to Build an Eval Set That Catches Hallucinations

Most teams I work with have no eval set. That’s the single biggest reason hallucination regressions sneak into production. A useful hallucination eval has four ingredients:

  1. A representative set of inputs. 200–500 real or realistic user queries from your domain. Not benchmark datasets; your data.
  2. Ground-truth answers. Written by a human. Include cases where the correct answer is “I don’t know.”
  3. A scoring rubric. For factual tasks, RAGAS’s faithfulness and answer-relevance metrics are a reasonable starting point (docs.ragas.io). For more nuanced quality, you want an LLM-as-judge with a strict rubric.
  4. A known-bad set. Deliberately include queries designed to bait hallucinations (vague prompts, false premises, out-of-scope questions). If your model does well on the hard set, it’s doing well.

Run this eval on every prompt change, every model upgrade, every chunking strategy change. The teams that do this stop having hallucination incidents.

Monitoring Hallucinations in Production

Evals catch things before you ship. Monitoring catches things after.

The most reliable production signal in 2026 is a groundedness classifier — a small, cheap model whose only job is to read a model output and a set of source documents, and output a score for whether each claim is supported. Vectara’s open-source HHEM-2.1 is the canonical example. AWS, Microsoft, and several startups offer commercial versions.

Beyond that, basic telemetry goes a long way:

  • Log every prompt and response. You cannot debug what you cannot see.
  • Sample 1–5% of traffic for human review. Use those labels to retrain your classifiers and graders.
  • Track refusal rate. A model that hallucinates less often sometimes refuses more often. Both are signals worth watching.
  • Watch for prompt drift. A sudden spike in long prompts or new topic areas is a leading indicator of new failure modes.

For high-stakes domains — legal, medical, financial — NIST’s Generative AI Profile (NIST-AI-600-1, published July 2024) and the broader AI Risk Management Framework (nist.gov) give a defensible structure for documenting your controls around hallucination and other gen-AI risks.

Why Production Monitoring Matters More Than You Think

The failure mode I see most often isn’t that a team has no evals — it’s that they have evals but no production monitoring. The eval set passes, the launch goes well, and then six months later hallucination rates have quietly crept up because the model was swapped, the prompt was edited, or the user population shifted. The teams that catch this early have three habits in common:

  • They log everything. Every prompt, every response, every retrieved chunk, every model version, every latency. You can throw most of it away; you cannot reconstruct it after the fact.
  • They sample for human review continuously. Not just at launch. A 1–5% sample, reviewed weekly, gives you a baseline you can detect drift against.
  • They use an automated groundedness classifier on 100% of traffic. This is cheap (a few milliseconds and a fraction of a cent per query) and gives you a real-time dashboard of how grounded your outputs are, segmented by query type, model, and user cohort.

The groundedness classifier is also how you turn hallucination from a vibes-based worry into a number your engineering org can graph, alert on, and budget against.

FAQ

What are AI hallucinations in simple terms? A hallucination is when an AI model says something confidently that isn’t true. It can be a fake fact, a made-up citation, a wrong number, or an answer that contradicts the document it was supposed to be reading.

Why do AI models hallucinate even when they “know” the answer? Because the model is always pattern-completing. It picks the next most probable token, and “the most probable completion” isn’t the same as “the true completion.” The pre-training objective rewards guessing, not abstaining, so the model learns to bluff when uncertain.

Can you eliminate AI hallucinations completely? No. As of 2026, no frontier model is hallucination-free, and researchers generally believe small non-zero rates are a structural property of how these models work. The goal is to detect, bound, and design around them — not to hit zero.

What is the fastest way to reduce hallucinations in a RAG system? Two things, in order: (1) improve your retrieval so the model gets the right context, and (2) add a citation-and-verify step where the model must cite a source for every claim and a separate check confirms each citation actually supports the claim. Together these typically cut user-visible hallucination by half or more.

How do I fact-check an AI’s output? Treat every factual claim as a hypothesis. For each one, ask the model to produce the underlying source, then independently verify the source exists and says what the model claims. Tools like Vectara’s HHEM can automate the verify step at scale. The general rule: never trust, always verify.

10 SOURCES

Sources & References