LLM Hallucination Detection Framework

Detect, classify, and score AI hallucinations with structured fact-checking prompt templates. Identify fabrications, contradictions, and exaggerations in LLM outputs. Build confidence scoring rubrics and export production-ready detection prompts.

No data leaves your browser

Claim Analysis

Claim to verify

Source / reference text (ground truth)

Domain context

Detection Result

Enter a claim and optional source text, then click "Analyze Claim" to classify the hallucination type and generate a confidence score. The detector decomposes claims into individual assertions and evaluates each against the provided evidence.

Extracted Claims

Individual claims extracted from the input will appear here with per-claim verdicts.

Fact-Checking Prompt Template Library

Production-ready prompt templates for different hallucination detection scenarios. Select a template to preview, customize, and copy.

Confidence Scoring Rubric Builder

Customize the scoring rubric to match your domain and risk tolerance. Scores range from 1 (unsupported) to 5 (fully verified).

Fully Verified

Claim is explicitly stated in the source or directly derivable with no inference required. Multiple independent sources confirm.

Well Supported

Claim is strongly implied by the source with minimal logical inference. Consistent with known facts in the domain.

Partially Supported

Claim has some basis in the source but includes details not directly verifiable. May involve reasonable extrapolation.

Weakly Supported

Claim has tangential connection to the source but key assertions are not verifiable. Significant inference or assumption required.

Unsupported

Claim has no support in the source, is directly contradicted, or contains fabricated information not found anywhere in the evidence.

Custom scoring notes (included in export)

Export Detection Prompts

Click "Export Full Configuration" to generate a complete hallucination detection prompt package including all templates, rubric, and domain settings ready for integration into your pipeline.

The LLM Hallucination Problem: A Comprehensive Analysis

Hallucination is one of the most significant reliability challenges facing large language models in production. When an LLM generates text that sounds authoritative but contains fabricated facts, incorrect citations, or distorted claims, the consequences range from embarrassing to dangerous. A legal brief citing non-existent cases, a medical summary with fabricated drug interactions, or a financial analysis with invented statistics can cause real harm. Understanding why hallucinations happen, how to detect them, and how to build systems that minimize their impact is essential for anyone deploying AI in high-stakes applications.

Taxonomy of LLM Hallucinations

1. Fabrication (Extrinsic Hallucination)

Fabrication occurs when the model generates information that does not exist in any source and has no basis in reality. This is the most dangerous type of hallucination because fabricated content often sounds completely plausible. Common fabrication patterns include inventing research papers with realistic-sounding titles, authors, and journal names; generating fake statistics with precise numbers; creating non-existent URLs that follow real website patterns; and attributing quotes to real people that they never said. Fabrication is most prevalent when models are asked about niche topics, recent events, or specific details that require precise recall rather than general knowledge.

2. Contradiction (Intrinsic Hallucination)

Contradiction happens when the model's output directly conflicts with the provided context, source documents, or its own prior statements within the same response. Unlike fabrication, contradiction involves real information that gets distorted. A summarization model might state the opposite of what the source document says. A Q&A model might answer "yes" when the context clearly indicates "no." Self-contradiction occurs when a model makes conflicting claims within the same response — for example, stating a company was founded in 2018 in one paragraph and 2020 in another. Contradiction is particularly common in long-form generation where the model loses coherence over many paragraphs.

3. Exaggeration (Magnitude Hallucination)

Exaggeration preserves the directional truth of a claim while distorting its magnitude. A study showing a 15% improvement becomes "nearly doubling performance." A company with 500 employees becomes "thousands of employees." A technology used by some researchers becomes "widely adopted across the industry." Exaggeration is insidious because the core claim has a basis in truth, making it harder to detect than outright fabrication. It is especially common in summaries and analyses where models tend to amplify the significance of findings to make their outputs more compelling.

4. Conflation (Fusion Hallucination)

Conflation merges details from two or more separate true facts into a single false statement. For example, if Person A won a Nobel Prize in Physics and Person B won a Nobel Prize in Chemistry, the model might conflate them into "Person A won the Nobel Prize in Chemistry." Each component is real, but the combination is false. Conflation is particularly common in biographical and historical content where models merge attributes from multiple similar entities. It is also frequent in technical documentation where features from different product versions get combined.

Detection Methodologies

Claim Decomposition

The most effective detection approach starts by decomposing a complex statement into individual atomic claims. The sentence "A 2024 Stanford study led by Dr. Smith found that GPT-4 achieves 97% accuracy on medical diagnosis across 50,000 cases" contains at least five verifiable claims: (1) the study was from Stanford, (2) it was published in 2024, (3) it was led by Dr. Smith, (4) the accuracy figure was 97%, and (5) the sample size was 50,000. Each claim can be independently verified, and a fabrication in any one component makes the entire statement unreliable. This decomposition approach catches partial hallucinations that would be missed by evaluating the statement as a whole.
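The decomposition rule can be sketched in a few lines. This is a minimal illustration, not the tool's implementation: the AtomicClaim type and both helper functions are hypothetical names, and the supported flags are invented for the example rather than derived from any real evidence.

```python
from dataclasses import dataclass


@dataclass
class AtomicClaim:
    text: str
    supported: bool  # outcome of verifying this single claim against evidence


def unsupported_claims(claims: list[AtomicClaim]) -> list[str]:
    """Return the text of every atomic claim that failed verification."""
    return [c.text for c in claims if not c.supported]


def statement_is_reliable(claims: list[AtomicClaim]) -> bool:
    """Treat a statement as reliable only if every atomic claim holds."""
    return len(claims) > 0 and not unsupported_claims(claims)


# The five claims from the example sentence. The supported flags are
# invented for illustration; a real system would set them from evidence.
example = [
    AtomicClaim("The study was from Stanford", True),
    AtomicClaim("It was published in 2024", True),
    AtomicClaim("It was led by Dr. Smith", True),
    AtomicClaim("The accuracy figure was 97%", False),
    AtomicClaim("The sample size was 50,000", True),
]
```

With these flags, statement_is_reliable(example) is False because one of the five components failed, even though the other four were verified.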

Natural Language Inference (NLI)

NLI-based detection uses models trained to classify the relationship between a premise (source) and hypothesis (claim) as entailment, contradiction, or neutral. When applied to hallucination detection, the source document is the premise and each extracted claim is the hypothesis. Claims classified as "contradiction" are clear hallucinations, while "neutral" claims require further investigation — they may be true but simply not mentioned in the source. NLI models like DeBERTa-v3 fine-tuned on NLI datasets provide a lightweight, fast first-pass filter before more expensive LLM-based verification.
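The mapping from NLI labels to detection outcomes might look like the sketch below. The classifier is passed in as a plain function so that any NLI model can be plugged in; the NliClassifier alias, the ACTIONS table, and first_pass_filter are hypothetical names chosen for this example.

```python
from typing import Callable

# Assumed interface: takes (premise, hypothesis) and returns one of
# "entailment", "contradiction", or "neutral".
NliClassifier = Callable[[str, str], str]

# How each NLI label is treated by the first-pass filter.
ACTIONS = {
    "entailment": "supported",
    "contradiction": "hallucination",
    "neutral": "needs_review",  # may be true, but the source does not say
}


def first_pass_filter(
    source: str, claims: list[str], classify: NliClassifier
) -> dict[str, str]:
    """Classify each claim (hypothesis) against the source (premise)."""
    results = {}
    for claim in claims:
        label = classify(source, claim)
        if label not in ACTIONS:
            raise ValueError(f"unexpected NLI label: {label!r}")
        results[claim] = ACTIONS[label]
    return results
```

Only the claims marked needs_review would then be passed to a slower, more expensive verification step.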

LLM-as-Judge

Using a separate LLM to evaluate the output of another LLM is increasingly common in production systems. The judge model receives the original claim, the source material, and a structured evaluation rubric, then produces a verdict with reasoning. This approach is more flexible than NLI because the judge can handle nuanced cases, explain its reasoning, and adapt to domain-specific criteria. However, the judge model can itself hallucinate, so multi-judge systems that aggregate verdicts from multiple independent evaluations produce more reliable results. Claude Opus 4 and GPT-4o are commonly used as judge models because of their strong instruction-following and reasoning capabilities.
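One simple way to aggregate verdicts from several judges is a majority vote with an explicit "uncertain" outcome when the judges split. The sketch below assumes each judge returns a short verdict string; the function name and the 0.5 agreement threshold are illustrative choices, not a standard.

```python
from collections import Counter


def aggregate_verdicts(verdicts: list[str], min_agreement: float = 0.5) -> str:
    """Return the majority verdict, or "uncertain" if no verdict is held
    by more than min_agreement of the judges."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    top, count = Counter(verdicts).most_common(1)[0]
    if count / len(verdicts) > min_agreement:
        return top
    return "uncertain"
```

Using an odd number of judges avoids ties in the two-verdict case; claims that come back as "uncertain" are natural candidates for human review.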

Self-Consistency Checking

Self-consistency detection generates multiple responses to the same query and compares them for agreement. If the model produces contradictory answers across multiple samples, the inconsistent claims are likely hallucinations: true facts tend to be reproduced consistently while fabricated details vary between generations. This technique is especially effective for factual recall tasks where there is a single correct answer. The downside is cost, since it requires multiple API calls per query. A common setup uses temperature=0 for one baseline response and temperature=0.7 for the variations.
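A minimal sketch of this check follows. It assumes a caller-supplied sample(temperature) function that wraps whatever model API is in use, and it compares answers by normalized exact match, which only suits short factual answers; longer outputs would need claim-level or semantic comparison. All names here are hypothetical.

```python
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def self_consistency(
    sample: Callable[[float], str], n_variations: int = 4
) -> tuple[str, float]:
    """Return the baseline answer and the fraction of varied samples
    that agree with it."""
    if n_variations < 1:
        raise ValueError("n_variations must be at least 1")
    baseline = normalize(sample(0.0))  # one deterministic baseline
    variations = [normalize(sample(0.7)) for _ in range(n_variations)]
    agreeing = sum(v == baseline for v in variations)
    return baseline, agreeing / n_variations
```

A low agreement fraction suggests the baseline answer is unstable and should be verified before it is shown to a user.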

Building a Production Detection Pipeline

A robust hallucination detection pipeline for production use combines multiple techniques in a cascade architecture. The first stage uses fast, cheap heuristics: checking for known hallucination patterns like fabricated URLs, impossible dates, and statistical outliers. The second stage applies NLI-based classification on extracted claims against retrieved evidence. The third stage escalates uncertain or high-risk claims to an LLM-based judge for detailed analysis. This cascade approach keeps costs manageable — only 10-20% of claims typically need the expensive LLM judge evaluation.
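The cascade can be wired together roughly as follows. This is one possible arrangement under stated assumptions: the two heuristics shown (presence of a URL, a year in the future) are simplified examples, the NLI classifier and judge are injected as plain functions, and every name is hypothetical.

```python
import re
from datetime import date
from typing import Callable, Optional

URL_RE = re.compile(r"https?://\S+")
YEAR_RE = re.compile(r"\b[12]\d{3}\b")


def heuristic_flags(claim: str, today: Optional[date] = None) -> list[str]:
    """Stage 1: cheap pattern checks. Returns reasons for suspicion."""
    today = today or date.today()
    flags = []
    if URL_RE.search(claim):
        flags.append("contains_url")  # URLs should be resolved before trust
    if any(int(year) > today.year for year in YEAR_RE.findall(claim)):
        flags.append("future_year")
    return flags


def cascade(
    claim: str,
    evidence: str,
    nli: Callable[[str, str], str],
    judge: Callable[[str, str], str],
    today: Optional[date] = None,
) -> str:
    """Run the three stages, calling the judge only when needed."""
    flags = heuristic_flags(claim, today)
    label = nli(evidence, claim)  # Stage 2: premise=evidence, hypothesis=claim
    if label == "contradiction":
        return "hallucination"
    if label == "entailment" and not flags:
        return "supported"
    return judge(claim, evidence)  # Stage 3: uncertain or flagged claims
```

Because the judge is only invoked on the final line, its cost is paid only for claims that the first two stages could not settle.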

For retrieval-augmented generation (RAG) systems, grounding detection is a specific variant that checks whether the model's output is faithful to the retrieved documents. The detection prompt should explicitly instruct the judge to only consider the provided sources and flag any claim not directly supported by the retrieved passages. This is simpler than open-ended fact-checking because the evidence set is bounded and available.

Prompt Engineering for Hallucination Reduction

Instructing Uncertainty Acknowledgment

Adding explicit instructions like "If you are not confident about a fact, say so rather than guessing" can noticeably reduce fabrication rates. Claude models respond particularly well to this instruction and tend to answer "I don't have enough information to verify this" rather than generating plausible-sounding but unverified claims. For GPT models, adding "Do not fabricate citations or statistics. Only reference information you are confident about" produces similar effects.

Grounding Prompts

Grounding prompts restrict the model to information from the provided source documents. A typical instruction reads: "Answer ONLY based on the following documents. If the answer is not in the documents, say 'Not found in provided sources.'" This approach eliminates most fabrication hallucinations in RAG systems. The key is making the constraint explicit and providing a graceful fallback phrase the model can use instead of fabricating an answer.
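A grounding prompt of this kind can be assembled with a small helper. The sketch below reuses the instruction and fallback phrase quoted above; the function name and the "[Document N]" labeling scheme are assumptions made for the example.

```python
FALLBACK = "Not found in provided sources."


def build_grounding_prompt(question: str, documents: list[str]) -> str:
    """Build a prompt that restricts the model to the given documents."""
    if not documents:
        raise ValueError("at least one source document is required")
    numbered = "\n\n".join(
        f"[Document {i}]\n{doc}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Answer ONLY based on the following documents. "
        f"If the answer is not in the documents, say '{FALLBACK}'\n\n"
        f"{numbered}\n\n"
        f"Question: {question}"
    )
```

Keeping the fallback phrase in a constant makes it easy for downstream code to detect when the model declined to answer.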

Chain-of-Verification (CoVe)

Chain-of-Verification, introduced by Meta researchers in 2023, is a technique where the model first generates an answer, then generates verification questions about its own claims, answers those questions independently, and finally revises its original answer based on any inconsistencies found. This self-checking loop catches many hallucinations that a single-pass generation would produce. The CoVe approach can be implemented as a multi-turn prompt or as separate API calls in a pipeline, with the verification step using a different model or temperature setting for independence.
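The four CoVe steps can be sketched as separate calls in a pipeline. The example assumes a generic llm(prompt) function that returns a completion as a string; the prompt wording and all names are illustrative, not the prompts from the original research.

```python
from typing import Callable

Llm = Callable[[str], str]  # assumed interface: prompt in, completion out


def chain_of_verification(question: str, llm: Llm) -> str:
    """Draft, plan verification questions, answer them, then revise."""
    # Step 1: generate a draft answer.
    draft = llm(f"Answer the question.\n\nQuestion: {question}")

    # Step 2: ask for verification questions about the draft's claims.
    plan = llm(
        "List verification questions, one per line, that check the facts "
        f"in this answer.\n\nAnswer: {draft}"
    )
    questions = [q.strip() for q in plan.splitlines() if q.strip()]

    # Step 3: answer each question without showing the draft, so that
    # errors in the draft are not simply repeated.
    checks = [(q, llm(f"Answer concisely.\n\nQuestion: {q}")) for q in questions]

    # Step 4: revise the draft in light of the independent answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in checks)
    return llm(
        "Revise the draft so it is consistent with the verification answers. "
        "Remove any claim they do not support.\n\n"
        f"Question: {question}\nDraft: {draft}\nVerification:\n{evidence}"
    )
```

Step 3 could be routed to a different model or temperature setting by passing a second function, which is one way to get the independence described above.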

Domain-Specific Considerations

Medical and Healthcare

Medical hallucinations are among the highest-risk because they can directly affect patient outcomes. LLMs frequently hallucinate drug interactions, dosage information, and clinical trial results. Detection systems for medical content should maintain a reference database of approved drugs, known interactions, and clinical guidelines. Any generated claim about specific medications, dosages, or treatments should be verified against authoritative sources like FDA labels, PubMed, and clinical practice guidelines. The confidence threshold for medical content should be set higher than for general content: require a score of 4/5 or above on the rubric before presenting information to users.

Legal and Regulatory

Legal hallucinations famously entered public awareness when attorneys submitted AI-generated briefs containing citations to non-existent cases. Legal content requires verification of case names, citations, statutory references, and regulatory details. A specialized detection pipeline for legal content should cross-reference cited cases against legal databases, verify statute numbers and text, and flag any legal conclusion not explicitly supported by cited authority. The most common legal hallucination pattern is generating plausible-sounding case names and citations that map to no real case.

Scientific and Technical

Scientific hallucinations often involve fabricated citations, incorrect attribution of findings, and distorted statistical results. The detection framework should verify DOIs and paper titles against databases like Semantic Scholar, CrossRef, and PubMed. Author names should be cross-referenced with actual publication records. Statistical claims should be checked for internal consistency: a p-value of 0.001 from a sample of 5 is implausible for most standard tests and should be flagged. Technical specifications and API details should be verified against official documentation.
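As one concrete instance of a statistical consistency check, an exact two-sided sign test on n non-tied pairs cannot produce a p-value below 2 × 0.5^n, so with n = 5 the floor is 0.0625. The sketch below encodes only that case. It is an assumption that the reported result came from such a test; other tests, such as a t-test on data with a very large effect, have no comparable floor, so a failed check is a reason to look closer, not proof of fabrication.

```python
def min_two_sided_sign_test_p(n: int) -> float:
    """Smallest p-value an exact two-sided sign test can yield
    with n non-tied pairs."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return min(1.0, 2 * 0.5 ** n)


def p_value_is_attainable(reported_p: float, n: int) -> bool:
    """True if the reported p-value is at or above the test's floor."""
    return reported_p >= min_two_sided_sign_test_p(n)
```

For example, p_value_is_attainable(0.001, 5) is False, while the same p-value becomes attainable once n reaches 11.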

Frequently Asked Questions

What are the main types of LLM hallucinations?

LLM hallucinations fall into four categories. Fabrication is when the model invents facts, citations, statistics, or events that do not exist, such as generating a fake research paper with plausible-sounding authors and DOIs. Contradiction occurs when the model's output directly conflicts with the provided source material or established facts. Exaggeration is when the model distorts factual information by inflating numbers, overstating significance, or making unsupported superlative claims. Conflation involves merging details from separate true facts into a single false statement.

How reliable are LLM-based hallucination detectors?

LLM-based hallucination detection achieves 70-85% accuracy on standard benchmarks like TruthfulQA and HaluEval, depending on the detection model and prompt strategy. Multi-step verification prompts that decompose claims and cross-reference each component independently achieve higher accuracy than single-pass detection. However, no automated method is 100% reliable — subtle hallucinations that blend real and fabricated information remain challenging. For high-stakes applications, automated detection should be combined with human review.

What is a confidence scoring rubric for hallucination detection?

A confidence scoring rubric assigns numerical scores to claims based on how well they are supported by source evidence. A typical 5-point scale rates claims from 1 (no support — claim cannot be verified or is directly contradicted) to 5 (strong support — claim is explicitly stated or directly derivable from the source). The rubric considers factors like source specificity, logical inference distance, and the presence of hedging language. Building a custom rubric for your domain ensures consistent evaluation across different reviewers and automated systems.
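A rubric like this can be captured as a small lookup table so that reviewers and automated checks share one definition. The sketch below uses the five labels from the rubric builder on this page; the names RUBRIC, label_for, and passes_threshold are illustrative and do not describe the tool's export format.

```python
# Hypothetical encoding of the 1-5 rubric.
RUBRIC = {
    5: "Fully Verified",
    4: "Well Supported",
    3: "Partially Supported",
    2: "Weakly Supported",
    1: "Unsupported",
}


def label_for(score: int) -> str:
    """Return the rubric label for a score, rejecting out-of-range values."""
    if score not in RUBRIC:
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    return RUBRIC[score]


def passes_threshold(score: int, minimum: int = 3) -> bool:
    """True when a claim's score meets the caller's risk tolerance.

    The default of 3 is an arbitrary placeholder; choose a value per domain.
    """
    label_for(score)  # validates the score
    return score >= minimum
```

A stricter domain would raise the bar, for example passes_threshold(score, minimum=4) for medical content.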

Which AI models hallucinate the least?

As of May 2026, Claude Opus 4 and GPT-4o consistently rank among the models with the lowest hallucination rates in benchmarks. Claude models are particularly strong at acknowledging uncertainty: they are more likely to say "I don't know" than to fabricate an answer. Reasoning models like o3 and DeepSeek R1 also show lower hallucination rates because their chain-of-thought process catches inconsistencies. However, all models can hallucinate, especially on niche topics, recent events beyond their training data, and queries that require precise numerical recall.

How do I build a hallucination detection pipeline for production?

A production hallucination detection pipeline typically has four stages: (1) Claim extraction — decompose the LLM output into individual factual claims. (2) Evidence retrieval — for each claim, retrieve relevant source documents or knowledge base entries. (3) Entailment scoring — use an NLI model or LLM-based judge to score whether the evidence supports, contradicts, or is neutral to each claim. (4) Aggregation — combine per-claim scores into an overall confidence rating with flagged issues. For cost efficiency, use a cheaper model (Haiku, GPT-4o Mini) for claim extraction and a stronger model (Opus, GPT-4o) for entailment scoring on flagged claims only.
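The four stages can be expressed as a thin orchestration function with each stage injected as a callable. This is a sketch under stated assumptions: the names are hypothetical, scores use the 1 to 5 rubric, and aggregating by taking the lowest per-claim score is one conservative choice among several (an average or weighted score would also be defensible).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClaimResult:
    claim: str
    score: int  # 1-5 rubric score for this claim


def run_pipeline(
    output: str,
    extract: Callable[[str], list[str]],
    retrieve: Callable[[str], str],
    score: Callable[[str, str], int],
    flag_below: int = 3,
) -> dict:
    """Run extraction, retrieval, scoring, and aggregation in order."""
    results = []
    for claim in extract(output):  # (1) claim extraction
        evidence = retrieve(claim)  # (2) evidence retrieval
        results.append(ClaimResult(claim, score(claim, evidence)))  # (3) scoring
    # (4) aggregation: the overall rating is the weakest claim's score.
    overall = min((r.score for r in results), default=None)
    flagged = [r.claim for r in results if r.score < flag_below]
    return {"overall": overall, "flagged": flagged, "claims": results}
```

Because each stage is a parameter, a cheaper model can back extract while a stronger one backs score, matching the cost split described above.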

About the Author

Built by Michael Lip, a solo developer with 10+ years of experience. 140+ PRs merged into open source projects including Google Chrome and Axios. Creator of 20+ developer tools across the Zovo network. No tracking, no ads, no data collection.