1. The hallucination problem in legal AI
Large language models generate text by predicting the next most likely token based on patterns in their training data. This mechanism is remarkably effective for producing fluent, well-structured text. It is also the reason citations go wrong.
When a model generates a legal citation, it is not looking up a case in a database. It is constructing a string that looks like a citation based on statistical patterns. The case name, volume number, reporter, and page number each get filled in based on what is most probable given the surrounding text. The result often looks perfectly formatted and legally plausible but may reference a case that does not exist, or one that exists but says something different from what the model claims.
The challenge is that hallucinated citations are not obviously wrong. They follow correct citation format. They often contain real party names combined with incorrect reporters or dates. They may cite a real case but attribute a holding to it that appears nowhere in the opinion. Without independent verification, these fabrications are nearly impossible to spot from the text alone.
This is why general-purpose AI tools like ChatGPT, while useful for many tasks, present real risks when used for legal research. They lack the retrieval infrastructure needed to ground citations in verified sources. Purpose-built legal platforms address this by tying generation to retrieval, so citations come from the database rather than from the model's pattern matching.
2. What citation verification actually means
Citation verification is not a single check. It is a series of validations, each addressing a different failure mode.
Existence check. Does this case actually exist? Is the citation formatted correctly, and does it resolve to a real opinion in a recognized reporter? This catches outright fabrications.
Currency check. Is the case still good law? Has it been overruled, reversed on appeal, or superseded by statute? A valid citation to an overruled case is worse than no citation at all because it creates false confidence.
Proposition check. Does the case actually stand for the proposition the AI attributes to it? This is the most difficult check to automate because it requires understanding the holding in context. An AI might correctly cite a real case that is still good law but mischaracterize what the court held.
Relevance check. Is the cited authority actually relevant to the legal issue being discussed? A case may exist, be good law, and say what the AI claims, but still be inapposite to the argument being made. This final layer of verification typically requires human judgment.
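The first three layers above can be sketched as a small pipeline. This is an illustrative sketch only, not Irys's implementation: the database is a plain dictionary, the function names are invented for this example, and the proposition check is a crude word-overlap stand-in for what would really require semantic analysis of the holding. The relevance check is deliberately omitted, since it requires human judgment.

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""

def existence_check(citation: str, database: dict) -> CheckResult:
    # Does the citation resolve to a real opinion in the database?
    found = citation in database
    return CheckResult("existence", found,
                       "" if found else "no matching opinion found")

def currency_check(citation: str, database: dict) -> CheckResult:
    # Has the case been overruled, reversed, or superseded?
    history = database.get(citation, {}).get("subsequent_history", [])
    negative = [h for h in history
                if h in ("overruled", "reversed", "superseded")]
    return CheckResult("currency", not negative, ", ".join(negative))

def proposition_check(citation: str, claimed: str, database: dict) -> CheckResult:
    # Crude stand-in: count words shared between the claimed proposition
    # and the recorded holding. A real system would compare meaning, not words.
    holding = database.get(citation, {}).get("holding", "")
    overlap = set(claimed.lower().split()) & set(holding.lower().split())
    return CheckResult("proposition", len(overlap) >= 3,
                       f"{len(overlap)} overlapping terms")

def verify(citation: str, claimed: str, database: dict) -> list[CheckResult]:
    # Existence gates the rest: a fabricated cite has no history or holding to check.
    results = [existence_check(citation, database)]
    if results[0].passed:
        results.append(currency_check(citation, database))
        results.append(proposition_check(citation, claimed, database))
    return results
```

Note how a failed existence check short-circuits the pipeline: a citation that does not resolve returns a single failed result rather than meaningless downstream checks.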
3. The Mata v. Avianca lesson
In 2023, a personal injury attorney in the Southern District of New York submitted a brief opposing a motion to dismiss. The brief cited several cases that appeared to support the plaintiff's position. When opposing counsel could not locate the cases, the court investigated and discovered they had been fabricated by ChatGPT.
The court imposed sanctions on both the attorney and his firm. More significantly, the incident triggered a wave of court orders across the country requiring attorneys to certify that AI-generated content has been verified. Several federal districts now have standing orders requiring disclosure when AI tools are used in brief drafting.
The lesson is not that lawyers should avoid AI. It is that lawyers have a non-delegable duty to verify every citation they submit to a court, regardless of how that citation was produced. This duty existed before AI. AI simply made it easier to violate at scale.
The practical takeaway is that any AI research tool used in professional practice must either prevent hallucinated citations at the generation level or provide robust verification tools that catch them before they reach a filing. Ideally both.
4. Manual vs. automated verification
Manual verification means looking up each citation yourself. For a research memo with ten citations, this might take thirty minutes to an hour: pull up each case, confirm it exists, read the relevant passage, confirm the holding matches what was attributed to it, and check the citator for subsequent history. This is thorough but time-intensive, and the time cost creates a temptation to skip it when deadlines are tight.
Automated verification uses technology to perform the existence check, currency check, and initial proposition check at machine speed. A well-built system can verify dozens of citations in seconds, flagging any that fail and providing confidence scores for those that pass. The attorney then reviews the flagged items and exercises judgment on the remaining cases.
The advantage of automation is not just speed. It is consistency. Manual verification degrades under time pressure, fatigue, and familiarity bias. Automated verification applies the same standard to every citation every time.
The best approach combines both: automated verification as the first pass, followed by human review of flagged items and spot-checking of passed items. This gives you machine consistency with human judgment.
5. How automated Cite Check works
Automated Cite Check, as implemented in platforms like Irys, works through a multi-step pipeline that mirrors the verification layers described above.
First, the system parses the document to identify every citation. This includes standard case citations, statutory references, and regulatory citations. The parser handles variations in citation format, including short-form citations and id. references that refer back to earlier authorities.
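A minimal version of this parsing step can be sketched with a regular expression. This is an assumption-laden simplification: real citation parsers (including, presumably, the one described here) handle short forms, id. references, statutory and regulatory cites, and many reporter variations, while this pattern covers only full-form case citations of the shape "Name v. Name, volume reporter page (court year)" and will over- or under-capture on harder inputs.

```python
import re

# Matches full-form case citations such as
# "Smith v. Jones, 123 F.3d 456 (9th Cir. 1997)".
# The party-name pattern is permissive and may capture leading words
# when a citation appears mid-sentence; a production parser needs more care.
CITATION_RE = re.compile(
    r"(?P<case>[A-Z][\w.'&\- ]+ v\. [A-Z][\w.'&\- ]+),\s+"
    r"(?P<volume>\d+)\s+"
    r"(?P<reporter>[A-Z][\w. ]*?)\s+"
    r"(?P<page>\d+)\b"
    r"(?:\s+\((?P<court_year>[^)]*\d{4})\))?"
)

def extract_citations(text: str) -> list[dict]:
    """Return one dict per full-form case citation found in the text."""
    return [m.groupdict() for m in CITATION_RE.finditer(text)]
```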
Second, each citation is resolved against the legal database. The system confirms the case exists, retrieves the full opinion, and pulls the relevant passage. Citations that cannot be resolved are flagged as unverified, with a clear indication of what could not be confirmed.
Third, the system checks subsequent history. This is functionally equivalent to running a Shepard's or KeyCite check: it identifies whether the authority has been questioned, distinguished, or overruled.
Finally, the system provides a confidence score for the proposition check, indicating how well the cited passage supports the proposition attributed to it. High-confidence matches are cleared. Low-confidence matches are flagged for human review. This gives attorneys a prioritized list of items that need attention rather than forcing them to re-verify everything.
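The triage step at the end can be sketched as follows. The 0.8 threshold and the function name are illustrative choices for this example, not values drawn from any particular platform; the point is the split into cleared items and a review queue sorted riskiest-first.

```python
def triage(scored: dict[str, float], threshold: float = 0.8):
    """Split citation confidence scores into cleared items and a review queue."""
    cleared = {cite: score for cite, score in scored.items()
               if score >= threshold}
    # Lowest-confidence items first, so the attorney sees
    # the riskiest citations at the top of the list.
    flagged = sorted(((cite, score) for cite, score in scored.items()
                      if score < threshold),
                     key=lambda pair: pair[1])
    return cleared, flagged
```

Sorting the flagged list by ascending confidence is what turns a raw set of scores into the prioritized list described above.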
6. Building a verification workflow you can trust
Whether you use manual verification, automated tools, or a combination, the key is to make verification a non-negotiable step in your workflow rather than something you do only when you have time.
Verify before you draft, not after. Run citation checks on your research results before incorporating them into a brief. It is much easier to replace a questionable authority at the research stage than to rewrite an argument that depends on it.
Set a zero-tolerance threshold for existence checks. If you cannot confirm a case exists in a recognized database, drop it. No exceptions. The risk of submitting a fabricated citation is never worth the marginal benefit of one more supporting authority.
Read the actual holding. Even with automated proposition checking, read the key passage in the most important authorities you cite. Automated tools are excellent at flagging problems but cannot fully replace your understanding of what a case actually decided.
Keep a verification record. Document which citations you verified, how, and when. This protects you if questions arise later and creates an audit trail that demonstrates professional diligence. Some courts now require this documentation when AI tools are disclosed.
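One lightweight way to keep such a record is an append-only log, one JSON entry per verified citation. The field names below are assumptions for illustration, not any court-mandated format; the design choice worth noting is appending rather than overwriting, which preserves the audit trail.

```python
import datetime
import json

def record_verification(citation: str, method: str, outcome: str, path: str):
    """Append one verification entry to a JSON-lines log file."""
    entry = {
        "citation": citation,
        "method": method,      # e.g. "manual" or "automated + manual review"
        "outcome": outcome,    # e.g. "verified" or "dropped"
        "checked_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    # Append-only: earlier entries are never modified, preserving the audit trail.
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```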
Research with built-in citation verification
Irys One verifies every citation against its legal database before it reaches your brief. Try it free for 14 days.