Compliance AI

TL;DR for operators

Legal AI does not fail only because models “hallucinate”. That word has become the industry’s favourite fog machine. The more operational diagnosis is sharper: models fail when they answer current legal questions from stale internal memory and then dress the error in confident reasoning. The L-MARS paper is useful because it separates two tasks that vendors often blend together for convenience: retrieving current legal facts and reasoning over stable legal principles [1]. On LegalSearchQA, a new 50-question benchmark built around recent U.S. legal facts verified in March 2026, L-MARS reaches 96.0% accuracy. Zero-shot GPT-4o-mini reaches 58.0%. Chain-of-thought falls to 30.0%, because step-by-step reasoning from outdated premises merely creates a more articulate mistake. ...
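To put those percentages in scale, they can be converted back into question counts on a 50-question benchmark. The sketch below is plain arithmetic on the figures quoted above. The names `reported_accuracy` and `correct_count` are illustrative and do not come from the paper, and the code assumes each accuracy is simply the share of the 50 questions answered correctly.

```python
# Illustrative arithmetic only: names are hypothetical, figures are the ones quoted above.
BENCHMARK_SIZE = 50  # number of questions in LegalSearchQA

reported_accuracy = {
    "L-MARS": 96.0,
    "Zero-shot GPT-4o-mini": 58.0,
    "Chain-of-thought": 30.0,
}


def correct_count(accuracy_pct, n_questions=BENCHMARK_SIZE):
    """Convert a percentage accuracy into a whole number of correct answers."""
    return round(accuracy_pct * n_questions / 100)


# Assumption: accuracy = correct answers / total questions, with no partial credit.
counts = {name: correct_count(pct) for name, pct in reported_accuracy.items()}

for name, n in counts.items():
    print(f"{name}: {n}/{BENCHMARK_SIZE} correct, {BENCHMARK_SIZE - n} wrong")
```

On a benchmark this size each question is worth two percentage points, so the gaps are large in relative terms but rest on a few dozen items.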