
Bracket Busters: When Agentic LLMs Turn Law into Code (and Catch Their Own Mistakes)
TL;DR

Agentic LLMs can translate legal rules into working software and audit themselves using higher‑order metamorphic tests. This combo improves worst‑case reliability (not just best‑case demos), making it a practical pattern for tax prep, benefits eligibility, and other compliance‑bound systems.

The Business Problem

Legal‑critical software (tax prep, benefits screening, healthcare claims) fails in precisely the ways that cause the most reputational and regulatory damage: subtle misinterpretations around thresholds, phase‑ins/outs, caps, and exception codes. Traditional testing stumbles here because you rarely know the “correct” output for every real‑world case (the oracle problem). What you do know: similar cases should behave consistently. ...
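
To make the "similar cases should behave consistently" idea concrete, here is a minimal Python sketch of a basic metamorphic test around bracket thresholds. Everything in it is an illustrative assumption rather than anything from a real tax code or a specific tool: the `BRACKETS` schedule, the `tax_owed` and `buggy_tax` functions, and the two relations (`check_monotonic`, `check_no_cliff`) are hypothetical names chosen for this example. It shows only simple first-order relations, not the higher‑order variety, and it never needs to know the "correct" tax for any single case.

```python
import random

# Hypothetical bracket schedule: (lower bound of bracket, marginal rate).
# Illustrative numbers only; they do not come from any real tax code.
BRACKETS = [(0, 0.10), (10_000, 0.20), (40_000, 0.30)]


def tax_owed(income: float) -> float:
    """Toy progressive tax: each slice of income is taxed at its own rate."""
    owed = 0.0
    for i, (lower, rate) in enumerate(BRACKETS):
        upper = BRACKETS[i + 1][0] if i + 1 < len(BRACKETS) else float("inf")
        if income > lower:
            owed += (min(income, upper) - lower) * rate
    return owed


def buggy_tax(income: float) -> float:
    """Deliberately wrong: applies the top applicable rate to ALL income."""
    rate = max(r for lower, r in BRACKETS if income >= lower)
    return income * rate


def check_monotonic(fn, income: float, bump: float) -> bool:
    """Relation 1: earning more must never lower the tax owed."""
    return fn(income + bump) >= fn(income)


def check_no_cliff(fn, income: float, bump: float, max_rate: float = 1.0) -> bool:
    """Relation 2: `bump` extra income must not add more than
    `bump * max_rate` in tax (no cliff when crossing a threshold)."""
    return fn(income + bump) - fn(income) <= bump * max_rate + 1e-9


def find_violations(fn, trials: int = 1000, seed: int = 0) -> list:
    """Compare pairs of similar cases; no oracle for the exact tax is needed."""
    rng = random.Random(seed)
    thresholds = [lower for lower, _ in BRACKETS]
    violations = []
    for _ in range(trials):
        # Sample near thresholds, where misinterpretations tend to hide.
        base = max(0.0, rng.choice(thresholds) + rng.uniform(-50, 50))
        bump = rng.uniform(0.01, 100)
        if not check_monotonic(fn, base, bump):
            violations.append(("monotonic", base, bump))
        if not check_no_cliff(fn, base, bump):
            violations.append(("no_cliff", base, bump))
    return violations


if __name__ == "__main__":
    print("correct version violations:", len(find_violations(tax_owed)))
    print("buggy version violations:  ", len(find_violations(buggy_tax)))
```

In this sketch the slice‑by‑slice version passes both relations, while the deliberately broken version trips the no‑cliff relation whenever a sampled pair straddles a threshold. Sampling close to the thresholds is a design choice: that is where the subtle misreadings described above tend to surface.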