
FAITH in Numbers: Stress-Testing LLMs Against Financial Hallucinations
Financial AI promises speed and scale — but in finance, a single misplaced digit can be the difference between compliance and catastrophe. The FAITH (Framework for Assessing Intrinsic Tabular Hallucinations) benchmark tackles this risk head‑on, probing how well large language models can faithfully extract and compute numbers from the dense, interconnected tables in 10‑K filings. From Idea to Dataset: Masking With a Purpose FAITH reframes hallucination detection as a context‑aware masked span prediction task. It takes real S&P 500 annual reports, hides specific numeric spans, and asks the model to recover them — but only after ensuring three non‑negotiable conditions: ...