
When RAG Meets the Law: Building Trustworthy Legal AI for a Moving Target

Opening — Why this matters now Legal systems are allergic to uncertainty. Yet AI thrives on it. As generative models step into the courtroom—drafting opinions, analyzing precedents, even suggesting verdicts—the question is no longer whether they can help, but whether we can trust them. The stakes are existential: a hallucinated statute or a misapplied precedent isn’t a typo; it’s a miscarriage of justice. The paper Hybrid Retrieval-Augmented Generation Agent for Trustworthy Legal Question Answering in Judicial Forensics offers a rare glimpse into how to close this credibility gap. ...

November 6, 2025 · 4 min · Zelina

Agents with Interest: How Fintech Taught RAG to Read the Fine Print

Opening — Why this matters now The fintech industry is an alphabet soup of acronyms and compliance clauses. For a large language model (LLM), it’s a minefield of misunderstood abbreviations, half-specified processes, and siloed documentation that lives in SharePoint purgatory. Yet financial institutions are under pressure to make sense of their internal knowledge—securely, locally, and accurately. Retrieval-Augmented Generation (RAG), the method of grounding LLM outputs in retrieved context, has emerged as the go-to approach. But as Mastercard’s recent research shows, standard RAG pipelines choke on the reality of enterprise fintech: fragmented data, undefined acronyms, and role-based access control. The paper Retrieval-Augmented Generation for Fintech: Agentic Design and Evaluation proposes a modular, multi-agent redesign that turns RAG from a passive retriever into an active, reasoning system. ...

November 4, 2025 · 4 min · Zelina

When Rules Go Live: Policy Cards and the New Language of AI Governance

In 2019, Model Cards made AI systems more transparent by documenting what they were trained to do. Then came Data Cards and System Cards, clarifying how datasets and end-to-end systems behave. But as AI moves from prediction to action—from chatbots to trading agents, surgical robots, and autonomous research assistants—documentation is no longer enough. We need artifacts that don’t just describe a system, but govern it. ...

November 2, 2025 · 4 min · Zelina

Paper Tigers or Compliance Cops? What AIReg‑Bench Really Says About LLMs and the EU AI Act

The gist AIReg‑Bench proposes the first benchmark for a deceptively practical task: can an LLM read technical documentation and judge how likely it is that an AI system complies with specific EU AI Act articles? The dataset avoids buzzword theater: 120 synthetic but expert‑vetted excerpts portraying high‑risk systems, each labeled by three legal experts on a 1–5 compliance scale (plus plausibility). Frontier models are then asked to score the same excerpts. The headline: the best models reach human‑like agreement on ordinal compliance judgments—under some conditions. That’s both promising and dangerous. ...

October 9, 2025 · 5 min · Zelina

Bracket Busters: When Agentic LLMs Turn Law into Code (and Catch Their Own Mistakes)

TL;DR Agentic LLMs can translate legal rules into working software and audit themselves using higher‑order metamorphic tests. This combo improves worst‑case reliability (not just best‑case demos), making it a practical pattern for tax prep, benefits eligibility, and other compliance‑bound systems. The Business Problem Legal‑critical software (tax prep, benefits screening, healthcare claims) fails in precisely the ways that cause the most reputational and regulatory damage: subtle misinterpretations around thresholds, phase‑ins/outs, caps, and exception codes. Traditional testing stumbles here because you rarely know the “correct” output for every real‑world case (the oracle problem). What you do know: similar cases should behave consistently. ...
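The teaser’s key idea—you rarely know the “correct” output, but similar cases should behave consistently—is exactly what a metamorphic test encodes. A minimal sketch, assuming a toy phase-out rule (the credit name, thresholds, and amounts are invented for illustration, not taken from the paper):

```python
def credit(income: float) -> float:
    """Toy phase-out: full $1,000 credit below $50k, linearly reduced to $0 at $60k."""
    if income <= 50_000:
        return 1000.0
    if income >= 60_000:
        return 0.0
    return 1000.0 * (60_000 - income) / 10_000

def metamorphic_monotonic(f, lo: float, hi: float, steps: int = 100) -> bool:
    """Metamorphic relation: raising income must never *increase* a phased-out credit.

    No oracle needed—we only compare the function against itself on related inputs.
    """
    prev = f(lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        cur = f(x)
        if cur > prev + 1e-9:
            return False  # inconsistency found: a raise increased the credit
        prev = cur
    return True

print(metamorphic_monotonic(credit, 40_000, 70_000))  # True for a correct rule
```

The same relation would flag a buggy implementation (say, one that resets to the full credit just past a threshold) without anyone ever specifying the “right” answer for each case.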

October 1, 2025 · 5 min · Zelina

Prolog & Paycheck: When Tax AI Shows Its Work

TL;DR Neuro‑symbolic architecture (LLMs + Prolog) turns tax calculation from vibes to verifiable logic. The paper we analyze shows that adding a symbolic solver, selective refusal, and exemplar‑guided parsing can lower the break‑even cost of an AI tax assistant to a fraction of average U.S. filing costs. Even more interesting: chat‑tuned models often beat reasoning‑tuned models at few‑shot translation into logic — a counterintuitive result with big product implications. Why this matters for operators (not just researchers) Most back‑office finance work is a chain of (1) rules lookup, (2) calculations, and (3) audit trails. Generic LLMs are great at (1), decent at (2), and historically bad at (3). This work shows a practical path to auditable automation: translate rules and facts into Prolog, compute with a trusted engine, and price the risk of being wrong directly into your product economics. ...
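The pipeline the teaser describes—rules and facts as explicit logic, computed by a trusted engine that leaves an audit trail—can be sketched without Prolog. Below is a Python stand-in with a tiny forward-chaining evaluator; the rule names, filing status, and deduction figures are invented for illustration:

```python
facts = {"filing_status": "single", "income": 52_000}

# Each rule: (name, condition, consequence). Rules are data, not buried in control flow.
rules = [
    ("standard_deduction", lambda f: f["filing_status"] == "single",
     lambda f: {"deduction": 14_600}),
    ("taxable_income", lambda f: "deduction" in f,
     lambda f: {"taxable": max(0, f["income"] - f["deduction"])}),
]

def evaluate(facts, rules):
    """Forward-chain to a fixpoint, logging every rule firing as an audit trail."""
    derived, trace = dict(facts), []
    changed = True
    while changed:
        changed = False
        for name, cond, cons in rules:
            if cond(derived):
                new = cons(derived)
                if any(derived.get(k) != v for k, v in new.items()):
                    derived.update(new)
                    trace.append((name, new))
                    changed = True
    return derived, trace

state, audit = evaluate(facts, rules)
print(state["taxable"], audit)  # 37400, with a rule-by-rule trace
```

The point of the design is requirement (3) from the teaser: the `audit` trace records which rule produced which fact, so every figure in the output is traceable to a rule—the property a Prolog derivation gives you for free.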

August 31, 2025 · 5 min · Zelina

Forgetting by Design: Turning GDPR into a Systems Problem for LLMs

The “right to be forgotten” (GDPR Art. 17) has always seemed like kryptonite for large language models. Once a trillion-parameter system memorizes personal data, how can it truly be erased without starting training from scratch? Most prior attempts—whether using influence functions or alignment-style fine-tuning—felt like damage control: approximate, unverifiable, and too fragile to withstand regulatory scrutiny. This new paper, Unlearning at Scale, turns the problem on its head. It argues that forgetting is not a mathematical optimization problem, but a systems engineering challenge. If training can be made deterministic and auditable, then unlearning can be handled with the same rigor as database recovery or transaction rollbacks. ...
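The systems framing—deterministic training plus an auditable log makes forgetting a replay problem, like a database rollback—can be shown in miniature. A sketch under strong simplifying assumptions (the “model” is a running sum and each log entry a single update; names are illustrative):

```python
# Training log: (record_id, update). If each step is a pure function of
# (state, batch), the final state is fully determined by the log.
log = [("u1", 2.0), ("u2", 3.0), ("u3", 5.0)]

def train(records) -> float:
    """Deterministic training: same log in, same state out."""
    state = 0.0
    for _, update in records:
        state += update
    return state

def unlearn(log, forget_id: str) -> float:
    """Forgetting = replaying the log without the deleted record."""
    return train([r for r in log if r[0] != forget_id])

print(unlearn(log, "u2"))  # identical to retraining from scratch without u2
```

Real training is stochastic and nonlinear, so the paper’s contribution is engineering the pipeline until this replay equivalence actually holds—at which point unlearning becomes verifiable rather than approximate.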

August 19, 2025 · 3 min · Zelina

RAGulating Compliance: When Triplets Trump Chunks

TL;DR A new multi‑agent pipeline builds an ontology‑light knowledge graph from regulatory text, embeds subject–predicate–object triplets alongside their source snippets in one vector store, and uses triplet‑level retrieval to ground LLM answers. The result: better section retrieval at stricter similarity thresholds, slightly higher answer accuracy, and far stronger navigability across related rules. For compliance teams, the payoff is auditability and explainability baked into the data layer, not just the prompt. ...
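The data-layer idea—embed each subject–predicate–object triplet alongside its source snippet, then retrieve at the triplet level—can be sketched with a toy bag-of-words embedding standing in for a real vector store. The triplets, snippets, and similarity threshold below are invented for illustration:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Each entry pairs a triplet with its source snippet, so retrieval can ground
# an answer *and* point back to the exact regulatory text it came from.
store = [
    (("firm", "must_file", "annual report"),
     "Section 4.2: every firm must file an annual report."),
    (("report", "due_within", "90 days"),
     "Section 4.3: the report is due within 90 days of year end."),
]

def retrieve(query: str, threshold: float = 0.2):
    """Return (score, triplet, snippet) hits above a strict similarity threshold."""
    q = embed(query)
    hits = [(cosine(q, embed(" ".join(t) + " " + s)), t, s) for t, s in store]
    return sorted([h for h in hits if h[0] >= threshold], reverse=True)

print(retrieve("firm annual report filing"))
```

Because every hit carries both the triplet and its snippet, the audit trail lives in the retrieved data itself—the “baked into the data layer, not just the prompt” point from the summary.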

August 16, 2025 · 5 min · Zelina

Collapse to Forget: Turning Model Collapse into a Privacy Feature for LLMs

Machine unlearning, once a fringe technical curiosity, is fast becoming a legal and ethical imperative. With increasing regulatory demands like the GDPR’s “right to be forgotten,” AI developers are being asked a hard question: Can a large language model truly forget? A new paper from researchers at TUM and Mila provides an unexpectedly elegant answer. Instead of fighting model collapse—the phenomenon where iterative finetuning on synthetic data causes a model to forget—they propose embracing it. ...

July 8, 2025 · 4 min · Zelina