
Keys to the Kingdom: How LLMs Can Audit Crypto Logic Before It Breaks

We’ve gotten good at spotting API misuse in crypto code (think “don’t use ECB,” “don’t hardcode IVs”). But many production failures don’t come from the obvious API call—they’re born in the logic that surrounds it: the parameter checks, corner-case math, and brittle “optimizations.” That’s where CryptoScope steps in: an LLM-powered framework that reads crypto code like a human auditor, guided by a domain corpus and structured prompts, to uncover logic-level vulnerabilities without executing the code. ...

August 18, 2025 · 4 min · Zelina

Therapy, Explained: How Multi‑Agent LLMs Turn DSM‑5 Screens into Auditable Logic

TL;DR DSM5AgentFlow uses three cooperating LLM agents—Therapist, Client, and Diagnostician—to simulate DSM‑5 Level‑1 screenings and then generate step‑by‑step diagnoses tied to specific DSM criteria. Experiments across four LLMs show a familiar trade‑off: dialogue‑oriented models sounded more natural, while a reasoning‑oriented model scored higher on diagnostic accuracy. For founders and PMs in digital mental health, the win is auditability: every symptom claim can be traced to a quoted utterance and an explicit DSM clause. The catch: results are built on synthetic dialogues, so ecological validity and real‑world safety remain open. ...

August 18, 2025 · 5 min · Zelina

Three’s Company: When LLMs Argue Their Way to Alpha

TL;DR A role‑based, debate‑driven LLM system—AlphaAgents—coordinates three specialist agents (fundamental, sentiment, valuation) to screen equities, reach consensus, and build a simple, equal‑weight portfolio. In a four‑month backtest starting 2024‑02‑01 on 15 tech names, the risk‑neutral multi‑agent portfolio outperformed the benchmark and single‑agent baselines; risk‑averse variants underperformed in a bull run (as expected). The real innovation isn’t the short backtest—it’s the explainable process: constrained tools per role, structured debate, and explicit risk‑tolerance prompts. ...

August 18, 2025 · 5 min · Zelina

Forecast: Mostly Context with a Chance of Routing

Large language models can forecast surprisingly well when you hand them the right context. But naïve prompts leave money on the table. Today’s paper introduces four plug‑and‑play strategies—ReDP, CorDP, IC‑DP, RouteDP—that lift accuracy, interpretability, and cost‑efficiency without training new models. Here’s what that means for teams running demand, risk, or ops forecasts.

Why this matters for business readers

Most production forecasts are numeric workhorses (ARIMA/ETS/TS foundation models), while contextual facts—weather advisories, policy changes, promos, strikes—arrive as text. LLMs can read that text and adjust the forecast, but simply stuffing history + context into a prompt (“direct prompting”) is often fragile. The four strategies below are operational patterns you can drop into existing stacks without re‑architecting. ...

August 16, 2025 · 5 min · Zelina
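The routing idea behind RouteDP can be sketched in a few lines: estimate how much the textual context is likely to move the numbers, and only send high-leverage cases to the expensive contextual forecaster. This is an illustrative sketch, not the paper's implementation; the scorer, models, and threshold are all hypothetical stand-ins.

```python
# Hypothetical sketch of RouteDP-style routing: score each forecasting task by
# how much its textual context is likely to matter, then send only high-impact
# tasks to the costly LLM forecaster. All names here are illustrative.

def difficulty_score(context: str) -> float:
    """Stand-in for an LLM judge that rates context impact in [0, 1]."""
    triggers = ("strike", "promo", "advisory", "policy", "recall")
    return min(1.0, sum(word in context.lower() for word in triggers) / 2)

def route_forecast(history, context, cheap_model, llm_model, threshold=0.5):
    """Use the numeric baseline unless the context looks consequential."""
    if difficulty_score(context) >= threshold:
        return llm_model(history, context)   # costly, context-aware path
    return cheap_model(history)              # fast numeric workhorse

# Usage: a trivial moving-average baseline vs. a mocked context-aware model.
cheap = lambda h: sum(h[-3:]) / 3
llm = lambda h, c: (sum(h[-3:]) / 3) * 0.8   # e.g. a demand dip from a strike
print(route_forecast([100, 110, 120], "port strike expected", cheap, llm))
print(route_forecast([100, 110, 120], "steady demand", cheap, llm))
```

In a real stack, `difficulty_score` would itself be a cheap LLM call, which is where the cost-efficiency claim comes from: most queries never touch the large model.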

RAGulating Compliance: When Triplets Trump Chunks

TL;DR A new multi‑agent pipeline builds an ontology‑light knowledge graph from regulatory text, embeds subject–predicate–object triplets alongside their source snippets in one vector store, and uses triplet‑level retrieval to ground LLM answers. The result: better section retrieval at stricter similarity thresholds, slightly higher answer accuracy, and far stronger navigability across related rules. For compliance teams, the payoff is auditability and explainability baked into the data layer, not just the prompt. ...

August 16, 2025 · 5 min · Zelina
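The core storage idea — embedding each subject–predicate–object triplet alongside its source snippet in one vector store — can be illustrated with a toy retriever. This is a minimal sketch under assumed data, using a bag-of-words stand-in for a real embedding model; the records and section numbers are invented for illustration.

```python
# Illustrative sketch (not the paper's code) of triplet-level retrieval:
# each subject-predicate-object triplet is embedded together with its source
# snippet, so every hit grounds the answer in both the fact and its provenance.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a vector model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# One record per triplet: the fact plus the (invented) snippet it came from.
store = [
    {"triplet": ("firm", "must report", "suspicious transactions"),
     "snippet": "Section 3.1: A firm must report suspicious transactions within 30 days."},
    {"triplet": ("auditor", "retains", "records"),
     "snippet": "Section 7.2: The auditor retains records for five years."},
]
for rec in store:
    rec["vec"] = embed(" ".join(rec["triplet"]) + " " + rec["snippet"])

def retrieve(query: str, k: int = 1):
    qv = embed(query)
    return sorted(store, key=lambda r: cosine(qv, r["vec"]), reverse=True)[:k]

hit = retrieve("firm report suspicious transactions")[0]
print(hit["triplet"], "->", hit["snippet"])
```

Because the snippet travels with the triplet, an auditor can jump from any retrieved fact straight to the rule it came from — the "auditability baked into the data layer" the teaser describes.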

Breaking the Question Apart: How Compositional Retrieval Reshapes RAG Performance

In the world of Retrieval-Augmented Generation (RAG), most systems still treat document retrieval like a popularity contest — fetch the most relevant-looking text and hope the generator can stitch the answer together. But as any manager who has tried to merge three half-baked reports knows, relevance without completeness is a recipe for failure. A new framework, Compositional Answer Retrieval (CAR), aims to fix that. Instead of asking a retrieval model to find a single “best” set of documents, CAR teaches it to think like a strategist: break the question into its components, retrieve for each, and then assemble the pieces into a coherent whole. ...

August 11, 2025 · 3 min · Zelina
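The decompose-retrieve-assemble loop the teaser describes can be sketched as follows. This is a hypothetical toy, not CAR itself: the decomposer is a naive string split standing in for an LLM, and the document store and stopword list are invented.

```python
# Illustrative sketch (not the CAR paper's code) of compositional retrieval:
# split a question into components, retrieve per component, then assemble the
# per-component evidence instead of hoping one top-k list covers everything.

DOCS = {
    "founding": "Acme Corp was founded in 1999.",
    "revenue": "Acme Corp reported $2B revenue in 2024.",
}
STOPWORDS = {"when", "was", "what", "its", "the", "a", "in"}

def decompose(question: str) -> list[str]:
    """Stand-in for an LLM decomposer; real systems generate sub-queries."""
    return [part.strip(" ?") for part in question.split(" and ")]

def retrieve_one(sub_query: str) -> str:
    """Toy keyword retriever over the doc store."""
    keywords = [w for w in sub_query.lower().split() if w not in STOPWORDS]
    for doc in DOCS.values():
        if any(word in doc.lower() for word in keywords):
            return doc
    return ""

def compositional_retrieve(question: str) -> list[str]:
    # One retrieval pass per component; the union is the assembled evidence.
    return [retrieve_one(sq) for sq in decompose(question)]

print(compositional_retrieve("When was Acme founded and what was its revenue?"))
```

The payoff is completeness: a single top-k query for the full question could easily return two founding-related documents and miss revenue entirely, while per-component retrieval covers each part of the question by construction.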

Search When It Hurts: How UR² Teaches Models to Retrieve Only When Needed

Most “smart” RAG stacks are actually compulsive googlers: they fetch first and think later. UR² (“Unified RAG and Reasoning”) flips that reflex. It trains a model to reason by default and retrieve only when necessary, using reinforcement learning (RL) to orchestrate the dance between internal knowledge and external evidence. Why this matters for builders: indiscriminate retrieval is the silent cost center of LLM systems—extra latency, bigger bills, brittle answers. UR² shows a way to make retrieval selective, structured, and rewarded, yielding better accuracy on exams (MMLU‑Pro, MedQA), real‑world QA (HotpotQA, Bamboogle, MuSiQue), and even math. ...

August 11, 2025 · 5 min · Zelina
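The "reason by default, retrieve only when necessary" policy can be sketched with a confidence gate. In UR² the gate is learned with reinforcement learning; here a fixed threshold and in-memory lookups stand in for the model and the retriever, and every name below is an illustrative assumption.

```python
# Hypothetical sketch of a UR2-style selective-retrieval gate: answer from
# internal knowledge when confident, call the retriever only when not.
# The real system learns this decision with RL; a threshold stands in here.

def answer_with_confidence(question: str):
    """Stand-in for the model's internal attempt: returns (answer, confidence)."""
    memory = {"capital of france": ("Paris", 0.97)}
    return memory.get(question.lower(), ("unknown", 0.1))

def search(question: str) -> str:
    """Stand-in for the external retriever (a real system calls a search API)."""
    corpus = {"obscure fact x": "evidence snippet about x"}
    return corpus.get(question.lower(), "no evidence found")

def selective_qa(question: str, threshold: float = 0.7):
    answer, conf = answer_with_confidence(question)
    if conf >= threshold:
        return answer, "internal"      # cheap path: no retrieval triggered
    evidence = search(question)        # costly path: fetch external evidence
    return evidence, "retrieved"

print(selective_qa("capital of france"))  # confident: answers directly
print(selective_qa("obscure fact x"))     # unsure: falls back to search
```

The latency and billing argument falls out directly: every query that clears the confidence gate skips the retrieval round-trip entirely, which is where "indiscriminate retrieval is the silent cost center" bites.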

From Stage to Script: How AMADEUS Keeps AI Characters in Character

When you chat with a VTuber’s AI twin or a game NPC that remembers your past adventures, breaking character can ruin the magic. Large language models (LLMs) have the raw conversational talent, but keeping them in character—especially when faced with questions outside their scripted knowledge—is notoriously difficult. AMADEUS, a new RAG-based framework, aims to fix that.

The Problem with Persona Drift

Most role-playing agents (RPAs) rely on a static “persona paragraph” to define who they are. Retrieval-Augmented Generation (RAG) can pull relevant persona chunks into context, but three problems persist: ...

August 9, 2025 · 3 min · Zelina

Graphs, Gains, and Guile: How FinKario Outruns Financial LLMs

In the world of financial AI, where speed meets complexity, most systems are either too slow to adapt or too brittle to interpret the nuanced messiness of real-world finance. Enter FinKario, a new system that combines event-enhanced financial knowledge graphs with a graph-aware retrieval strategy — and outperforms both specialized financial LLMs and institutional strategies in real-world backtests.

The Retail Investor’s Dilemma

While retail traders drown in information overload, professional research reports contain rich insights — but they’re long, unstructured, and hard to parse. Most LLM-based tools don’t fully exploit these reports. They either extract static attributes (e.g., stock ticker, sector, valuation) or respond to isolated queries without contextual awareness. ...

August 5, 2025 · 3 min · Zelina

Shadow Boxing the Market: Option Pricing Without a Safe Haven

One of the most sacred assumptions in financial modeling is the existence of a traded risk-free asset. It anchors discounting, defines arbitrage boundaries, and supports the edifice of Black–Scholes. But what happens when you remove this pillar? Can we still price options, hedge risk, or extract information about funding conditions? In a striking extension of the Lindquist–Rachev (LR) framework, Ziyao Wang shows that not only is it possible — it may reveal financial dynamics that conventional models obscure. ...

August 3, 2025 · 4 min · Zelina