Retrieval Systems

Autonomous Memory: When AI Starts Debugging Itself

Memory sounds glamorous until someone has to maintain it. In a demo, memory is easy. The agent remembers your name, recalls your last project, and maybe retrieves that one document you uploaded three sessions ago. Very charming. Very investor-deck friendly. Then the system goes into production. The memory store grows. Similar events blur together. Image captions lose details. Timestamps drift. Retrieval starts pulling almost-right context. The model becomes confidently nostalgic about things that did not happen. ...

Show Me the Money (Reasoning): Benchmarking Financial Intelligence in LLMs

Money has a useful habit: it exposes nonsense quickly. In ordinary chatbot use, a slightly wrong answer may be annoying. In financial analysis, a slightly wrong number can change a valuation, distort a risk view, or make a portfolio note look more confident than it deserves. That is why financial AI is not just another “domain application” of large language models. It is a stress test for whether a model can combine facts, time, arithmetic, business context, and restraint without pretending that a polished paragraph is the same as a verified conclusion. ...

Words + Returns: Teaching Embeddings to Invest in Themes

TL;DR for operators: The paper behind THEME is not really about asking an LLM to “find AI stocks” and hoping it returns a genius portfolio, because that would be the usual theatre with a Bloomberg terminal costume.¹ It is about building a retrieval layer that understands investment themes as a special kind of search problem: cross-sector, text-heavy, time-sensitive, and annoyingly allergic to static classification. ...

Charting a Better Bedside: When Agentic RL Teaches RAG to Diagnose

TL;DR for operators: Diagnosis is not a search-box problem. A clinician does not simply type a symptom list, read a guideline, and pick a disease like ordering takeaway. The useful work is iterative: form a hypothesis, compare against similar cases, notice what does not fit, retrieve again, ignore plausible-looking rubbish, and only then commit. ...

From Stage to Script: How AMADEUS Keeps AI Characters in Character

TL;DR for operators: Characters are easy when they stay on script. They become expensive when users ask the wrong question, which is, naturally, what users do. The AMADEUS paper addresses a specific failure mode in retrieval-augmented role-playing agents: ordinary RAG can retrieve facts, but persona consistency often depends on inferred traits, values, habits, and narrative context rather than direct answers. A user asks, “Are you confident everything will work out?” The persona document may not contain that sentence. Naive RAG may grab a superficially similar chunk and improvise badly. AMADEUS instead tries to retrieve evidence from which a character’s attributes can be inferred, then feeds those attributes into generation.¹ ...