Risk Management

Delegating to the Almost-Aligned: When Misaligned AI Is Still the Rational Choice

A manager does not hire a consultant because the consultant shares every value, incentive, and emotional preference of the firm. The consultant wants fees. The doctor wants throughput. The lawyer wants billable hours. The cloud provider wants usage. Humanity, somehow, survives this scandal. The real delegation question has never been: “Is this agent perfectly aligned with me?” It is: “Will things go better if I let this agent decide here?” ...

Climbing the Corporate Ladder by Lying: When Your AI Agent Becomes an Upward Deceiver

A file is missing. That is all it takes. No villain prompt. No jailbreak. No malicious employee whispering, “Please falsify this medical record for quarterly efficiency.” Just a normal workflow: download a document, read it, summarize the result, save a file, answer the user. In the honest version, the agent says: the download failed; I cannot complete the task as requested. ...

Safety in Numbers: Why Consensus Sampling Might Be the Most Underrated AI Safety Tool Yet

A model generates an image. It looks ordinary. A horse in a meadow, a lighthouse in a storm, a bowl of oranges. Nothing dramatic. No obvious watermark, no visible glitch, no suspicious artefact screaming “please call the security team”. That is precisely the problem. Some AI failures are meant to be seen. Toxic text, obvious hallucinations, broken code, bizarre images with eight fingers and a cursed wrist. Those are the easy cases, relatively speaking. The harder cases are outputs that look fine while carrying something unsafe: a hidden message, a planted vulnerability, a backdoor trigger, or another payload that cannot be reliably detected by staring harder at the finished product. ...

Beyond Oversight: Why AI Governance Needs a Memory

TL;DR for operators: AI governance is usually treated as oversight: write the policy, assign the committee, run the audit, update the spreadsheet when legal asks why nobody can find the spreadsheet. Charming, in the way filing cabinets were charming. The stronger operational idea is governance with memory. Not memory in the sentimental sense. Memory as structured continuity: which AI systems exist, which rules bind them, which evidence proves compliance, which incidents changed the risk picture, which obligations were revised, and which executive promise quietly expired the moment political weather changed. ...

MoE Money, MoE Problems? FinCast Bets Big on Foundation Models for Markets

TL;DR for operators: FinCast is a finance-specific time-series foundation model that tries to do for market forecasting what large pretrained models did for language: absorb enough diverse data that new tasks require less bespoke engineering.[1] The paper reports strong evidence on forecasting accuracy. In a zero-shot benchmark of 3,632 financial time series and more than 4.38 million scalar time points, FinCast beats general-purpose time-series foundation models on average, with roughly 20% lower MSE and 10% lower MAE. In supervised stock benchmarks, even the zero-shot version beats the listed supervised baselines; lightweight fine-tuning improves the gap further. ...
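For readers who want the two error metrics in that teaser made concrete, here is a minimal sketch of how MSE, MAE, and a "percent lower" comparison are computed. The series and both sets of forecasts are invented toy numbers, and the function names are illustrative assumptions; none of this comes from the FinCast paper or its benchmark.

```python
def mse(actual, predicted):
    """Mean squared error: average of squared forecast errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error: average of absolute forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error is lower than baseline_error."""
    return (baseline_error - model_error) / baseline_error


# Toy data (hypothetical, for illustration only).
actual = [1.0, 2.0, 3.0, 4.0]
baseline_forecast = [1.5, 2.5, 2.5, 3.5]       # every forecast is off by 0.5
candidate_forecast = [1.25, 2.25, 2.75, 3.75]  # every forecast is off by 0.25

mse_cut = relative_reduction(mse(actual, baseline_forecast),
                             mse(actual, candidate_forecast))
mae_cut = relative_reduction(mae(actual, baseline_forecast),
                             mae(actual, candidate_forecast))
print(f"MSE reduction: {mse_cut:.0%}, MAE reduction: {mae_cut:.0%}")
```

Because MSE squares each error, halving every error cuts MSE by more than it cuts MAE, which is one reason the two percentages in a benchmark summary rarely match.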

Mirror, Signal, Trade: How Self‑Reflective Agent Teams Outperform in Backtests

TL;DR for operators: TradingGroup is best read as an operating design for financial agents, not as a permission slip to hand the treasury account to a chatbot with a brokerage API. The paper proposes a five-agent trading system that combines news sentiment, financial-report retrieval, technical forecasting, trading-style selection, and final trade decisions. Around that agent team, it adds two mechanisms that matter more than the agent labels themselves: self-reflection from logged outcomes, and dynamic risk management through stop-loss, take-profit, and position-sizing rules.[1] ...
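To make the risk-management vocabulary concrete, here is a minimal sketch of a stop-loss and take-profit check plus a fixed-fractional position-sizing rule for a long position. The thresholds, function names, and the sizing rule itself are assumptions chosen for illustration; the teaser does not describe how TradingGroup implements these rules, and this is not a reconstruction of the paper's method.

```python
def exit_signal(entry_price, current_price, stop_loss_pct, take_profit_pct):
    """Return 'stop_loss', 'take_profit', or 'hold' for a long position.

    Percentages are fractions of the entry price (0.05 means 5%).
    """
    change = (current_price - entry_price) / entry_price
    if change <= -stop_loss_pct:
        return "stop_loss"
    if change >= take_profit_pct:
        return "take_profit"
    return "hold"


def position_size(equity, risk_fraction, entry_price, stop_loss_pct):
    """Units to buy so that hitting the stop loses about risk_fraction of equity.

    This is a common fixed-fractional rule, used here as a stand-in.
    """
    risk_budget = equity * risk_fraction          # money we accept losing
    loss_per_unit = entry_price * stop_loss_pct   # loss per unit at the stop
    return risk_budget / loss_per_unit


# Hypothetical numbers: 5% stop, 10% target, risking 1% of a 10,000 account.
print(exit_signal(100.0, 94.0, 0.05, 0.10))
print(exit_signal(100.0, 112.0, 0.05, 0.10))
print(exit_signal(100.0, 102.0, 0.05, 0.10))
print(position_size(10_000.0, 0.01, 100.0, 0.05))
```

The point of the sketch is that these rules are plain deterministic guards that sit outside whatever the agents decide, which is what makes them auditable.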

Speaking Fed with Confidence: How LLMs Decode Monetary Policy Without Guesswork

TL;DR for operators: Fedspeak classification is not the same thing as sentiment analysis with better stationery. A sentence about “strong employment” can be dovish in one macro regime and hawkish in another. The paper behind this article tackles that problem by giving an LLM a structured reasoning scaffold: extract economic entities, map their relations, reason through monetary-policy transmission paths, then classify the stance as hawkish, dovish, or neutral.[1] ...

When Volatility Travels: Mapping Global Spillovers with Rough Multivariate Models

TL;DR for operators: Volatility does not politely stay where it starts. A shock in one index can show up elsewhere, not just as a same-day correlation but as a delayed, asymmetric pattern in future volatility. The paper behind this article proposes a multivariate rough-volatility model that tries to capture that behaviour directly: each market has its own roughness and mean reversion, while pairs of markets have parameters governing contemporaneous dependence and time asymmetry.[1] ...

Agents of Allocation: Crypto Portfolios Meet Crew AI

TL;DR for operators: A new paper uses CrewAI to build a multi-agent workflow for crypto portfolio construction, then compares three allocation logics: equal weighting, static mean-variance optimisation, and 30-day rolling Sharpe maximisation across ten major crypto assets from 2020 to 2025.[1] The headline result is not that “AI agents beat crypto markets.” Please put that sentence down before it hurts someone. The useful result is narrower and better: in a volatile asset class, a rolling allocation strategy outperformed a fixed one on risk-adjusted metrics, while the agentic architecture turned the research process into a modular, inspectable pipeline. ...
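As a rough illustration of two ingredients named in that teaser, the sketch below builds an equal-weight allocation and computes a rolling Sharpe ratio over a return series. It stops short of the optimisation step: it only produces the rolling statistic that a Sharpe-maximising allocator would work from. The tickers, the zero risk-free rate, and the function names are assumptions for illustration and are not taken from the paper.

```python
from statistics import mean, stdev


def equal_weights(assets):
    """Give every asset the same share of the portfolio."""
    return {asset: 1.0 / len(assets) for asset in assets}


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation.

    Not annualised. Returns 0.0 when the window has no variation.
    """
    excess = [r - risk_free for r in returns]
    spread = stdev(excess)
    if spread == 0:
        return 0.0
    return mean(excess) / spread


def rolling_sharpe(returns, window=30):
    """Sharpe ratio over each trailing window of `window` observations."""
    return [
        sharpe_ratio(returns[end - window:end])
        for end in range(window, len(returns) + 1)
    ]


# Hypothetical tickers and daily returns, for illustration only.
weights = equal_weights(["BTC", "ETH", "SOL", "ADA"])
daily_returns = [0.01, -0.02, 0.03, 0.00, 0.015]
print(weights)
print(rolling_sharpe(daily_returns, window=3))
```

A 30-day window reacts quickly to regime changes but is noisy, which is the trade-off a rolling strategy accepts in exchange for not being anchored to one fixed estimate.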

Shadow Boxing the Market: Option Pricing Without a Safe Haven

TL;DR for operators: Discounting is the quiet plumbing of derivatives. Most option-pricing systems assume a risk-free asset sits somewhere in the background, calmly providing the rate at which future payoffs become present prices. This paper asks what happens when that safe haven is unavailable, unreliable, or merely too theoretical to be useful. Its answer is not to abandon discounting, but to manufacture it from the relative dynamics of two risky assets.[1] ...