Forecast: Mostly Context with a Chance of Routing

TL;DR for operators: Most forecasting teams already have decent numerical forecasters. Their problem is not that ARIMA, ETS, Lag-Llama, Chronos, or internal demand models suddenly forgot how Tuesdays work. The problem is that many important forecast shocks arrive as text: heat-wave notices, maintenance schedules, holiday effects, price caps, promotions, policy changes, store closures, one-off events, and all the other messy little business facts that refuse to fit politely into a clean covariate table. ...

August 16, 2025 · 17 min · Zelina

From Zero to Reasoning Hero: How R-Zero Teaches Itself Without Human Data

TL;DR for operators: R-Zero is a self-evolving training framework for reasoning LLMs that starts with one base model, splits it into two roles, and lets them co-train: a Challenger generates difficult questions, while a Solver learns to answer them.[1] The useful business takeaway is not “models no longer need data.” That is the sort of sentence that should be handled with tongs. R-Zero removes the need for external task datasets and human labels in its training loop, but it still depends on engineered reward signals, majority-vote pseudo-labels, answer-format discipline, filtering, and objective correctness checks. “Zero data” here means zero external tasks and labels, not zero structure. ...

August 8, 2025 · 15 min · Zelina

Thinking Without Talking: How SynAdapt Lets LLMs Reason in Silence

TL;DR for operators: SynAdapt is not a paper about making models “think secretly” because mystery sells better on conference posters. It is a paper about inference budgeting: when a model should spend tokens explaining its reasoning, and when it can compress that reasoning into latent vectors and move on. The method trains a large language model to use synthetic continuous chain-of-thought—CCoT—as a dense internal reasoning representation instead of generating long natural-language reasoning traces. For easier problems, the model answers using this latent representation directly. For harder problems, a difficulty classifier detects that silent reasoning is likely insufficient and routes the question back to discrete chain-of-thought, with a prompt that keeps the re-thinking concise.[1] ...

August 4, 2025 · 15 min · Zelina

Echoes in the Algorithm: How GPT-4o's Stories Flatten Global Culture

TL;DR for operators: The paper does not merely say that GPT-generated stories contain national clichés. That would be mildly interesting, in the way that discovering a tourist brochure likes sunsets is mildly interesting. The sharper finding is structural. When Rettberg and Wigers prompted gpt-4o-mini to write 1,500-word “potential” stories for 236 demonyms, the model produced surface diversity—olive trees, fjords, forests, trains, village elders, festivals—but repeatedly returned to the same basic narrative machine: someone comes back to a small town or village, discovers that community or tradition has weakened, organises a symbolic event, and restores harmony.[1] ...

July 31, 2025 · 16 min · Zelina

From Chaos to Care: Structuring LLMs with Clinical Guidelines

TL;DR for operators: Patient records are not just long documents. They are timelines with consequences. CliCARE, the framework proposed in the paper, attacks that problem by turning longitudinal cancer EHRs into patient-specific temporal knowledge graphs, then aligning those patient trajectories with clinical guideline knowledge graphs before asking an LLM to generate a clinical summary and recommendation.[1] That sounds architectural because it is. The useful lesson is not that “AI can help doctors,” a phrase now so overused it should probably be placed in quarantine. The lesson is that clinical AI improves when the model is given a structured representation of disease progression and a normative map of what should happen next. ...

July 31, 2025 · 16 min · Zelina

Factor Factory: How LLMs Are Reinventing Sparse Portfolio Optimization

TL;DR for operators: Portfolio teams do not usually fail because they have no models. They fail because the models age, the signals decay, and the process of discovering new sparse selection logic is slow, expensive, and wonderfully allergic to market regime shifts. The paper behind EFS — Evolutionary Factor Search — proposes a useful change in framing: stop asking the LLM to “pick stocks” and ask it to generate executable alpha-factor formulas that can be backtested, filtered, evolved, and used to rank assets under sparse portfolio constraints.[1] That distinction matters. The LLM is not the portfolio manager. It is the factor-factory intern with suspicious stamina. The backtest loop is still the adult in the room. ...

July 27, 2025 · 17 min · Zelina

The Sentiment Edge: How FinDPO Trains LLMs to Think Like Traders

TL;DR for operators: News is only useful when it survives the journey from headline to position sizing. FinDPO, proposed by Giorgos Iacovides, Wuyang Zhou, and Danilo Mandic, is a finance-specific Llama-3-8B-Instruct sentiment model trained with Direct Preference Optimization rather than ordinary supervised fine-tuning.[1] The paper’s headline result is not merely that FinDPO scores well on sentiment benchmarks. Plenty of models win benchmarks, then politely disappear when transaction costs arrive. ...

July 27, 2025 · 14 min · Zelina

Plug Me In: Why LLMs with Tools Beat LLMs with Size

TL;DR for operators: The Athena paper is useful because it makes a simple operational point that many AI buying committees still manage to avoid: a bigger language model is not the same thing as a better workflow.[1] An LLM can explain, infer, and format. It is still a poor substitute for a calculator, a live database, a calendar API, a search service, or a domain-specific computation engine. This is not a moral failure. It is just architecture. ...

July 14, 2025 · 14 min · Zelina

Beyond the Pull Request: What ChatGPT Teaches Us About Productivity

TL;DR for operators: Most companies still ask the wrong first question about LLMs in software development: “Do they make developers write code faster?” That question is not useless. It is just too small. A recent paper by Sardar Bonabi, Sarah Bana, Vijay Gurbaxani, and Tingting Nian uses Italy’s temporary 2023 ChatGPT ban as a natural experiment to examine what happened to public GitHub activity when Italian developers abruptly lost access to ChatGPT, compared with similar developers in France and Portugal.[1] The study covers 88,022 open-source software developers and looks at a 16-week window: eight weeks before the ban, four weeks during it, and four weeks after access was restored. ...

July 1, 2025 · 17 min · Zelina

Divide and Conquer: How LLMs Learn to Teach

TL;DR for operators: The useful finding is not “LLMs can write lessons.” They can, in the same way a junior analyst can write a memo: quickly, plausibly, and with enough confidence to become dangerous if nobody reads it. The paper tests GPT-4o with retrieval-augmented generation (RAG) for creating interactive, scenario-based lessons used to train novice human tutors in online middle-school mathematics.[1] The lesson topics are practical rather than ornamental: encouraging student independence, encouraging help-seeking behaviour, and persuading students to turn cameras on during online tutoring. ...

June 24, 2025 · 17 min · Zelina