AutoML

Less Prompt, More Blueprint: MOSAIC and the Data-Science Agent That Keeps Receipts

TL;DR for operators: MOSAIC is best read as a system-design paper, not as another entry in the increasingly crowded genre of “we attached an LLM to Python and hoped for the best.” The paper introduces a structured agentic framework for automated data science where the agent builds an explicit workflow blueprint before generating code, then verifies, executes, and refines candidates using diagnostic feedback and failure-aware offline reinforcement learning.1 ...

Causal Brews: Why Your Feature Engineering Needs a Graph Before a Grid Search

Feature engineering has always had a faint smell of kitchen experimentation. Take the raw variables. Add ratios. Try logs. Multiply this by that. Remove the ones that look useless. Feed everything into XGBoost. Pretend the process was scientific because the final notebook has a clean cross-validation table. In many business analytics teams, this is not a caricature. It is Tuesday. ...

When Three Examples Beat a Thousand GPUs

A GPU bill is usually treated as a hardware problem. Buy faster accelerators, shorten training runs, negotiate a better cloud contract. Less often asked is whether the expensive part of the pipeline began with a badly calibrated prompt. An LLM generating neural-network architectures can create thousands of candidates before training begins. If the prompt provides too little context, the model may repeatedly produce shallow variations of the same familiar design. Add more examples, and it may combine useful ideas across architectural families. Add still more, and the output can become worse, incomplete, or invalid. ...

Hypotheses, Not Hunches: What an AI Data Scientist Gets Right

TL;DR for operators: The paper introduces an “AI Data Scientist”: a six-subagent system that moves from raw tabular data to cleaned data, tested hypotheses, engineered features, trained models, and business-facing recommendations.1 The useful idea is not that another agent can write Python. Congratulations, we have met 2025. The useful idea is that hypothesis testing becomes the workflow’s organising rail. ...

Quants With a Plan: Agentic Workflows That Outtrade AutoML

TL;DR for operators: A quant team does not need a chatbot that “has ideas” about markets. It needs a workflow that can select a sensible model, change one thing at a time, run the experiment, keep the better version, reject the worse one, and leave a paper trail that a human can inspect without requiring divination. ...
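The loop described in that teaser (change one thing, run, keep or reject, log the decision) can be sketched in a few lines of Python. This is a minimal illustration, not the workflow from the paper: `one_change_search`, `Trial`, `toy_evaluate`, and the `lookback` and `learning_rate` settings are all hypothetical names, and the toy scoring function stands in for a real backtest.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One line of the paper trail: what was changed and what happened."""
    step: int
    change: str
    score: float
    kept: bool


def one_change_search(config, candidates, evaluate):
    """Try one change at a time; keep it only if the score strictly improves."""
    best_config = dict(config)
    best_score = evaluate(best_config)
    log = [Trial(0, "baseline", best_score, True)]
    for step, (key, value) in enumerate(candidates, start=1):
        trial_config = {**best_config, key: value}  # exactly one setting differs
        score = evaluate(trial_config)
        kept = score > best_score
        log.append(Trial(step, f"{key}={value!r}", score, kept))
        if kept:
            best_config, best_score = trial_config, score
    return best_config, best_score, log


def toy_evaluate(cfg):
    """Hypothetical stand-in for a backtest; higher is better, best at (20, 0.1)."""
    return -abs(cfg["lookback"] - 20) - 10 * abs(cfg["learning_rate"] - 0.1)


best, score, log = one_change_search(
    {"lookback": 5, "learning_rate": 0.5},
    [("lookback", 20), ("learning_rate", 0.9), ("learning_rate", 0.1), ("lookback", 60)],
    toy_evaluate,
)
for trial in log:
    print(trial)
```

The log is the point of the design: every rejected change is recorded alongside the accepted ones, so a reviewer can see why the final configuration looks the way it does.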

Forecast First, Ask Later: How DCATS Makes Time Series Smarter with LLMs

TL;DR for operators: Forecasting teams usually ask the same question first: which model should we use? DCATS suggests a more operationally useful question: which related histories should this model learn from? The paper introduces DCATS, a Data-Centric Agent for Time Series, an LLM-agent framework that improves forecasting by selecting auxiliary time series for fine-tuning rather than by designing a new forecasting architecture.1 In the authors’ traffic forecasting study, GPT-4 Turbo reads metadata about nearby or similar California traffic sensors, proposes candidate neighbour sets, lets lightweight forecasting models test those proposals, and then refines the next round using validation error. ...
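The propose, test, refine loop in that teaser can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the DCATS implementation: `propose` is a rule-based stand-in for the LLM (it simply extends the best set so far with the next-nearest sensor), the "lightweight model" is a single drift parameter estimated from pooled histories rather than a fine-tuned forecaster, and the sensor names, distances, and series are invented.

```python
from statistics import mean


def fit_drift(histories):
    """Toy 'model': the average one-step change across all pooled training series."""
    return mean(b - a for h in histories for a, b in zip(h, h[1:]))


def validation_mae(drift, last_train, val):
    """One-step-ahead MAE on the target's validation window."""
    prev, errors = last_train, []
    for actual in val:
        errors.append(abs(actual - (prev + drift)))
        prev = actual
    return mean(errors)


def propose(metadata, best_set):
    """Hypothetical stand-in for the LLM: extend the best set, nearest sensors first."""
    remaining = sorted(
        (s for s in metadata if s not in best_set),
        key=lambda s: metadata[s]["distance_km"],
    )
    return [tuple(sorted(best_set + (s,))) for s in remaining]


def select_auxiliary(target_train, target_val, aux, metadata, rounds=3):
    """Keep a candidate set only if it lowers validation error; stop when nothing helps."""
    def score(names):
        drift = fit_drift([target_train] + [aux[n] for n in names])
        return validation_mae(drift, target_train[-1], target_val)

    best_set, best_err = (), score(())
    for _ in range(rounds):
        improved = False
        for candidate in propose(metadata, best_set):
            err = score(candidate)
            if err < best_err:
                best_set, best_err, improved = candidate, err, True
        if not improved:
            break
    return best_set, best_err


# Invented data: a short, noisy target history plus three candidate sensors.
target_train, target_val = [10, 14], [16, 18, 20]
aux = {"A": [5, 7, 9], "B": [0, 10, 20], "C": [20, 22, 24]}
metadata = {"A": {"distance_km": 1.0}, "B": {"distance_km": 50.0}, "C": {"distance_km": 2.0}}

baseline_err = validation_mae(fit_drift([target_train]), target_train[-1], target_val)
best_set, best_err = select_auxiliary(target_train, target_val, aux, metadata)
print(best_set, best_err, baseline_err)
```

In this toy setup the two nearby sensors share the target's trend and are selected, while the distant sensor with a very different trend is rejected because adding it raises validation error. The model never changes; only the data it learns from does.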