If AutoML is a fast car, financial institutions need a train with tracks: a workflow that knows where it’s going, logs every switch, and won’t derail when market regimes shift. A new framework called TS-Agent proposes exactly that: a structured, auditable, LLM-driven agent that plans model development for financial time series instead of blindly searching.
Unlike generic AutoML, TS-Agent formalizes modeling as a multi-stage decision process—Model Pre-selection → Code Refinement → Fine-tuning—and anchors each step in domain-curated knowledge banks and reflective feedback from real runs. The result is not just higher accuracy; it’s traceability and consistency that pass governance sniff tests.
What TS-Agent Actually Is (in business English)
- A planner agent orchestrates a workflow for forecasting and synthetic data generation.
- Decisions at each step are grounded in three structured knowledge banks:
  1) Case Bank: past financial tasks & proven solutions
  2) Financial TS Code Base: ready-to-run models & evaluation metrics
  3) Refinement Knowledge Bank: training heuristics (e.g., learning-rate schedules, leakage checks, walk-forward validation)
- The agent keeps auditable logs of what changed, why, and what happened, enabling compliance, debugging, and reproducibility.
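
To make "what changed, why, and what happened" concrete, here is a minimal sketch of an audit log entry. The field names, the `AuditLog` class, and the hash chaining are illustrative assumptions, not the framework's actual log format; the metric values are made up for the example.

```python
from dataclasses import asdict, dataclass
import hashlib
import json


@dataclass(frozen=True)
class EditRecord:
    """One logged decision (hypothetical schema)."""
    step: int
    stage: str            # e.g. "warm-up" or "optimization"
    model: str
    change: str           # what changed
    rationale: str        # why it changed
    metric_name: str
    metric_before: float  # what happened: metric before the edit
    metric_after: float   # what happened: metric after the edit
    accepted: bool
    prev_hash: str        # digest of the previous record (tamper evidence)

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class AuditLog:
    """Append-only log in which each record commits to the one before it."""

    def __init__(self):
        self.records = []

    def append(self, **fields) -> EditRecord:
        prev = self.records[-1].digest() if self.records else "genesis"
        record = EditRecord(step=len(self.records), prev_hash=prev, **fields)
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Return True if no record was altered after it was written."""
        prev = "genesis"
        for record in self.records:
            if record.prev_hash != prev:
                return False
            prev = record.digest()
        return True


# Illustrative usage with made-up numbers.
log = AuditLog()
log.append(stage="warm-up", model="PatchTST",
           change="add ReduceLROnPlateau scheduler",
           rationale="validation loss plateaued",
           metric_name="val_rmse", metric_before=0.52, metric_after=0.49,
           accepted=True)
log.append(stage="warm-up", model="PatchTST",
           change="set weight decay to 0.01",
           rationale="train/validation gap suggested overfitting",
           metric_name="val_rmse", metric_before=0.49, metric_after=0.50,
           accepted=False)
```

Chaining each record to the previous digest is one simple way to let an auditor detect after-the-fact edits; a production setup would more likely rely on the organization's existing model inventory and version control.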
The Three Knowledge Banks at a Glance
Bank | What it holds | What decision it improves | Why it matters in finance |
---|---|---|---|
Case Bank | Prior tasks & winning approaches | Model pre-selection | Case-based reasoning cuts dead-ends, aligns to familiar regimes |
Financial TS Code Base | Implemented models (e.g., Autoformer, PatchTST, DLinear; TimeGAN, DDPM) + metric suite | Refine without re-inventing | Lowers variance vs. freehand code-gen; faster, more reliable |
Refinement Knowledge Bank | Heuristics: scaling, leakage prevention, schedulers, weight decay, early stopping, cross-validation | Make code better (safely) | Enforces best practices and prevents silent model risk |
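
As a rough illustration of how a Case Bank could drive pre-selection, the sketch below ranks past cases by tag overlap and returns a shortlist of models. The tag-based Jaccard similarity, the `shortlist_models` function, and the bank entries are all assumptions made for the example; the framework's real retrieval mechanism may differ, and the entries are not results from any study.

```python
# Hypothetical Case Bank entries; made up purely for illustration.
CASE_BANK = [
    {"task": "forecasting", "tags": {"stock", "daily", "univariate"}, "model": "Autoformer"},
    {"task": "forecasting", "tags": {"fx", "daily", "multivariate"}, "model": "PatchTST"},
    {"task": "forecasting", "tags": {"crypto", "hourly", "multivariate"}, "model": "DLinear"},
    {"task": "generation", "tags": {"stock", "daily"}, "model": "TimeGAN"},
]


def jaccard(a, b):
    """Overlap between two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shortlist_models(task, tags, case_bank, top_k=2):
    """Return up to top_k distinct models from the most similar past cases."""
    scored = [(jaccard(tags, case["tags"]), case["model"])
              for case in case_bank if case["task"] == task]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    shortlist = []
    for score, model in scored:
        if score > 0 and model not in shortlist:
            shortlist.append(model)
        if len(shortlist) == top_k:
            break
    return shortlist


candidates = shortlist_models("forecasting", {"stock", "daily", "univariate"}, CASE_BANK)
```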
Why This Matters (beyond RMSE)
- Auditability by design: Each code edit is isolated and justified; logs tie decisions to outcomes—ideal for model risk management (MRM) and internal audit.
- Lower variance across LLMs: Because the agent edits within a curated code base, results don’t swing wildly with backbone swaps; that’s operational stability.
- Finance-first metrics: Beyond RMSE/MAE/MAPE/sMAPE, TS-Agent optimizes Sharpe/VaR/ES deltas and distributional & dependency scores for generators—metrics risk teams actually care about.
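
For readers who want to see what a Sharpe/VaR/ES "delta" might look like in code, here is a minimal sketch. It assumes a delta is the absolute gap between a risk statistic computed on predicted returns and the same statistic on realized returns; that reading, the historical (empirical) VaR estimator, and the function names are assumptions rather than the framework's definitions.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized mean-to-volatility ratio (risk-free rate assumed zero)."""
    return math.sqrt(periods_per_year) * statistics.fmean(returns) / statistics.stdev(returns)


def _tail(returns, alpha):
    """The worst ceil(alpha * n) returns, sorted from worst to best."""
    ordered = sorted(returns)
    k = max(1, math.ceil(alpha * len(ordered)))
    return ordered[:k]


def historical_var(returns, alpha=0.05):
    """Empirical Value at Risk, reported as a positive loss."""
    return -_tail(returns, alpha)[-1]


def expected_shortfall(returns, alpha=0.05):
    """Average loss within the worst alpha tail, reported as a positive loss."""
    return -statistics.fmean(_tail(returns, alpha))


def risk_deltas(actual, predicted, alpha=0.05):
    """Absolute gaps between predicted and realized risk statistics."""
    return {
        "sharpe": abs(sharpe_ratio(predicted) - sharpe_ratio(actual)),
        "var": abs(historical_var(predicted, alpha) - historical_var(actual, alpha)),
        "es": abs(expected_shortfall(predicted, alpha) - expected_shortfall(actual, alpha)),
    }
```

A forecast can score well on RMSE while smoothing away the tails; under this reading, a small VaR or ES delta indicates the predicted series keeps roughly the same downside profile as the realized one.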
How the Workflow Runs
- Stage 1 – Model Pre-selection
  - Retrieve similar cases; shortlist models (e.g., Autoformer vs. PatchTST for 60→3-day stock forecasting).
- Stage 2 – Code Refinement (two phases)
  - Warm-up (round-robin): iterate per model with small edits (e.g., ReduceLROnPlateau, weight decay) and quick tuning; keep only best variants.
  - Optimization: focus on the top candidate; iterate longer, rejecting edits that don’t improve validation metrics; keep full logs.
Think of it as a chain of code edits: a controlled, reversible path of improvements with checkpoints, like Git for modeling decisions.
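
The two-phase refinement loop can be sketched in a few lines. This is a toy under stated assumptions: an "edit" is reduced to a function that changes a config dictionary, `toy_evaluate` stands in for an actual training run, and every name here is hypothetical rather than taken from the framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]
Edit = Tuple[str, Callable[[Config], Config]]  # (label, function applied to a config copy)


@dataclass
class Candidate:
    name: str
    config: Config
    score: float                      # validation error; lower is better
    history: List[dict] = field(default_factory=list)


def try_edit(cand: Candidate, edit: Edit, evaluate, phase: str) -> None:
    """Apply one edit to a copy, keep it only if validation improves, log either way."""
    label, apply = edit
    new_config = apply(dict(cand.config))
    new_score = evaluate(cand.name, new_config)
    accepted = new_score < cand.score
    cand.history.append({"phase": phase, "edit": label, "before": cand.score,
                         "after": new_score, "accepted": accepted})
    if accepted:
        cand.config, cand.score = new_config, new_score


def refine(models, edits, evaluate, warmup_rounds=1, optimize_rounds=3) -> Candidate:
    cands = [Candidate(n, dict(c), evaluate(n, c)) for n, c in models.items()]
    for _ in range(warmup_rounds):            # warm-up: round-robin over all models
        for cand in cands:
            for edit in edits:
                try_edit(cand, edit, evaluate, "warm-up")
    best = min(cands, key=lambda c: c.score)  # optimization: top candidate only
    for _ in range(optimize_rounds):
        for edit in edits:
            try_edit(best, edit, evaluate, "optimization")
    return best


def toy_evaluate(name: str, config: Config) -> float:
    """Stand-in for training plus validation; a real run would fit the model."""
    base = {"PatchTST": 1.0, "Autoformer": 1.2}[name]
    return base + abs(config["lr"] - 1e-3) * 100 + abs(config["weight_decay"] - 0.01)


edits = [
    ("halve learning rate", lambda c: {**c, "lr": c["lr"] * 0.5}),
    ("set weight decay to 0.01", lambda c: {**c, "weight_decay": 0.01}),
]
models = {name: {"lr": 4e-3, "weight_decay": 0.0} for name in ("PatchTST", "Autoformer")}
best = refine(models, edits, toy_evaluate)
```

Because a rejected edit never touches the candidate's config, the history reads as a sequence of checkpoints: every entry records the score before and after, and only accepted entries move the model forward.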
Evidence: Does Planning Beat Searching?
Forecasting across Crypto (hourly), Exchange (daily FX), and U.S. Stock (daily):
- TS-Agent achieved a 100% run success rate and the lowest error with modern LLMs, cutting RMSE by more than 20% vs. AutoML on Exchange and by about 8% on Crypto, by up to 30% vs. DS-Agent, and by 15–40% vs. ResearchAgent.
- On risk-sensitive metrics for Crypto, TS-Agent delivered the lowest Sharpe/VaR deltas (≈20% better than competing agents with comparable LLMs), indicating forecasts preserve market structure—not just point accuracy.
Synthetic generation (GAN/VAE/diffusion families on Exchange/Stock/Crypto):
- TS-Agent consistently ranked top on Marginal, Correlation, Autocorrelation, and Covariance distances; it matched or beat Optuna while maintaining 100% success across LLMs.
- Variance across backbones was materially lower than generic agents—practical reliability for stress testing and data augmentation.
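
To give a feel for the generator scores named above, here is a minimal sketch of two of them on a single series. The exact formulas used in the evaluation are not given here, so both definitions are assumptions: the marginal distance compares matching quantiles, and the autocorrelation distance averages the absolute gap in sample autocorrelation over the first few lags.

```python
import statistics


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    mean = statistics.fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(len(series) - lag))
    return num / denom


def acf_distance(real, synthetic, max_lag=5):
    """Mean absolute gap in autocorrelation over lags 1..max_lag."""
    gaps = [abs(autocorrelation(real, k) - autocorrelation(synthetic, k))
            for k in range(1, max_lag + 1)]
    return statistics.fmean(gaps)


def marginal_distance(real, synthetic, n_quantiles=10):
    """Mean absolute gap between matching quantiles of the two samples."""
    q_real = statistics.quantiles(real, n=n_quantiles)
    q_syn = statistics.quantiles(synthetic, n=n_quantiles)
    return statistics.fmean([abs(a - b) for a, b in zip(q_real, q_syn)])
```

The two scores catch different failures: shifting a series by a constant leaves its autocorrelation unchanged but moves every quantile, while shuffling a series preserves the marginal distribution but destroys its temporal structure.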
Practical Implications for Financial Teams
Where it shines
- Regulated environments needing explainable model evolution and tight change control.
- Shops with partially standardized codebases seeking agentic acceleration without free-form code risk.
- Volatile markets (crypto, EM FX) where risk-aligned metrics matter as much as pure error.
What to watch
- Curation debt: The Case/Code/Refinement banks must be curated and updated; treat them like a product.
- Guardrails: Keep write-access to production repos gated; TS-Agent should propose PRs, not hot-patch prod.
- Data discipline: The system assumes leak-free splits and correct walk-forward; governance should validate these assumptions.
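
One way governance can validate the split assumption is to generate walk-forward folds explicitly and check them. The sketch below is illustrative: the function names and the `gap` parameter (a buffer between train and test, for instance to cover a multi-day forecast horizon) are assumptions, not part of TS-Agent.

```python
def walk_forward_splits(n_samples, train_size, test_size, gap=0, expanding=False):
    """Yield (train_indices, test_indices) pairs that move forward in time.

    Each test window starts `gap` steps after its training window ends, and the
    windows advance by `test_size` so test periods never overlap.
    """
    train_end = train_size
    while train_end + gap + test_size <= n_samples:
        train_start = 0 if expanding else train_end - train_size
        test_start = train_end + gap
        yield range(train_start, train_end), range(test_start, test_start + test_size)
        train_end += test_size


def is_leak_free(train_idx, test_idx, gap=0):
    """True if every training index precedes every test index by more than `gap`."""
    return max(train_idx) + gap < min(test_idx)


splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2, gap=1))
```

With these settings the first fold trains on indices 0 to 3 and tests on 5 and 6, and the second trains on 2 to 5 and tests on 7 and 8. A check like `is_leak_free` is cheap enough to run on every fold before any model is trained.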
A 30-Day Adoption Playbook
Week 1: Baseline & Banks
- Inventory current forecasting/generation tasks and metrics (add risk metrics if missing).
- Seed the Code Base with vetted implementations and unit tests; draft the Refinement Bank from internal MRM checklists.
Week 2: Pilot Workflow
- Run TS-Agent on one asset class (e.g., FX daily). Capture all logs; require human sign-off on each commit.
- Compare against your current AutoML baseline on accuracy and Sharpe/VaR/ES deltas.
Week 3: Governance Tightening
- Wire logs to your MRM system (model inventory IDs, approvals, owners).
- Add policy checks (leakage detectors, dataset lineage tags) to the Refinement Bank.
Week 4: Scale & SRE
- Parallelize Warm-up across symbols; centralize artifacts (configs, weights, logs) with retention policies.
- Create a PR-only pathway: TS-Agent opens PRs with rationale & metrics; reviewers approve/merge.
Cheat-Sheet: Pain Points → TS-Agent Moves
Pain Point | TS-Agent Mechanism | Outcome |
---|---|---|
Black-box AutoML trails on risk metrics | Finance-first metric bank + reflective tuning | Lower Sharpe/VaR/ES deltas, better trading fidelity |
Fragile freehand code generation | Edit within a curated code base | Fewer bugs, faster wins, easier audits |
Hard-to-explain model drift | Chain-of-code-edits + logs | Reproducible, auditable evolution |
LLM backbone instability | Knowledge banks + constrained edits | Lower variance across LLMs |
Bottom Line
TS-Agent reframes “try everything and hope” as plan → edit → measure → log. In markets where what changed and why matters as much as how well the model performs, this is the agentic blueprint that finally respects both P&L and policy.
Cognaptus: Automate the Present, Incubate the Future.