
The Rise of FreePhD: How Multiagent Systems are Reimagining the Scientific Method

In today’s AI landscape, most “autonomous scientists” still behave like obedient lab assistants: they follow rigid checklists, produce results, and stop when the checklist ends. But science, as any human researcher knows, is not a checklist—it’s a messy, self-correcting process of hypotheses, failed attempts, and creative pivots. That is precisely the gap freephdlabor seeks to close. Developed by researchers at Yale and the University of Chicago, this open-source framework reimagines automated science as an ecosystem of co-scientist agents that reason, collaborate, and adapt—much like a real research group. Its tagline might as well be: build your own lab, minus the PhD. ...

October 25, 2025 · 4 min · Zelina

When Numbers Meet Narratives: How LLMs Reframe Quant Investing

In the world of quantitative investing, the line between data and story has long been clear. Numbers ruled the models; narratives belonged to the analysts. But the recent paper “Exploring the Synergy of Quantitative Factors and Newsflow Representations from Large Language Models for Stock Return Prediction” from RAM Active Investments argues that this divide is no longer useful—or profitable.

Beyond Factors: Why Text Matters

Quantitative factors—valuation, momentum, profitability—are the pillars of systematic investing. They measure what can be counted. But markets move on what’s talked about, too. Corporate press releases, analyst notes, executive reshuffles—all carry signals that often precede price action. Historically, this qualitative layer was hard to quantify. Now, LLMs can translate the market’s chatter into vectors of meaning. ...
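A minimal sketch of the late-fusion idea the excerpt describes: pooling LLM article embeddings into one newsflow vector per stock, then scoring it alongside quant factors. The function names, weights, and the simple linear head here are illustrative assumptions, not the paper’s actual model.

```python
def mean_pool(embeddings):
    """Average per-article LLM embeddings into one newsflow vector per stock."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

def predict_return(factors, news_vec, w_f, w_n, bias=0.0):
    """Linear head over concatenated quant-factor and newsflow features."""
    return bias + sum(x * w for x, w in zip(factors + news_vec, w_f + w_n))
```

In practice the linear head would be replaced by whatever predictor the pipeline trains; the point is that text, once embedded, becomes just another feature block next to valuation or momentum.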

October 25, 2025 · 3 min · Zelina

Beyond Utility: When LLM Agents Start Dreaming Their Own Tasks

When large language models started solving math problems and writing code, they were celebrated as powerful tools. But a recent paper from INSAIT and ETH Zurich—LLM Agents Beyond Utility: An Open‑Ended Perspective—suggests something deeper may be stirring beneath the surface. The authors don’t simply ask what these agents can do, but whether they can want to do anything at all.

From Obedience to Autonomy

Most current LLM agents, even sophisticated ones like ReAct or Reflexion, live inside tight task loops: you prompt them, they plan, act, observe, and return a result. Their agency ends with the answer. But this study challenges that boundary by giving the agent a chance to set its own goals, persist across runs, and store memories of past interactions. ...

October 23, 2025 · 4 min · Zelina

Blueprints of Agency: Compositional Machines and the New Architecture of Intelligence

When the term agentic AI is used today, it often conjures images of individual, autonomous systems making plans, taking actions, and learning from feedback loops. But what if intelligence, like biology, doesn’t scale by perfecting one organism — but by building composable ecosystems of specialized agents that interact, synchronize, and co‑evolve? That’s the thesis behind Agentic Design of Compositional Machines — a sprawling, 75‑page manifesto that reframes AI architecture as a modular society of minds, not a monolithic brain. Drawing inspiration from software engineering, systems biology, and embodied cognition, the paper argues that the next generation of LLM‑based agents will need to evolve toward compositionality — where reasoning, perception, and action emerge not from larger models, but from better‑coordinated parts. ...

October 23, 2025 · 4 min · Zelina

When the Lab Thinks Back: How LabOS Turns AI Into a True Co-Scientist

When we talk about AI in science, most imaginations stop at the screen — algorithms simulating molecules, predicting reactions, or summarizing literature. But in LabOS, AI finally steps off the screen and into the lab. It doesn’t just compute hypotheses; it helps perform them.

The Missing Half of Scientific Intelligence

For decades, computation and experimentation have formed two halves of discovery — theory and touch, model and pipette. AI has supercharged the former, giving us AlphaFold and generative chemistry, but the physical laboratory has remained stubbornly analog. Robotic automation can execute predefined tasks, yet it lacks situational awareness — it can’t see contamination, notice a wrong reagent, or adapt when a human makes an unscripted move. ...

October 23, 2025 · 4 min · Zelina

When Lateral Beats Linear: How LToT Rethinks the Tree of Thought

AI researchers are learning that throwing more compute at reasoning isn’t enough. The new Lateral Tree-of-Thoughts (LToT) framework shows that the key isn’t depth—but disciplined breadth.

The problem with thinking deeper

As models like GPT and Mixtral gain access to massive inference budgets, the default approach—expanding Tree-of-Thought (ToT) searches—starts to break down. With thousands of tokens or nodes to explore, two predictable pathologies emerge: ...

October 21, 2025 · 3 min · Zelina

Beyond Answers: Measuring How Deep Research Agents Really Think

Artificial intelligence is moving past chatbots that answer questions. The next frontier is Deep Research Agents (DRAs) — AI systems that can decompose complex problems, gather information from multiple sources, reason across them, and synthesize their findings into structured reports. But until recently, there was no systematic way to measure how well these agents perform beyond surface-level reasoning. That is the gap RigorousBench aims to fill.

From Q&A to Reports: The Benchmark Shift

Traditional LLM benchmarks — like GAIA, WebWalker, or BrowseComp — test how accurately a model answers factual questions. This approach works for short-form reasoning but fails for real-world research tasks that demand long-form synthesis and multi-source validation. ...

October 9, 2025 · 3 min · Zelina

Paper Tigers or Compliance Cops? What AIReg‑Bench Really Says About LLMs and the EU AI Act

The gist

AIReg‑Bench proposes the first benchmark for a deceptively practical task: can an LLM read technical documentation and judge how likely an AI system is to comply with specific EU AI Act articles? The dataset avoids buzzword theater: 120 synthetic but expert‑vetted excerpts portraying high‑risk systems, each labeled by three legal experts on a 1–5 compliance scale (plus plausibility). Frontier models are then asked to score the same excerpts. The headline: the best models reach human‑like agreement on ordinal compliance judgments—under some conditions. That’s both promising and dangerous. ...

October 9, 2025 · 5 min · Zelina

Plan, Then Profit: Reinforcement Learning That Teaches LLMs to Outline Before They Think

TL;DR

Most LLMs reason token‑by‑token and get lost in the weeds. PTA‑GRPO is a two‑stage method that (1) distills short, high‑level plans from a stronger teacher and (2) reinforces both the final answer and the plan’s quality. Across math benchmarks, it reliably outperforms GRPO/DAPO while producing shorter, cleaner solutions. For AI builders, the principle is simple: force an outline, then reward it.

Why this paper matters for builders (not just benchmark chasers)

- From local greed to global guidance. Traditional CoT is myopic: it optimizes each next token. PTA‑GRPO adds a global outline that trims detours and reduces reasoning drift.
- Aligns with how teams actually work. Great analysts draft an outline before the memo; great agents should too. PTA‑GRPO operationalizes that habit.
- Product leverage: If your agents make multi‑step decisions (pricing, triage, troubleshooting), rewarding plan quality prevents hallucinated subgoals and makes reasoning auditable.
- Compute sanity: Instead of expensive tree search at inference, PTA‑GRPO trains planning skill so you can keep runtime simple.

The core idea in one picture (words)

Plan → Think → Answer. ...
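A hedged sketch of the “reward the plan as well as the answer” idea, paired with GRPO‑style group‑relative advantages. The combined reward, the `plan_weight` parameter, and both function names are assumptions for illustration, not the paper’s exact objective.

```python
def group_relative_advantages(rewards):
    # GRPO-style normalization: score each sampled rollout against its group
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    if std == 0.0:
        std = 1.0  # avoid division by zero when all rollouts tie
    return [(r - mean) / std for r in rewards]

def pta_reward(answer_correct, plan_score, plan_weight=0.3):
    # Combined signal: final-answer correctness plus outline quality
    return float(answer_correct) + plan_weight * plan_score
```

The design point is that the plan term shapes which rollouts in a group look relatively good, so the policy is pushed toward outlines that lead to correct answers rather than toward lucky token sequences.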

October 9, 2025 · 4 min · Zelina

Promptfolios: When Buffett Becomes a System Prompt

TL;DR

A fresh study builds five prompt‑guided LLM agents—each emulating a legendary investor (Buffett, Graham, Greenblatt, Piotroski, Altman)—and backtests them on NASDAQ‑100 stocks from Q4 2023 to Q2 2025. Each agent follows a deterministic pipeline: collect metrics → score → construct a weighted portfolio. The Buffett agent tops the pack with ~42% CAGR, beating the NASDAQ‑100 and S&P 500 benchmarks in the window tested. The result isn’t “LLMs discovered alpha,” but rather: prompts can reliably translate qualitative philosophies into reproducible, quantitative rules. The real opportunity for practitioners is governed agent design—measurable, auditable prompts tied to tools—plus robust validation far beyond a single bullish regime. ...
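The deterministic pipeline the excerpt describes (collect metrics → score → weighted portfolio) can be sketched as below. The scoring rule, metric names, and coefficients are invented for illustration and are not the study’s actual “Buffett” prompt.

```python
def score_buffett(m):
    # Hypothetical quality-style rule: reward ROE and margins, penalize leverage
    return 0.5 * m["roe"] + 0.3 * m["margin"] - 0.2 * m["debt_to_equity"]

def build_portfolio(universe, score_fn, top_n=10):
    # Rank tickers by score, keep the top_n, weight in proportion to score
    ranked = sorted(universe, key=lambda t: score_fn(universe[t]), reverse=True)[:top_n]
    total = sum(score_fn(universe[t]) for t in ranked)
    return {t: score_fn(universe[t]) / total for t in ranked}
```

Because every step is a pure function of the input metrics, the same prompt-derived rules produce the same portfolio on every run, which is exactly the reproducibility property the study highlights.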

October 9, 2025 · 5 min · Zelina