When Diffusion Learns How to Open Drawers

Opening: Why this matters now. Embodied AI has a dirty secret: most simulated worlds look plausible until a robot actually tries to use them. Chairs block drawers, doors open into walls, and walkable space exists only in theory. As robotics shifts from toy benchmarks to household-scale deployment, this gap between visual realism and functional realism has become the real bottleneck. ...

January 14, 2026 · 3 min · Zelina

Let There Be Light (and Agents): Automating Quantum Experiments

Opening: Why this matters now. Quantum optics sits at an awkward intersection: conceptually elegant, mathematically unforgiving, and operationally tedious. Designing even a “classic” experiment often means stitching together domain intuition, optical components, and simulation code, usually in tools that were never designed for conversational exploration. As AI agents move from text completion to task execution, the obvious question emerges: can they design experiments, not just describe them? ...

December 20, 2025 · 3 min · Zelina

DeepPersona and the Rise of Synthetic Humanity

Opening: Why this matters now. As large language models evolve from word predictors into behavioral simulators, a strange frontier has opened: synthetic humanity. From virtual therapists to simulated societies, AI systems now populate digital worlds with “people” who never existed. Yet most of these synthetic personas are shallow: a few adjectives stitched into a paragraph. They are caricatures of humanity, not mirrors. ...

November 11, 2025 · 4 min · Zelina

When Algorithms Command: AI's Quiet Revolution in Battlefield Strategy

Opening: Why this matters now. Autonomous systems have already taken to the skies. Drones scout, strike, and surveil. But the subtler transformation is happening on the ground, inside simulation labs where algorithms are learning to outthink humans. A recent study by the Swedish Defence Research Agency shows how AI can autonomously generate and evaluate thousands of tactical options for mechanized battalions in real time. In other words: the software isn’t just helping commanders; it’s starting to plan the war. ...

November 10, 2025 · 4 min · Zelina

When the Sandbox Thinks Back: Training AI Agents in Simulated Realities

Opening: Why this matters now. The AI industry has a curious paradox: we can train models to reason at Olympiad level, but they still fumble at booking flights or handling a spreadsheet. The problem isn’t intelligence; it’s context. Agents are trained in narrow sandboxes that don’t scale, breaking the moment the environment changes. Microsoft and the University of Washington’s Simia framework tackles this bottleneck with a provocative idea: what if the agent could simulate its own world? ...

November 6, 2025 · 4 min · Zelina

Pods over Prompts: Shachi’s Playbook for Serious Agent-Based Simulation

TL;DR: Shachi is a modular methodology for building LLM-driven agent-based models (ABMs) that replaces ad-hoc prompt spaghetti with four standardized cognitive components: Configs, Memory, Tools, and an LLM reasoning core. The result: agents you can port across environments, benchmark rigorously, and use to study nontrivial dynamics like tariff shocks with externally valid outcomes. For enterprises, Shachi is the missing method for turning agent demos into decision simulators.

Why this paper matters to operators (not just researchers): Most enterprise “agent” pilots die in the gap between a clever demo and a reliable simulator that leaders can trust for planning. Shachi closes that gap by: ...

October 3, 2025 · 5 min · Zelina

Consent, Coaxing, and Countermoves: Simulating Privacy Attacks on LLM Agents

When organizations deploy LLM-based agents to email, message, and collaborate on our behalf, privacy threats stop being static. The attacker is now another agent able to converse, probe, and adapt. Today’s paper proposes a simulation-plus-search framework that discovers these evolving risks—and the countermeasures that survive them. The result is a rare, actionable playbook: how attacks escalate in multi-turn dialogues, and how defenses must graduate from rules to identity-verified state machines. ...

August 18, 2025 · 5 min · Zelina

SIMURA Says: Don’t Guess, Simulate

The dominant paradigm in LLM agents today is autoregressive reasoning: think step by step, commit token by token. This approach works decently for small tasks — write a tweet, answer a math question — but it quickly falters when the goal requires deep planning, multiple decision branches, or adapting to partially observable environments. Imagine trying to plan a vacation or operate a flight search website while thinking only one move ahead. ...

August 1, 2025 · 3 min · Zelina

Agents of Disruption: How LLMs Became Adversarial Testers for Autonomous Driving

The promise of fully autonomous vehicles hinges on their ability to handle not just the average drive—but the unexpected. Yet, creating rare, safety-critical scenarios for testing autonomous driving (AD) systems has long been a bottleneck. Manual scene creation doesn’t scale. Generative models often drift away from real-world distributions. And collecting edge cases on the road? Too dangerous, too slow. Enter AGENTS-LLM, a deceptively simple yet powerful framework that uses Large Language Models (LLMs) not to solve traffic scenes, but to break them. The twist? These aren’t just static prompts or synthetic scripts. AGENTS-LLM organizes LLMs into a multi-agent, modular system that modifies real traffic scenarios with surgical precision—making them trickier, nastier, and far more useful for evaluating planning systems. ...

July 21, 2025 · 3 min · Zelina

Personas with Purpose: How TinyTroupe Reimagines Multiagent Simulation

If you’ve ever tried to simulate user behavior using LLMs, you’ve probably noticed the same frustrating pattern: the agents are too polite, too helpful, and too similar. They lack the kind of quirks, inconsistencies, and contextually grounded views that make real people interesting—and unpredictable. Enter TinyTroupe, Microsoft’s new open-source toolkit that flips the script on LLM-agent design. Instead of building yet another task-oriented assistant or collaborative workflow bot, TinyTroupe takes the form of a behavioral simulation laboratory. It invites us to think of agents not as obedient coworkers, but as idiosyncratic personas—each with their own backstories, beliefs, and sometimes maddening biases. ...

July 15, 2025 · 4 min · Zelina