Consent, Coaxing, and Countermoves: Simulating Privacy Attacks on LLM Agents

When organizations deploy LLM-based agents to email, message, and collaborate on our behalf, privacy threats stop being static. The attacker is now another agent able to converse, probe, and adapt. Today’s paper proposes a simulation-plus-search framework that discovers these evolving risks—and the countermeasures that survive them. The result is a rare, actionable playbook: how attacks escalate in multi-turn dialogues, and how defenses must graduate from rules to identity-verified state machines. ...

August 18, 2025 · 5 min · Zelina

SIMURA Says: Don’t Guess, Simulate

The dominant paradigm in LLM agents today is autoregressive reasoning: think step by step, commit token by token. This approach works decently for small tasks — write a tweet, answer a math question — but it quickly falters when the goal requires deep planning, multiple decision branches, or adapting to partially observable environments. Imagine trying to plan a vacation or operate a flight search website while thinking only one move ahead. ...

August 1, 2025 · 3 min · Zelina

Agents of Disruption: How LLMs Became Adversarial Testers for Autonomous Driving

The promise of fully autonomous vehicles hinges on their ability to handle not just the average drive—but the unexpected. Yet, creating rare, safety-critical scenarios for testing autonomous driving (AD) systems has long been a bottleneck. Manual scene creation doesn’t scale. Generative models often drift away from real-world distributions. And collecting edge cases on the road? Too dangerous, too slow. Enter AGENTS-LLM, a deceptively simple yet powerful framework that uses Large Language Models (LLMs) not to solve traffic scenes, but to break them. The twist? These aren’t just static prompts or synthetic scripts. AGENTS-LLM organizes LLMs into a multi-agent, modular system that modifies real traffic scenarios with surgical precision—making them trickier, nastier, and far more useful for evaluating planning systems. ...

July 21, 2025 · 3 min · Zelina

Personas with Purpose: How TinyTroupe Reimagines Multiagent Simulation

If you’ve ever tried to simulate user behavior using LLMs, you’ve probably noticed the same frustrating pattern: the agents are too polite, too helpful, and too similar. They lack the kind of quirks, inconsistencies, and contextually grounded views that make real people interesting—and unpredictable. Enter TinyTroupe, Microsoft’s new open-source toolkit that flips the script on LLM-agent design. Instead of building yet another task-oriented assistant or collaborative workflow bot, TinyTroupe takes the form of a behavioral simulation laboratory. It invites us to think of agents not as obedient coworkers, but as idiosyncratic personas—each with their own backstories, beliefs, and sometimes maddening biases. ...

July 15, 2025 · 4 min · Zelina

Twin It to Win It: How BedreFlyt Reimagines Hospital Resource Planning

Hospitals often operate under intense pressure, juggling patient needs, staff availability, and limited resources. Now imagine an AI-powered assistant that anticipates those needs, simulates complex patient flows, and delivers optimized resource plans, all without burning out the staff. That’s the promise of BedreFlyt, a modular, simulation-driven Digital Twin (DT) designed for hospital wards. Developed at the University of Oslo, BedreFlyt isn’t just another simulation tool. It uniquely integrates: ...

May 13, 2025 · 3 min