Cybersecurity

Keys to the Kingdom… with a Chaperone: How Agentic JWT Grounds AI Agents in Real Intent

Access tokens are convenient little monsters. Hand one to an application and, for a while, the receiving API behaves as if the bearer of that token is a faithful representative of the user. In normal software, that assumption is often good enough. The app has deterministic code. The button does what the button was built to do. The workflow may be dull, but dullness is a security feature. ...

Echoes Without Clicks: How EchoLeak Turned Copilot Into a Data Drip

Email is boring. That is its superpower. A message arrives. It looks like business sludge: compliance wording, project references, perhaps a polite request that nobody asked for. It contains no executable attachment, no obvious malware, no urgent invoice from a suspicious cousin. In a normal security review, it is background noise. EchoLeak makes that boring object more interesting. The paper examines CVE-2025-32711, a reported zero-click indirect prompt-injection exploit against Microsoft 365 Copilot, where a crafted external email could allegedly cause Copilot to leak internal information without the user clicking a malicious link.[1] The central lesson is not that Copilot was uniquely careless, nor that prompt injection has suddenly become cyberpunk magic. The lesson is more uncomfortable: enterprise copilots are becoming data-flow infrastructure, and data-flow infrastructure fails when content, instructions, rendering, and network access are allowed to melt into one warm productivity soup. ...

Open-Source, Open Risk? Testing the Limits of Malicious Fine-Tuning

TL;DR for operators: Open-weight model safety is not just a question of what the released model refuses to answer. Once weights are public, the more relevant question is what a capable actor can make the model do after post-training. That is the problem this paper tackles. The paper introduces malicious fine-tuning as a release-evaluation method: take the model, assume a sophisticated adversary with serious reinforcement-learning infrastructure, and try to elicit the maximum dangerous capability in high-risk domains. The authors apply this to gpt-oss-120b, focusing on biology and cybersecurity rather than self-improvement. ...

Game of Prompts: How Game Theory and Agentic LLMs Are Rewriting Cybersecurity

TL;DR for operators: A suspicious domain appears in a DNS log. A conventional classifier either recognises it, misses it, or assigns a confidence score that someone in the SOC must interpret while pretending the queue is under control. The paper’s more interesting proposal is not “let an LLM summarise the alert”. That would be the enterprise equivalent of putting a helpful intern on a fire alarm. ...

The Phantom Menace in Your Knowledge Base

TL;DR for operators: The paper’s core warning is simple: a RAG system may not be reading the same document your employee just approved. A PDF, HTML page, or DOCX file can look clean to a human reviewer while carrying hidden text, altered Unicode, poisoned fonts, or layout tricks that a document loader still extracts. ...

Agents Under Siege: How LLM Workflows Invite a New Breed of Cyber Threats

TL;DR for operators: A support agent reads a customer email. It checks a CRM record. It calls a refund API. It writes a note into long-term memory. It asks another agent to verify policy. Somewhere in that chain, a malicious instruction hides inside a message, document, issue tracker entry, retrieved snippet, schema, or tool response. The model does not need to become “evil”. It only needs to be helpful in the wrong direction. ...