Cover image

When Alignment Is Not Enough: Reading Between the Lines of Modern LLM Safety

Opening — Why this matters now In the past two years, alignment has quietly shifted from an academic concern to a commercial liability. The paper you uploaded (arXiv:2601.16589) sits squarely in this transition period: post-RLHF optimism, pre-regulatory realism. It asks a deceptively simple question—do current alignment techniques actually constrain model behavior in the ways we think they do?—and then proceeds to make that question uncomfortable. ...

January 26, 2026 · 3 min · Zelina
Cover image

Seeing Too Much: When Multimodal Models Forget Privacy

Face. That is where the privacy problem starts to become awkward. A company does not need to build a facial-recognition product to create facial-recognition risk. It may only add a multimodal model to a customer-support workflow, an HR document review process, a KYC assistant, a media-monitoring tool, or a claims-processing system. Someone uploads an image. The model sees a person. Then the user asks: Who is this? Where do they live? What is their email? What is their religion? What is their medical condition? ...

January 12, 2026 · 18 min · Zelina
Cover image

Thinking Without Understanding: When AI Learns to Reason Anyway

A meeting room is not a philosophy seminar, which is fortunate, because most companies would not survive one. A manager asks an AI system to analyze a contract, debug a workflow, compare vendors, or draft a risk memo. The system pauses, breaks the task into steps, checks an assumption, rejects one path, and returns a structured answer. Someone in the room says: “But it does not really understand.” ...

January 6, 2026 · 17 min · Zelina
Cover image

Paper Tigers or Compliance Cops? What AIReg‑Bench Really Says About LLMs and the EU AI Act

The gist AIReg‑Bench proposes the first benchmark for a deceptively practical task: can an LLM read technical documentation and judge how likely an AI system complies with specific EU AI Act articles? The dataset avoids buzzword theater: 120 synthetic but expert‑vetted excerpts portraying high‑risk systems, each labeled by three legal experts on a 1–5 compliance scale (plus plausibility). Frontier models are then asked to score the same excerpts. The headline: best models reach human‑like agreement on ordinal compliance judgments—under some conditions. That’s both promising and dangerous. ...

October 9, 2025 · 5 min · Zelina
Cover image

Options = Power: Turning Empowerment into a KPI for AI Agents

If your agents can reach more valuable futures with fewer steps, they’re stronger—whether you measured that task or not. Today’s paper offers a clean way to turn that intuition into a number: empowerment—an information‑theoretic score of how much an agent’s current action shapes its future states. The authors introduce EELMA, a scalable estimator that works purely from multi‑turn text traces. No bespoke benchmark design. No reward hacking. Just trajectories. This is the kind of metric we’ve wanted at Cognaptus: goal‑agnostic, scalable, and diagnostic. Below, I translate EELMA into an operator’s playbook: what it is, why it matters for business automation, how to wire it into your stack, and where it can mislead you if unmanaged. ...

October 3, 2025 · 5 min · Zelina
Cover image

When Agents Get Bored: Three Baselines Your Autonomy Stack Already Has

Thesis: Give an LLM agent freedom and a memory, and it won’t idle. It will reliably drift into one of three meta-cognitive modes. If you operate autonomous workflows, these modes are your real defaults during downtime, ambiguity, and recovery. Why this matters (for product owners and ops) Most agent deployments assume a “do nothing” baseline between tasks. New evidence says otherwise: with a continuous ReAct loop, persistent memory, and self-feedback, agents self-organize—not randomly, but along three stable patterns. Understanding them improves incident response, UX, and governance, especially when guardrails, tools, or upstream signals hiccup. ...

October 2, 2025 · 4 min · Zelina
Cover image

Sandboxes & Ladders: How to Build a Steerable Agent Economy

If AI agents become the economy’s new workforce, what keeps their markets from melting into ours like solder—fast, hot, and hard to undo? DeepMind’s “Virtual Agent Economies” proposes a practical map (and a modest constitution) for that future: treat agent markets as sandboxes and tune their permeability to the human economy. Once you see permeability as the policy lever, the rest of the architecture falls into place: auctions to resolve clashes, mission-led markets to direct effort, and identity rails so agents can be trusted, priced, and sanctioned. ...

September 19, 2025 · 6 min · Zelina
Cover image

Terms of Engagement: Building Trustworthy AI Agents Before They Build Us

As agentic AI moves from flashy demos to day‑to‑day operations—handling renewals, filing tickets, triaging inboxes, even buying things—the question is no longer can we automate judgment, but on what terms. This isn’t ethics-as-window‑dressing. Agent systems perceive, decide, and act through real interfaces (email, bank APIs, code repos). They can help—or hurt—at machine speed. Today I’ll argue three things: Alignment must shift from “answer quality” to action quality. Social agents change the duty of care developers and companies owe to users. We need a governance stack for multi‑agent ecosystems, not one‑off checklists. The discussion is grounded in the Nature piece by Gabriel, Keeling, Manzini, and Evans (2025), but tuned for operators shipping products this quarter—not a hypothetical future. ...

September 19, 2025 · 5 min · Zelina
Cover image

Stop, Verify, and Listen: HALT‑RAG Brings a ‘Reject Option’ to RAG

The big idea RAG pipelines are only as reliable as their weakest link: generation that confidently asserts things the sources don’t support. HALT‑RAG proposes an unusually pragmatic fix: don’t fine‑tune a big model—ensemble two strong, frozen NLI models, add lightweight lexical features, train a tiny task‑adapted meta‑classifier, and calibrate it so you can abstain when uncertain. The result isn’t just accuracy; it’s a governable safety control you can dial to meet business risk. ...

September 13, 2025 · 4 min · Zelina
Cover image

Branching Out of the Middle: How a ‘Tree of Agents’ Fixes Long-Context Blind Spots

TL;DR — Tree of Agents (TOA) splits very long documents into chunks, lets multiple agents read in different orders, shares evidence, prunes dead-ends, caches partial states, and then votes. The result: fewer hallucinations, resilience to the “lost in the middle” effect, and accuracy comparable to premium large models—while using a compact backbone. Why this matters for operators If your business parses contracts, annual reports, medical SOPs, or call-center transcripts, you’ve likely felt the pain of long-context LLMs: critical details buried mid-document get ignored; retrieval misses cross-paragraph logic; and bigger context windows inflate cost without guaranteeing better reasoning. TOA is a pragmatic middle path: it re-imposes structure on attention—not by scaling a single monolith, but by coordinating multiple lightweight readers with disciplined information exchange. ...

September 12, 2025 · 4 min · Zelina