Opening — Why this matters now

Foreign Information Manipulation and Interference (FIMI) has quietly evolved from a niche security concern into a persistent, high‑tempo operational problem. Social media platforms now host influence campaigns that are faster, cheaper, and increasingly AI‑augmented. Meanwhile, defenders are expected to produce timely, explainable, and interoperable assessments—often across national and institutional boundaries.

This is the tension at the heart of modern information defense: frameworks like DISARM provide conceptual clarity, but day‑to‑day investigations remain labor‑intensive, interpretive, and slow. The paper "An Agentic Operationalization of DISARM for FIMI Investigation on Social Media" proposes a direct answer: let autonomous AI agents execute the framework itself.

Background — From taxonomy to bottleneck

DISARM was designed as a shared language. By organizing disinformation campaigns into tactics, techniques, and procedures (TTPs), it allows analysts, policymakers, and allied partners to describe influence operations consistently. In theory, this supports attribution, coordination, and evidence‑based response.

In practice, however, DISARM has a scaling problem. Mapping social‑media behavior to TTPs requires expert judgment applied to massive, multilingual, rapidly evolving datasets. As AI lowers the cost of running influence operations, the asymmetry grows: attackers automate, defenders annotate.

The authors identify this gap clearly. The problem is not a lack of frameworks but a lack of operational machinery that can execute them continuously, transparently, and at speed.

Analysis — What the paper actually builds

The proposed solution is a multi‑agent investigative pipeline that treats DISARM not as documentation, but as an executable workflow.

At a high level, the system runs in iterative cycles (a code sketch of one cycle follows the list):

  1. Technique‑guided exploration: An LLM agent selects a DISARM technique and formulates hypotheses about how that technique might manifest in the data.
  2. Structured investigation: The agent operationalizes these hypotheses into concrete analyses—SQL queries, metrics, thresholds—run directly against social‑media datasets.
  3. Evidence decomposition: Findings are broken into atomic evidence units, each representing a single, testable claim.
  4. Statistical verification: Where labels exist, each atomic claim is validated using effect sizes, odds ratios, and significance tests.
  5. Human evaluation: Experts review statistically supported claims for contextual and operational relevance.
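
To make these steps concrete, here is a minimal sketch of a single cycle in Python. It assumes a hypothetical SQLite table `accounts(author_id, created_at, is_campaign)` and a DISARM-style technique label used purely for illustration; the schema, registration window, and query are stand-ins, not the paper's actual prompts or data model.

```python
# Minimal sketch of one investigative cycle: technique -> hypothesis -> SQL -> atomic claim
# -> statistical verification. Table name, columns, window, and technique label are
# illustrative assumptions, not the paper's actual schema or prompts.
import sqlite3
from scipy.stats import fisher_exact

TECHNIQUE = "T0090 Create Inauthentic Accounts"   # DISARM-style label, shown here for illustration
HYPOTHESIS = "Campaign accounts were created inside a narrow registration window."

def run_cycle(db_path: str, window_start: str, window_end: str) -> dict:
    con = sqlite3.connect(db_path)
    # Step 2: operationalize the hypothesis as a concrete query with an explicit threshold.
    rows = con.execute(
        """SELECT author_id,
                  (created_at BETWEEN ? AND ?) AS flagged,
                  is_campaign
           FROM accounts""",
        (window_start, window_end),
    ).fetchall()

    # Step 3: one atomic evidence unit = one testable claim
    # ("accounts created in this window belong to the campaign").
    a = sum(1 for _, f, y in rows if f and y)           # flagged and labeled campaign
    b = sum(1 for _, f, y in rows if f and not y)       # flagged but benign
    c = sum(1 for _, f, y in rows if not f and y)       # missed campaign accounts
    d = sum(1 for _, f, y in rows if not f and not y)   # correctly ignored

    # Step 4: verify the claim with an odds ratio and a significance test.
    odds_ratio, p_value = fisher_exact([[a, b], [c, d]])
    precision = a / (a + b) if (a + b) else 0.0
    return {"technique": TECHNIQUE, "hypothesis": HYPOTHESIS,
            "odds_ratio": odds_ratio, "p_value": p_value, "precision": precision}
```

The point is the shape of the loop: a technique drives a hypothesis, the hypothesis becomes a query, and the query's result becomes a single claim that either survives a statistical test or does not.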

Two design choices are especially notable.

First, deferred anomaly detection. The system avoids generic “something looks weird” signals. Every detected pattern must map to a specific DISARM technique. This forces interpretability by construction.
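
A minimal way to enforce that constraint in code is to make the technique mapping a required part of every finding, so an untyped anomaly simply cannot be recorded. The sketch below uses hypothetical field names and a placeholder set of DISARM technique IDs; it is not the paper's data model.

```python
# Interpretability by construction: a finding cannot exist without citing the
# DISARM technique it evidences. Field names and the ID set are illustrative.
from dataclasses import dataclass

KNOWN_TECHNIQUES = {"T0090", "T0049", "T0086"}   # illustrative subset, not the full framework

@dataclass(frozen=True)
class Finding:
    technique_id: str     # mandatory: the DISARM technique this pattern maps to
    claim: str            # the atomic, testable statement
    evidence_query: str   # the SQL or metric that produced it
    p_value: float        # result of the verification step

    def __post_init__(self):
        if self.technique_id not in KNOWN_TECHNIQUES:
            raise ValueError(f"finding must cite a DISARM technique, got {self.technique_id!r}")
```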

Second, full‑history feedback. Each iteration inherits all prior findings, allowing the agent to shift from exploration to exploitation—deepening promising techniques rather than resetting context every cycle.
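
The feedback loop can be sketched the same way. The history list below is inherited in full by every later cycle, and the selection heuristic shifts toward techniques whose earlier claims passed verification. `run_cycle_for`, the 0.05 cutoff, and the exploration probability are illustrative placeholders, not the paper's implementation.

```python
# Full-history feedback: every iteration sees all prior findings and biases the next
# technique choice toward verified ones (exploitation) while still sampling new ones.
import random

def run_cycle_for(technique, prior_findings):
    # Placeholder for the LLM-driven cycle sketched earlier; a real run would return
    # the technique, the claim, and its verification statistics.
    return {"technique": technique, "p_value": 1.0}

def choose_technique(candidates, history, explore_prob=0.3):
    verified = [f["technique"] for f in history if f.get("p_value", 1.0) < 0.05]
    if verified and random.random() > explore_prob:
        return random.choice(verified)                 # exploit: deepen a promising technique
    tried = {f["technique"] for f in history}
    untried = [t for t in candidates if t not in tried]
    return random.choice(untried or candidates)        # explore: try something new

def investigate(candidates, n_iterations=15):
    history = []                                       # every later cycle inherits the full list
    for _ in range(n_iterations):
        technique = choose_technique(candidates, history)
        history.append(run_cycle_for(technique, prior_findings=history))
    return history
```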

Findings — How well does it work?

The pipeline was evaluated on two real‑world datasets:

  • A Chinese state‑linked campaign on X targeting Guo Wengui.
  • A Russian bot‑driven Telegram operation surrounding Moldova’s 2025 election.

Technique‑level performance

Dataset             Techniques Tested   Pass Rate
China / X           14                  35.7%
Russia / Telegram   14                  64.3%
Combined            28                  50.0%

Half of all investigated techniques yielded statistically supported evidence—remarkable given the fully autonomous execution across 15 iterations per dataset.

What the agents found

  • In the China/X case, agents independently rediscovered a known signal: a tight cluster of account creation dates that identified campaign accounts with very high precision (0.97). This mirrors patterns previously identified through manual expert analysis.
  • In the Russia/Telegram case, the system surfaced more than 30 additional bot accounts that had not been flagged in the initial human‑led investigation, demonstrating genuine discovery rather than mere replication (a sketch of this generalization step follows).
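
That generalization step is easy to picture: a behavioral rule that passed verification on labeled accounts is applied to the whole table, and anything not already on the human-curated list becomes a new candidate. The sketch below assumes a hypothetical pandas DataFrame of account features; the columns and thresholds are illustrative, not the rule the agents actually verified.

```python
# Sketch: apply an already-verified behavioral rule to the full account set and keep
# only matches the manual investigation had not flagged. Columns and thresholds
# are illustrative assumptions.
import pandas as pd

def surface_new_candidates(accounts: pd.DataFrame, already_flagged: set) -> pd.DataFrame:
    rule = (
        (accounts["posts_per_hour"] > 10)
        & (accounts["created_at"].between("2025-06-01", "2025-07-01"))
    )
    candidates = accounts.loc[rule, ["author_id", "created_at", "posts_per_hour"]]
    return candidates[~candidates["author_id"].isin(already_flagged)]
```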

What they missed

The system struggled with temporal synchronization patterns (e.g., coordinated reposting within minutes). This exposes a current limitation: behavioral statistics are easier to automate than fine‑grained temporal choreography.
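
For a sense of what that choreography looks like in data, here is a minimal sketch of one conventional check for near-simultaneous reposting, written outside the agentic pipeline. It assumes a hypothetical DataFrame of reposts with `content_id`, `author_id`, and a datetime `timestamp`; the five-minute window and ten-account floor are illustrative thresholds.

```python
# Sketch of the missed signal: many distinct accounts pushing the same content inside
# a single short window. Column names, the 5-minute window, and the 10-account floor
# are illustrative; `timestamp` is assumed to be a datetime column.
import pandas as pd

def synchronized_content(reposts: pd.DataFrame,
                         window: str = "5min",
                         min_accounts: int = 10) -> pd.DataFrame:
    stats = reposts.groupby("content_id").agg(
        first=("timestamp", "min"),
        last=("timestamp", "max"),
        accounts=("author_id", "nunique"),
    )
    span = stats["last"] - stats["first"]
    # Coordinated reposting: a burst that is both tight in time and broad across accounts.
    return stats[(span <= pd.Timedelta(window)) & (stats["accounts"] >= min_accounts)]
```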

Implications — Why this matters beyond FIMI

This work is less about disinformation per se, and more about how frameworks become operational.

For military and policy organizations, the implications are clear:

  • Situational awareness improves when outputs are structured, continuous, and taxonomy‑aligned rather than anecdotal.
  • Human–machine teaming becomes viable when agents justify every conclusion within an explicit doctrinal framework, reducing automation bias.
  • Interoperability emerges naturally when outputs are standardized at the semantic level, not retrofitted after the fact.

The reported cost—roughly USD 11 for a full multi‑iteration investigation—underscores the strategic asymmetry this approach introduces. At that price point, not automating becomes the expensive choice.

Conclusion — From frameworks to force multipliers

The central contribution of this paper is conceptual as much as technical. It demonstrates that agentic AI can turn analytical frameworks into living systems—executing, validating, and refining them at machine speed while preserving human judgment where it matters most.

DISARM, operationalized this way, stops being a reference manual and starts behaving like infrastructure.

That shift—from static taxonomy to autonomous investigative process—is likely to define the next phase of AI‑augmented governance, far beyond information warfare.

Cognaptus: Automate the Present, Incubate the Future.