Opening — Why this matters now

Agentic AI is rapidly escaping the sandbox.

From copilots to autonomous workflows, we are now deploying systems that don’t just predict — they act. The problem? These systems are increasingly embedded in real-world environments where timing, safety, and consistency are not optional.

And yet, the underlying models — particularly large language models — are inherently non-deterministic. Same input, different output. Slight latency shifts, different behaviors. In a chatbot, this is charming. In a car, it’s fatal.

The paper fileciteturn0file0 tackles this uncomfortable truth head-on: how do we make agentic AI systems behave predictably when their core components are fundamentally unpredictable?

Their answer is not to “fix” the AI — but to redesign the system around it.


Background — Context and prior art

Cyber-Physical Systems (CPS) — think autonomous vehicles, industrial robots, smart infrastructure — rely heavily on determinism.

Determinism, in this context, is simple but unforgiving:

Given the same inputs, the system must produce the same outputs.

Why? Because determinism enables:

| Capability | Why it matters |
|---|---|
| Repeatability | You can test and validate safety-critical behavior |
| Debuggability | Failures can be traced and reproduced |
| Composability | Systems can be reliably integrated |
| Certification | Regulators require predictable behavior |

Now introduce three sources of chaos:

  1. Human behavior (inconsistent, emotional)
  2. Physical environment (dynamic, stochastic)
  3. LLM-based agents (probabilistic, latency-variable)

You don’t get a system. You get a negotiation.

Previous approaches tried to improve parts of the system:

  • Better models (accuracy)
  • Fine-tuning (alignment)
  • Formal verification (bounded guarantees)

But they largely ignored a structural issue:

Even a perfect model cannot guarantee system-level determinism if the execution architecture is non-deterministic.


Analysis — What the paper actually does

The authors propose a subtle but powerful shift:

Treat nondeterminism as input, not error.

1. System Formalization

They define the system behavior as:

$$ y(t) = F(x_i, i_h(t), i_c(t), i_a(t)) $$

Where:

  • $x_i$: initial system state
  • $i_h(t)$: human input
  • $i_c(t)$: environment/car input
  • $i_a(t)$: agent (LLM) input

The key idea is almost philosophical:

If you treat all variability as explicit inputs, the system itself can remain deterministic.

This reframes the problem from eliminating randomness to containing it.
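The formalization can be sketched in a few lines. This is a toy stand-in for $F$ (the update rule and signal values are illustrative assumptions, not from the paper), showing the core point: once every source of variability is passed in as an explicit input, the system function itself is pure and reproducible.

```python
# Sketch of y(t) = F(x_i, i_h(t), i_c(t), i_a(t)): all variability enters
# as explicit inputs, so F itself stays a pure, deterministic function.
# The update rule and signal values are illustrative, not the paper's.

def F(x_i, i_h, i_c, i_a):
    """Toy system step: combine state with human, car, and agent inputs."""
    # Deterministic update; any randomness lives in the inputs, not here.
    return x_i + 0.5 * i_h - 0.2 * i_c + i_a

# Same input streams -> same output trace, every time.
inputs = [(1, 2, 0), (0, 1, 1)]
trace_1 = [F(0.0, h, c, a) for h, c, a in inputs]
trace_2 = [F(0.0, h, c, a) for h, c, a in inputs]
assert trace_1 == trace_2  # determinism holds once inputs are fixed
```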


2. Reactor Model of Computation (MoC)

Instead of loosely coupled services, the system is built using a reactor model, implemented via the Lingua Franca (LF) framework.

Core properties:

| Feature | Business translation |
|---|---|
| Deterministic scheduling | No race conditions between components |
| Port-based communication | Clear data contracts between modules |
| Logical time | Controlled timing instead of real-time chaos |
| Hierarchical composition | Systems remain explainable and auditable |

Think of it as replacing an improvisational jazz band with a tightly conducted orchestra.
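The essence of the reactor MoC can be sketched with an event queue ordered by logical time. This is a minimal illustration in plain Python, not the Lingua Franca API: reactions execute in an order that is a pure function of the event set, so no interleaving races are possible.

```python
import heapq

# Minimal reactor-style scheduler (a sketch, not the Lingua Franca API):
# reactions are ordered by (logical_time, priority), so execution order
# is a pure function of the scheduled events -- no race conditions.

class Scheduler:
    def __init__(self):
        self._queue = []   # heap of (logical_time, priority, name, payload)
        self.log = []      # executed reactions, in deterministic order

    def schedule(self, time, priority, name, payload):
        heapq.heappush(self._queue, (time, priority, name, payload))

    def run(self):
        while self._queue:
            time, _prio, name, payload = heapq.heappop(self._queue)
            self.log.append((time, name, payload))

s = Scheduler()
# Events inserted "out of order" still execute in logical-time order.
s.schedule(2, 0, "coach", "warn")
s.schedule(1, 0, "car", "speed=80")
s.schedule(1, 1, "driver", "brake")
s.run()
assert [name for _, name, _ in s.log] == ["car", "driver", "coach"]
```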


3. The Agentic Driving Coach (Case Study)

The system is decomposed into four reactors:

| Component | Role |
|---|---|
| Driver | Human behavior model |
| Car | Physical dynamics |
| Environment | External conditions |
| Coach | AI agent (LLM + planner) |

The Coach is where things get interesting:

  • LLM generates: CONTROL_SIGNAL | Instruction
  • Planner enforces modes: Monitoring → Warning → Actuate

This creates a controlled decision pipeline:

| Mode | Trigger | Action |
|---|---|---|
| Monitoring | Normal behavior | No intervention |
| Warning | Deviation | Suggest correction |
| Actuate | Safety breach | Override control |

This is not “AI autonomy.”

It’s AI under supervision with escalation protocols.
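The escalation ladder can be captured as a small decision function. The rule shapes follow the paper's Monitoring → Warning → Actuate modes; the exact thresholds here (other than the 25 m distance rule stated later in the text) are illustrative assumptions.

```python
# Sketch of the Monitoring -> Warning -> Actuate escalation.
# The 25 m distance rule is from the text; other thresholds are illustrative.

def coach_mode(distance_m, speed_kmh, limit_kmh):
    """Map observations to an intervention mode with hard rules."""
    if distance_m <= 25 and speed_kmh > limit_kmh:
        return "ACTUATE"      # safety breach: override control
    if speed_kmh > limit_kmh:
        return "WARNING"      # deviation: suggest correction
    return "MONITORING"       # normal behavior: no intervention

assert coach_mode(100, 50, 60) == "MONITORING"
assert coach_mode(100, 70, 60) == "WARNING"
assert coach_mode(20, 70, 60) == "ACTUATE"
```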


4. Containing LLM Uncertainty

The paper introduces three practical mechanisms:

a. Structured Prompting

Instead of free-form responses:


TOKEN | Message

With hard rules like:

  • If distance ≤ 25m and speed too high → ACTUATE
  • Else if deviation → WARNING

This reduces ambiguity and forces bounded outputs.
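A minimal sketch of what "bounded outputs" buys you: validate the `TOKEN | Message` shape before anything downstream sees it. The token names mirror the paper's modes; the parsing code and fallback choice are assumptions of this sketch.

```python
# Sketch: validate a structured "TOKEN | Message" LLM response so only
# bounded outputs reach the rest of the system. Token names follow the
# paper's modes; the parsing and fallback logic are illustrative.

ALLOWED_TOKENS = {"MONITORING", "WARNING", "ACTUATE"}

def parse_response(raw: str):
    """Return (token, message); reject free-form output by falling back."""
    token, sep, message = raw.partition("|")
    token = token.strip().upper()
    if not sep or token not in ALLOWED_TOKENS:
        return ("MONITORING", "")   # unparseable output is treated as no-op
    return (token, message.strip())

assert parse_response("WARNING | Reduce speed") == ("WARNING", "Reduce speed")
assert parse_response("Sure! Here's my advice...") == ("MONITORING", "")
```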

b. Deadline Enforcement

Each LLM call has a strict time budget:

| Model | Worst-case latency |
|---|---|
| 1B | 186 ms |
| 8B | 250 ms |
| 70B | 613 ms |

If the model is late, fallback logic triggers immediately.

This is critical:

A correct answer delivered late is equivalent to a wrong answer.
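Deadline enforcement can be sketched with a thread pool and a timeout. The `slow_llm` stub and the 50 ms budget are illustrative stand-ins; the point is that a late model answer never reaches the system, because the deterministic fallback answers first.

```python
import concurrent.futures
import time

# Sketch of deadline enforcement: the LLM call gets a strict time budget;
# if it misses the deadline, a deterministic fallback answers instead.
# slow_llm and the 50 ms budget are illustrative stand-ins.

def slow_llm(observation):
    time.sleep(0.2)               # simulate a 200 ms model call
    return "WARNING | Slow down"

def with_deadline(call, observation, budget_s, fallback):
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(call, observation)
        try:
            return future.result(timeout=budget_s)
        except concurrent.futures.TimeoutError:
            return fallback       # a late answer is treated as a wrong answer

result = with_deadline(slow_llm, "speed=80", budget_s=0.05,
                       fallback="ACTUATE | Brake")
assert result == "ACTUATE | Brake"   # model missed the deadline
```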

c. Logical Delays

Human reaction time (~500ms) and system delays are explicitly modeled.

This avoids the common fallacy of “instant AI decisions” in real-world systems.
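Making the delay explicit is almost trivially simple, which is the point: a warning issued at logical time $t$ is acted on at $t + 500\,\text{ms}$ by construction. The event-timeline framing here is an illustrative assumption; the 500 ms figure is from the text.

```python
# Sketch: model human reaction time as an explicit logical delay, so the
# timeline accounts for it instead of assuming instant responses.
# The 500 ms figure is from the text; the event model is illustrative.

HUMAN_REACTION_MS = 500

def reaction_time_ms(warning_time_ms):
    """A warning at t is acted on at t + reaction delay, by construction."""
    return warning_time_ms + HUMAN_REACTION_MS

assert reaction_time_ms(1_000) == 1_500  # warn at 1.0 s, driver acts at 1.5 s
```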


Findings — Results with structure

The experiments (see the figures on page 5 of the paper) reveal a non-obvious trade-off:

Model Size vs System Safety

| Model | Latency | Instruction quality | Outcome |
|---|---|---|---|
| 1B | Low | Poor | Unsafe (fails to stop) |
| 8B | Medium | Good | Acceptable |
| 70B | High | Best | Safest behavior |

Two insights emerge:

  1. Smaller models are faster but dangerously inaccurate
  2. Larger models are safer but introduce timing risk

Which leads to a design paradox:

You cannot optimize for both intelligence and responsiveness without architectural intervention.


Determinism Achieved (With a Catch)

The system produces identical outputs when:

  • Inputs are identical
  • Timing is controlled
  • LLM outputs are bounded

But note the fine print:

| Source of variability | How it's handled |
|---|---|
| LLM randomness | Temperature = 0 |
| Latency variation | Deadlines + fallback |
| Human behavior | Modeled as input stream |

This is not pure determinism.

It’s engineered determinism — a constrained sandbox where chaos is allowed, but only within guardrails.


Implications — What this means for business

This paper quietly challenges how most companies are deploying AI today.

1. Prompt Engineering is Not Enough

Most teams focus on improving outputs.

This work shows:

The real risk lies in when and how outputs are delivered.

System architecture > model quality.


2. Agentic Systems Need Operating Systems

What Lingua Franca represents is essentially:

An OS for agent coordination

Expect a shift from:

  • “LLM as a tool” → “LLM as a component in a deterministic pipeline”

3. Safety = Latency × Accuracy

Traditional AI metrics ignore timing.

This paper implies a more realistic objective:

| Metric | Interpretation |
|---|---|
| Accuracy | Is the decision correct? |
| Latency | Is it delivered in time? |
| Determinism | Is it repeatable? |

All three must hold simultaneously.
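The three-way conjunction can be written down directly. This predicate is an illustrative framing (not a metric from the paper); the latency figures plugged in below are the worst-case numbers from the table above.

```python
# Sketch: a decision counts as deployable only if it is correct, on time,
# and repeatable -- all three at once. The predicate is illustrative;
# the latencies below are the worst-case numbers quoted earlier.

def is_deployable(correct: bool, latency_ms: float, deadline_ms: float,
                  repeatable: bool) -> bool:
    return correct and latency_ms <= deadline_ms and repeatable

assert is_deployable(True, 186, 250, True)        # 1B-class: fast enough
assert not is_deployable(True, 613, 250, True)    # 70B-class: correct but late
assert not is_deployable(False, 186, 250, True)   # fast but wrong
```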


4. The Rise of Hybrid Control Systems

The architecture blends:

  • Rule-based systems (fallback)
  • Probabilistic models (LLMs)
  • Deterministic orchestration (reactors)

This hybrid approach is likely to dominate safety-critical AI deployments.

Pure AI systems won’t pass regulatory scrutiny.


Conclusion — Control is the new intelligence

The industry has been obsessed with making AI smarter.

This paper asks a more uncomfortable question:

What if intelligence is not the bottleneck — control is?

By reframing nondeterminism as an input and enforcing deterministic orchestration around it, the authors demonstrate a path forward for deploying agentic AI in real-world systems without gambling on unpredictability.

It’s less glamorous than scaling parameters.

But it’s what makes AI deployable.

And in the end, deployability beats brilliance.

Cognaptus: Automate the Present, Incubate the Future.