A requirement can survive three meetings, two approvals, and a legal review while still meaning different things to everyone who reads it.
That is not usually because anyone is careless. Natural language is simply very good at sounding settled before its meaning is settled. Words such as “after,” “until,” “immediately,” and “within” feel precise in conversation. In software requirements, they can quietly conceal incompatible assumptions about timing, cancellation, and acceptable system behavior.
Formal logic solves the ambiguity problem, but creates a communication problem. A temporal-logic formula may tell a verification tool exactly what must happen while telling a product manager almost nothing before lunch.
The paper Developing Controlled Natural Language for Formal Specification Patterns Using AI Assistants proposes a method for building a controlled natural language from formal requirement patterns.1 Its central idea is sensible: begin with semantics that are already formal, then use an AI assistant to help produce readable sentence patterns that preserve those semantics.
The interesting part is not simply that an LLM can rewrite formal requirements. We already know LLMs can rewrite things, sometimes even the things they were asked to rewrite.
The useful result comes from comparing three different ways of constraining the assistant:
- Give it the original natural-language pattern and the values of constant attributes.
- Add the formal logic formula.
- Instead of adding the formula, explain the intuitive meaning of the constant attribute values.
Those approaches produce three different failure profiles. The comparison reveals a broader lesson for business uses of generative AI: more formal information does not automatically produce a better human-facing answer. The model needs constraints expressed at the level of the task it is expected to perform.
The requirement has to serve two readers
A controlled natural language, or CNL, is a restricted form of natural language with defined syntax and vocabulary. Its purpose is to remain readable to people while allowing each valid sentence to be translated unambiguously into a formal representation.
That sounds like a modest compromise. In practice, it creates a difficult design problem.
A requirements language must satisfy two audiences:
| Audience | What it needs |
|---|---|
| Engineers, analysts, and domain specialists | Sentences that are understandable and reasonably natural |
| Verification tools and formal-methods specialists | Meanings that are precise, compact, and mechanically checkable |
Languages designed from ordinary usage can become expressive but formally cumbersome. Languages designed around a small collection of logical patterns can remain easy to verify but too limited for practical requirements.
The paper chooses a semantics-first route. It begins with a formal specification pattern whose attributes already have defined meanings. The natural-language system is then constructed around that pattern.
This direction matters. The language is not written first and mapped to logic later. Formal semantics are the starting constraint.
The authors demonstrate the method using Event-Driven Temporal Logic, or EDTL. An EDTL requirement contains six attributes:
- trigger: the event that activates the requirement;
- invariant: a condition that remains valid during the relevant period;
- final: an event that begins the wait for a reaction;
- delay: the timing condition for the reaction;
- reaction: the expected event;
- release: an event that cancels the requirement.
Each attribute may be variable or fixed as logically true or false. When some attributes become constants, the full formal requirement can often be simplified. A readable sentence describing that requirement should simplify as well.
That is where the AI assistant enters.
The method starts with one deliberately overloaded sentence
The paper proposes three construction stages.
First, a human writes a base natural-language pattern containing slots for every attribute of the formal requirement. For EDTL, the authors use:
After ‘trigger’, ‘invariant’ is valid until either ‘release’ or ‘reaction’, and ‘reaction’ must occur within ‘delay’ from ‘final’.
It is not elegant. It is not supposed to be.
The base pattern is a semantic inventory expressed as a sentence. Every attribute is visible, so the pattern can later be simplified when particular attributes become constant.
Second, an AI assistant generates derived sentence patterns for different combinations of constant and variable attributes. Because each of six EDTL attributes can be true, false, or variable, the naive space contains:
possible combinations.
The authors note that formal semantic classification can reduce these to 32 canonical EDTL combinations because many different attribute assignments produce equivalent formulas. That reduction is important operationally: it prevents the language-development process from wasting effort on logically duplicate cases.
Third, the recurring grammatical structures in the generated sentences are used to construct a CNL grammar. The paper presents a partial EDTL grammar with rules such as:
Req := After
, <body_trig> | is valid <body_inv>
The grammar is not the paper’s strongest evidence. It is the downstream artifact. The more revealing question is how the authors obtain sentence patterns suitable for that grammar without allowing the AI assistant to either improvise the meaning or bury it beneath formal-language debris.
Comparison one: a natural-language prompt can sound right while changing the rule
The basic prompt gives the assistant:
- the original full sentence pattern;
- the constant values assigned to selected attributes;
- a request to reformulate the sentence and explain why the result is correct.
Consider a case where release is false, delay is true, final is true, and invariant is true. The resulting formal meaning simplifies to:
In plain language: whenever the trigger occurs, the reaction must occur.
Using only the basic prompt, the assistant instead produces a sentence referring to a “condition,” a reaction, and a “specified time limit.” Those additions sound plausible because they resemble the original template. They do not correspond to the simplified requirement.
This is a familiar LLM failure, but the paper shows it in a particularly instructive form. The model is not hallucinating a random fact. It is preserving linguistic material that should have disappeared after the formal simplification.
The sentence remains close to the source wording while drifting away from the source meaning.
That distinction matters in requirements work. Ordinary text-generation evaluation often rewards similarity, fluency, and topical consistency. Here, those qualities can disguise an error. A sentence may look like a faithful reformulation precisely because it retains concepts that no longer belong in the requirement.
The basic prompt therefore gives the model too much freedom in deciding what constant attributes mean for the sentence. It can remove an attribute name yet preserve an incorrect linguistic substitute.
What the basic prompt supports
The basic prompt is useful for generating candidate phrasings and observing how the assistant interprets the pattern. Asking the model to explain its answer also gives reviewers evidence about the route by which it reached a wrong conclusion.
It does not reliably enforce formal equivalence.
That makes it a drafting instrument, not an assurance mechanism.
Comparison two: adding the formula restores meaning but damages the sentence
The obvious correction is to include the simplified Linear Temporal Logic formula in the prompt.
This works in an important sense. When given the formula $G(\text{trigger}\rightarrow\text{reaction})$, the assistant produces the concise sentence:
After ‘trigger’, ‘reaction’ must occur.
The semantic error disappears.
It would be tempting to stop here and declare formal grounding victorious. The paper does not.
For requirements with more non-constant attributes, the assistant begins to behave less like an editor of the original language pattern and more like a translator of the formula. The resulting sentences can remain semantically accurate while becoming cumbersome and difficult to read.
For one tested combination, the formula-guided answer says that the invariant must hold and the delay must not occur until either release or reaction occurs. The sentence tracks the logical structure, but it exposes that structure in language poorly suited to ordinary requirements work.
This is the likely reader misconception the paper usefully corrects: providing the formal formula does not automatically produce the best formal-language interface for humans.
The formula contains the right information. It does not contain instructions about which parts of that information should remain implicit in readable prose.
A model asked to honor the formula may become conservative. Rather than risk dropping a logical condition, it verbalizes the machinery. Semantic safety improves, but linguistic usability deteriorates.
| Prompt configuration | Typical strength | Typical failure |
|---|---|---|
| Base pattern and constant values | Preserves the shape of ordinary language | Can retain concepts that should disappear and change the requirement’s meaning |
| Base pattern, constant values, and formal formula | More reliably follows the simplified semantics | Can translate logical structure too literally and produce cumbersome sentences |
| Base pattern, constant values, and intuitive explanations | Usually balances meaning and readability | Still struggles when the attribute combination itself lacks an intuitive human interpretation |
The comparison is more useful than a generic conclusion that “prompt engineering matters.” It identifies which type of prompt information helps which objective.
The formula is good at constraining truth conditions. It is less good at telling the model how a person should naturally express those truth conditions.
Comparison three: explain the constants, not the entire formula
The paper’s preferred configuration returns to the base natural-language prompt but replaces the LTL formula with explanations of what constant attribute values mean.
For example, the assistant is told that:
- if the invariant is true, the statement does not depend on the invariant;
- if final is true, final is “now”;
- if final is false, final never happens;
- if delay is false, the delay is infinite;
- if delay is true, there is no delay;
- if reaction is false, the requirement does not wait for the reaction;
- if reaction is true, the statement does not depend on the reaction.
These explanations do not ask the model to reconstruct the formal formula in prose. They tell it how each constant should affect the language.
That is a subtle but consequential change.
The model receives semantic guidance in concepts that directly correspond to editing decisions:
- remove this dependency;
- express this event as immediate;
- treat this event as never occurring;
- remove this timing restriction.
The paper reports that this configuration produces sentences that are, in most cases, both semantically correct and syntactically readable.
For the simplified trigger–reaction case, it produces:
After ‘trigger’, ‘reaction’ occurs now.
For a case where final is true, it retains the broader EDTL structure but expresses the timing reference as “now,” rather than mechanically verbalizing the underlying formula.
The method works because it gives the model a semantic interface rather than merely a semantic object.
The formula defines what the requirement means. The attribute explanations define how changes in that meaning should affect the wording.
This distinction is relevant far beyond controlled natural language. In many enterprise AI tasks, organizations provide models with authoritative source material and assume that authority alone will discipline the output. Often, the missing component is an explicit transformation policy.
A regulation tells a model what is legally written. It does not automatically tell the model how to produce an operational checklist.
A financial model tells it how a value was calculated. It does not automatically tell it which assumptions should be surfaced in an executive explanation.
A software policy tells it what actions are allowed. It does not automatically tell it how to phrase an instruction for a user facing an exception.
The useful prompt is frequently not “Here is the formal truth.” It is “Here is how each formal condition changes the answer you must produce.”
The best prompt also exposes requirements that should not become sentences
The attribute-explanation approach does not solve every combination.
The authors observe that certain combinations, particularly those involving a constant reaction value, lead the assistant to produce sentences with broader semantics than the formulas. Their interpretation is that these combinations do not have a clear intuitive meaning in natural language and would also be confusing to human users.
The proposed response is not to keep prompting until the model produces a polished sentence. It is to exclude those combinations from the CNL and treat them as possible errors when engineers specify requirements in tabular form.
This may be the paper’s most practically mature decision.
Generative AI systems are often expected to convert every valid input into a fluent output. Fluency then conceals cases where the input itself is conceptually awkward or poorly formed. The paper instead uses failed reformulation as a diagnostic signal.
A requirement combination can be formally representable without being suitable for human-facing specification.
That creates two possible outcomes:
-
The combination has an intuitive interpretation. Generate a readable pattern and include it in the controlled language.
-
The combination can only be understood by inspecting the formula. Do not manufacture a natural-sounding sentence. Flag the combination for engineering review.
This is an important boundary between assistance and concealment. A model that turns every strange requirement into smooth prose may improve document quality while reducing requirement quality.
Occasionally, the correct language-generation output is an objection.
The experiment compares prompt strategies, not model performance
The tables in the paper should be interpreted carefully.
They are not benchmark results in the usual machine-learning sense. The authors do not report aggregate accuracy, readability scores, repeated trials, inter-rater agreement, or statistical comparisons. They show selected outputs from Google’s AI mode under different prompt configurations and analyze the resulting semantic and linguistic behavior.
The comparisons function as main qualitative evidence for the proposed construction process:
| Test | Likely purpose | What it supports | What it does not establish |
|---|---|---|---|
| Basic prompt versus formula-guided prompt | Compare semantic drift with formula-driven literalism | The two configurations have different failure modes | Their average reliability across requirements or models |
| Formula-guided prompt versus attribute-explanation prompt | Identify a better way to balance semantics and readability | Intuitive attribute explanations are more suitable for the tested CNL-construction task | That explanations will outperform formulas in every formal-language task |
| Identification of unclear constant combinations | Explore whether generation failures reveal problematic requirement forms | Some formally valid combinations may be unsuitable for natural-language specification | A validated automatic method for detecting all bad requirements |
| Partial EDTL grammar | Demonstrate the final construction stage | Generated patterns can be organized into a CNL grammar | A complete, production-ready EDTL language |
The paper also discusses the Simplified Universal Pattern used in an industrial tool context and argues that its attributes have properties compatible with the proposed method. This is an applicability argument, not a second empirical validation. The actual generation experiment is conducted on EDTL.
That distinction prevents a small demonstration from acquiring a suspiciously large résumé.
The business value is controlled transformation, not automatic specification
The paper directly shows a method for developing a controlled natural language from formal specification patterns. It does not show an autonomous system that can safely generate and approve production requirements.
For organizations working with safety-critical or verification-sensitive systems, the practical value lies in reorganizing the requirements workflow.
A possible operational sequence is:
Formal requirement pattern
↓
Canonical attribute combinations
↓
Human-defined explanations of constant values
↓
LLM-generated candidate sentence patterns
↓
Semantic and linguistic review
↓
Approved controlled-language grammar
↓
Readable requirements linked to formal semantics
This workflow assigns the AI assistant a constrained design role. It helps produce candidate language patterns and identify awkward combinations. It does not decide what the system should do, define the formal semantics, or certify that the final language is correct.
What the paper directly supports
The method can help teams:
- derive multiple readable requirement patterns from one formally defined base pattern;
- maintain a relationship between human-readable requirements and formal semantics;
- use canonical semantic classes to reduce redundant language-generation work;
- identify attribute combinations that resist intuitive expression;
- construct a grammar from recurring generated sentence structures.
What Cognaptus infers for business use
With appropriate engineering review, the method could reduce the manual effort required to design controlled requirement languages for:
- industrial control systems;
- transport and autonomous-system software;
- medical-device behavior;
- regulated workflow automation;
- policy-driven software agents;
- other systems where stated intent must remain traceable to machine-checkable rules.
The most credible ROI pathway is not “LLMs write requirements faster.” That claim would be easy to make and difficult to trust.
The stronger pathway is cheaper semantic alignment. A controlled-language layer may reduce the effort required to move between domain discussion, formal verification, testing, and audit documentation. It may also expose defective requirement combinations earlier, before they become implementation defects with a project plan attached.
What remains uncertain
The paper does not establish:
- how accurately the method performs across a complete requirement language;
- whether different LLMs generate consistent patterns;
- whether repeated runs produce stable wording;
- how much expert review remains necessary;
- whether engineers understand the generated CNL more accurately than existing alternatives;
- whether the method reduces project cost or defect rates;
- how the approach performs when formal patterns are substantially more complex than EDTL.
Those are not decorative limitations. They determine whether the method remains a useful language-design aid or becomes a dependable component of an industrial toolchain.
A deployable version needs a stricter review loop
A production implementation should not simply submit prompts to an LLM and store the answers as grammar rules.
The paper’s own examples show why. The basic prompt can produce fluent semantic errors. The formula-guided prompt can produce correct but unusable constructions. The attribute-explanation prompt performs better in the demonstrated cases, but still produces broader meanings for certain combinations.
A practical system therefore needs at least four review layers.
| Review layer | Question |
|---|---|
| Formal equivalence | Does the candidate sentence correspond to the intended simplified formula? |
| Linguistic usability | Can domain users understand the sentence without reconstructing the formula? |
| Combination validity | Should this attribute combination be expressible in the CNL at all? |
| Grammar consistency | Does the approved sentence fit the controlled vocabulary and syntax? |
Some parts could eventually be automated. Candidate sentences could be parsed back into formal representations and compared with the source formula. Multiple models or deterministic grammar checks could identify unstable outputs. Human reviewers could focus on cases where semantic checks disagree or wording remains difficult.
The paper does not implement that full pipeline. It provides a useful design principle for building one: the LLM should operate between formal simplification and controlled grammar construction, surrounded by constraints that reflect both tasks.
Formal logic does not teach language by itself
The paper begins from a semantics-first position, but its experiment reveals that semantics alone are insufficient.
A formal formula can prevent certain kinds of error. It cannot decide how much of its structure a readable sentence should expose. A natural-language template can preserve familiar phrasing. It cannot reliably determine which concepts should disappear after simplification.
The best-performing configuration in the paper inserts a carefully designed middle layer: intuitive explanations that translate constant logical values into linguistic consequences.
That is the broader lesson.
When organizations use generative AI to mediate between technical truth and human communication, they should not expect either side to discipline the other automatically. Formal inputs can still yield poor explanations. Fluent outputs can still corrupt formal meaning.
The AI assistant becomes useful when the transformation rules are made explicit: what must remain, what must disappear, what should become immediate, and what should be rejected as unintuitive.
In other words, the model does not write the rules alone. It writes within rules that explain how rules become language.
That arrangement is less magical than autonomous requirements engineering.
It is also considerably more credible.
Cognaptus: Automate the Present, Incubate the Future.
-
Natalia Garanina, Vladimir Zyubin, and Igor Anureev, “Developing controlled natural language for formal specification patterns using AI assistants,” arXiv:2512.24159, https://arxiv.org/pdf/2512.24159 ↩︎