Innovation, Agentified: How TRIZ Got Its AI Makeover

TL;DR for operators

A crane is a useful place to test agentic innovation because the problem is painfully concrete: move heavy loads faster, avoid dangerous swinging, prevent overheating, and do not accidentally turn productivity into an incident report. The paper behind TRIZ Agents uses exactly this kind of gantry-crane improvement problem to test whether a multi-agent LLM system can follow the TRIZ method and produce plausible engineering ideas.¹

The result is neither magic nor embarrassment. The agents completed the full TRIZ workflow, produced six step documents and a final report, identified several system components that overlap with the reference case, matched the two physical contradictions from that case, and proposed sliding-mode control with an anti-swing trajectory, which aligns with one of the human study’s solutions.

The system also missed important pieces. Its function analysis was incomplete, it failed to identify the intelligent circuit breaker proposed in the human case study, and the Electrical Engineer agent was not involved in the final solution step. That last detail is not trivia. It shows that multi-agent systems do not merely need good “experts”; they need an orchestrator that reliably calls the right expert at the right time. Apparently even synthetic meetings can forget to invite electrical.

For operators, the business takeaway is narrow but useful: TRIZ Agents points toward AI-assisted innovation workshops, not autonomous invention departments. The near-term value is first-pass diagnosis, contradiction framing, solution scouting, and documentation. The hard boundary is that outputs still need human engineering review, domain validation, iteration, and cost management. One run used around 60–80 graph node calls and roughly 150,000–250,000 total tokens. That is not free brainstorming. It is meter-running brainstorming.

Start with the crane, not the agent diagram

The case study matters because it forces the agent system to deal with an actual engineering tension rather than a neat toy prompt.

A gantry crane has an obvious operational goal: move load quickly. It also has an obvious safety constraint: heavy loads swing, systems overheat, and operators can push equipment beyond sensible limits. That creates a classic TRIZ shape. Improving one desirable property worsens another. Speed helps throughput but harms stability. Load capacity helps productivity but increases safety and thermal risk.

This is the kind of problem TRIZ was built to formalise. TRIZ, the Theory of Inventive Problem Solving, asks innovators to abstract local engineering mess into reusable contradiction patterns, then use tools such as the 39 engineering parameters, the Contradiction Matrix, and the 40 Inventive Principles to generate solution directions. The attraction is structure. The annoyance is that using the structure properly requires expertise, patience, and a tolerance for methodological paperwork. A rare corporate combination, like clean CRM data.

TRIZ Agents tries to mechanise that process. It does not simply ask one LLM for ideas. It creates a simulated innovation team, gives each agent a role, adds tool access, and lets a Project Manager agent decide who should speak next. The paper’s central question is therefore not “can an LLM be creative?” That question has become too vague to be useful. The better question is: can a managed group of language agents follow a structured innovation method closely enough to produce useful intermediate reasoning?

The gantry crane case gives the paper its honest answer: partly.

The system is a TRIZ meeting simulator, not a synthetic inventor

TRIZ Agents is built as a supervised multi-agent workflow in LangGraph, using GPT-4o with temperature set to 0.5. The team contains a Project Manager plus domain and process agents: Mechanical Engineer, Electrical Engineer, Control Systems Engineer, Safety Engineer, TRIZ Specialist, Operations Specialist, and Documentation Specialist.

The Project Manager acts as the orchestrator. It sees the current conversation and the documentation from prior steps, then decides which agent should act next or whether the step should finish. Worker agents can use tools. All worker agents have web search access. The TRIZ Specialist also has TRIZ-specific tools: a 39-parameter feature list, a contradiction matrix tool, an inventive principles tool, and a TRIZ RAG tool.
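The routing pattern described above can be sketched in a few lines of plain Python. This is not LangGraph's API and not the authors' code; `run_step`, `pick_next`, and the `FINISH` sentinel are hypothetical names, and the lambdas stand in for LLM-backed agents. It only illustrates the shape of the loop: the supervisor picks a speaker, the speaker adds to the transcript, and the step ends when the supervisor says so.

```python
from typing import Callable, Dict, List

FINISH = "FINISH"  # hypothetical sentinel the supervisor returns to end a step

Worker = Callable[[List[str]], str]             # reads the transcript, returns a message
Router = Callable[[List[str], List[str]], str]  # (transcript, prior_docs) -> agent name or FINISH


def run_step(workers: Dict[str, Worker], pick_next: Router,
             prior_docs: List[str], max_turns: int = 20) -> List[str]:
    """Run one TRIZ step: the supervisor picks a worker until it returns FINISH."""
    transcript: List[str] = []
    for _ in range(max_turns):  # hard cap so a confused router cannot loop forever
        choice = pick_next(transcript, prior_docs)
        if choice == FINISH:
            break
        if choice not in workers:
            raise KeyError(f"router chose unknown agent: {choice}")
        transcript.append(f"{choice}: {workers[choice](transcript)}")
    return transcript


# Toy stand-ins for LLM-backed agents, for illustration only.
workers = {
    "Mechanical Engineer": lambda t: "listed structural components",
    "Safety Engineer": lambda t: "flagged load-swing risk",
}


def scripted_router(transcript: List[str], prior_docs: List[str]) -> str:
    # A real Project Manager would be an LLM call; this one follows a fixed order.
    order = ["Mechanical Engineer", "Safety Engineer"]
    return order[len(transcript)] if len(transcript) < len(order) else FINISH


log = run_step(workers, scripted_router, prior_docs=[])
```

Note what the sketch makes visible: an agent that the router never names simply never speaks. Nothing in the loop notices the absence.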

That architecture is best understood as a meeting simulator with structured minutes.

Each TRIZ step is treated like a separate session. The agents do not carry full memory from one session to another. Instead, a Documentation Specialist records the previous step, and that documentation becomes the context for the next stage. This is a pragmatic design choice. It reduces context bloat and creates comparable artefacts, but it also means the workflow depends heavily on the quality of summaries. In human terms, the team forgets the discussion and trusts the minutes. Anyone who has survived project governance should feel a small chill.
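A rough sketch of that handoff, under stated assumptions: the step names follow the six step documents the paper produces, while `run_workflow`, `discuss`, and `summarise` are hypothetical stand-ins for the agent session and the Documentation Specialist. The point it illustrates is that each session receives the earlier summaries and nothing else.

```python
from typing import Callable, List

STEPS = [
    "system definition",
    "function analysis",
    "cause-effect chain",
    "engineering contradictions",
    "physical contradictions",
    "solutions",
]


def run_workflow(problem: str,
                 discuss: Callable[[str, str, List[str]], List[str]],
                 summarise: Callable[[str, List[str]], str]) -> List[str]:
    """Run each step as a fresh session; only the summaries are carried forward."""
    docs: List[str] = []
    for step in STEPS:
        transcript = discuss(step, problem, docs)  # agents see prior docs, not prior transcripts
        docs.append(summarise(step, transcript))   # the transcript itself is dropped here
    return docs


# Toy stand-ins so the sketch runs without any model calls.
seen_context_sizes: List[int] = []


def toy_discuss(step: str, problem: str, docs: List[str]) -> List[str]:
    seen_context_sizes.append(len(docs))  # how many summaries this session could see
    return [f"discussion of {step}"]


def toy_summarise(step: str, transcript: List[str]) -> str:
    return f"{step}: {len(transcript)} message(s) recorded"


docs = run_workflow("improve a gantry crane", toy_discuss, toy_summarise)
```

Anything `summarise` leaves out is gone for every later step, which is why summary quality carries so much weight in this design.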

The paper’s workflow is not an ablation study of agent architectures. It does not compare supervised teams against flat collaboration, hierarchical teams, or different prompt strategies. The architecture is an implementation vehicle. The main evidence comes from how the system behaves when run through the gantry-crane TRIZ case.

Paper component	Likely purpose	What it supports	What it does not prove
LangGraph supervised team design	Implementation detail	The authors can operationalise a controlled TRIZ workflow with role-specific agents and tools	That this architecture is better than other multi-agent designs
Gantry-crane comparison	Main evidence	The system can partially reproduce a documented human TRIZ process on one engineering case	General performance across engineering domains
Two runs compiled for assessment	Exploratory sensitivity signal	Outputs are nondeterministic, but recurrent patterns can be observed	Robustness in the statistical sense
Token and graph-call counts	Operational cost signal	The workflow is computationally non-trivial	Full economic viability in production
RAG underuse by TRIZ Specialist	Implementation limitation	Tool use depends strongly on prompts and agent autonomy	That RAG itself is ineffective

This distinction matters. The paper shows a working prototype and a case comparison. It does not show a benchmarked R&D automation platform. The difference is not pedantry. It is the difference between “useful pilot” and “please do not let this approve crane designs.”

Where the agents matched the human TRIZ workflow

The six-step comparison is the heart of the paper.

In the first step, defining the engineering system, TRIZ Agents identified most of the main crane components. These included parts such as gantry legs, wheels, railways, trolley, wire rope, motor, hoist, hook, railway beam, and bridge girder. The agents were less successful on supersystem elements, such as environmental factors and human workers.

That pattern continues in the function analysis. The agents identified some useful connections that overlapped with the reference study, such as trolley-to-hoist and wire-rope-to-hook. They also identified environmental conditions affecting the load, which lines up with the reference case’s attention to dust, humidity, and thermal factors. But the system missed six useful connections present in the original case study and missed the harmful connection between worker and system.

This is an important result because function analysis is not just a checklist. It is where the system begins to reason about how components influence one another. Missing human-system interaction is not a harmless omission when the original root causes include operator behaviour. In industrial settings, the human is often part of the machine, whether the diagram admits it or not.

In the cause-and-effect chain analysis, the agents recovered the two main root causes from the human study: overloading and rapid movement. They also generated five additional plausible causes that did not appear in the original case. The interpretation should be measured. Additional causes may reflect useful breadth, or they may reflect uncontrolled ideation. In early diagnosis, breadth can be valuable. In engineering validation, breadth is merely a queue of things someone competent must check.

The contradiction steps are stronger. TRIZ Agents identified engineering contradictions that did not exactly match the original paper’s parameters but were close: Speed versus Stability, and Load Capacity versus Safety. The authors note that these approximate the human study’s contradiction framing, even though the labels differ.

The most impressive overlap comes in the physical contradictions. The agents identified the same two contradictory needs as the original case: the crane must move quickly to improve productivity but slowly to prevent load swing; and it must lift heavy loads for efficiency but lighter loads to ensure safety and avoid overheating. This is the part of the workflow where the system seems to capture the TRIZ abstraction properly.

Then comes the solution step. TRIZ Agents proposed Sliding Mode Control with an anti-swing trajectory, matching one of the human study’s solutions. It also suggested thermal management enhancements, which are near the area of the reference study’s smart ventilation solution. But it missed the intelligent circuit breaker with sensors for current monitoring and regulation.

So the case evidence is not “AI invents like engineers.” It is more specific and more useful: a managed LLM-agent workflow can follow a formal innovation method well enough to recreate some intermediate reasoning and one key solution direction, while still losing important domain-specific options when orchestration or participation fails.

The missing circuit breaker is the business lesson

The most revealing failure is not that the system made a mistake. Systems make mistakes. Humans make mistakes. Committees make mistakes, then schedule a workshop to distribute ownership of them.

The revealing failure is that the Electrical Engineer agent was not involved in the final solution step. The system had an electrical expert available, but the Project Manager did not call that agent when generating solutions. The paper suggests this may explain why the intelligent circuit breaker idea was missed.

That is the orchestration problem in miniature.

In a multi-agent architecture, capability is not enough. Having an expert agent in the roster does not guarantee that its expertise enters the reasoning path. The Project Manager’s routing decisions shape the final solution space. If the orchestrator asks the Control Systems Engineer, Safety Engineer, and Operations Specialist to review solutions, but leaves out Electrical Engineering, the final answer inherits that omission.

For business use, this matters more than the brand name of the LLM. The bottleneck is not always model intelligence. Sometimes it is workflow governance. A production innovation agent would need explicit coverage rules, critique prompts, role participation checks, and probably stage gates that ask, “Which relevant discipline has not reviewed this yet?” Boring? Yes. Useful? Also yes. Real automation often looks suspiciously like better bureaucracy.
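A stage gate of that kind can be very small. The sketch below is an assumption-laden illustration, not something the paper implements: the role names come from the paper's team, but the `REQUIRED_ROLES` mapping, the `missing_roles` helper, and the participation list are all hypothetical.

```python
from typing import Dict, Iterable, Set

# Role names come from the paper's team; which roles a stage requires is an assumption.
REQUIRED_ROLES: Dict[str, Set[str]] = {
    "solutions": {
        "Mechanical Engineer",
        "Electrical Engineer",
        "Control Systems Engineer",
        "Safety Engineer",
    },
}


def missing_roles(stage: str, speakers: Iterable[str]) -> Set[str]:
    """Return the required roles that never contributed during this stage."""
    return REQUIRED_ROLES.get(stage, set()) - set(speakers)


# Illustrative participation list, not taken from the paper's logs.
spoke = [
    "Mechanical Engineer",
    "Control Systems Engineer",
    "Safety Engineer",
    "Operations Specialist",
]
gaps = missing_roles("solutions", spoke)  # non-empty means: do not close the stage yet
```

A production orchestrator could refuse to finish the stage while `gaps` is non-empty, or at least log the omission for the human reviewer.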

The same issue appears in tool use. The TRIZ Specialist had access to a TRIZ RAG tool, but rarely used it. The authors attribute this partly to prompting: the tool existed, but the prompt did not explicitly require or strongly encourage its use at each generation. The agent therefore relied on internal model knowledge or web search instead.

This is a familiar agentic-AI problem. Autonomy sounds elegant until the agent autonomously skips the thing you built the system to rely on. Tool access is not tool discipline. For enterprise use, the difference is expensive.
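One way to turn tool access into tool discipline is to check each turn against a policy before accepting it. The following is a minimal sketch under assumptions: `AgentTurn`, `REQUIRED_TOOLS`, and the tool identifiers `triz_rag` and `web_search` are hypothetical names, and the paper does not describe such a check.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AgentTurn:
    agent: str
    text: str
    tools_called: List[str] = field(default_factory=list)


# Hypothetical policy: tools an agent must call before its turn is accepted.
REQUIRED_TOOLS: Dict[str, Set[str]] = {"TRIZ Specialist": {"triz_rag"}}


def unmet_tool_requirements(turn: AgentTurn) -> Set[str]:
    """Return the required tools this turn did not call."""
    return REQUIRED_TOOLS.get(turn.agent, set()) - set(turn.tools_called)


# A turn that answered from model memory and web search, skipping retrieval.
turn = AgentTurn("TRIZ Specialist", "Suggested inventive principles", ["web_search"])
unmet = unmet_tool_requirements(turn)  # non-empty means: re-prompt or reject the turn
```

Whether a failed check should trigger a re-prompt, a rejection, or only a warning is a design decision the sketch leaves open.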

The useful unit is not the answer; it is the structured trail

The system’s outputs are valuable less because of the final solution and more because of the intermediate artefacts.

A single LLM answer to “how should we improve a gantry crane?” would be difficult to audit. TRIZ Agents instead produces step documents: system definition, function analysis, cause-effect chain, engineering contradictions, physical contradictions, and solutions. That creates a trail of reasoning. The trail can be inspected, challenged, corrected, and reused.

That is where the business case begins.

In many organisations, innovation work fails before “innovation” even starts. Teams do not share a problem frame. Domain experts talk past one another. Documentation is thin. Contradictions remain implicit. Brainstorming jumps straight to familiar solutions because everyone is tired and the whiteboard marker is dying heroically.

A TRIZ-agent workflow can help by forcing a sequence:

1. Define the system.
2. Map functions and interactions.
3. Identify root causes.
4. Convert messy tensions into TRIZ contradictions.
5. Translate contradictions into solution directions.
6. Document the result.

The promise is not that the AI team knows the correct solution. The promise is that it can produce a structured first draft of the innovation conversation.

That has operational value in at least five settings.

Business use	What the paper directly supports	Cognaptus inference	Boundary
R&D workshop preparation	Agents can generate stepwise TRIZ artefacts from a problem description	Teams could use the system to prepare discussion material before expert workshops	Experts must verify components, causes, and feasibility
Early-stage problem diagnosis	Agents recovered major root causes in the crane case	Useful for surfacing candidate causal chains quickly	Additional causes may be plausible but unvalidated
Contradiction mapping	Agents matched the physical contradictions exactly	Strong fit for converting trade-offs into structured innovation prompts	Engineering contradiction labels may be approximate
Solution scouting	Agents matched one solution direction and approximated another	Useful as a solution-menu generator	Missing disciplinary participation can remove critical options
Documentation	The workflow produces step documents and a final report	Useful for knowledge capture and auditability	Summaries can compress away important reasoning

This is why the article should not be read as a story about replacing experts. It is a story about reducing the blank-page cost of structured technical ideation. That is less cinematic, but far more likely to survive procurement.

The cost signal is not a footnote

The paper reports that one run involved around 60–80 graph node calls and approximately 150,000–250,000 total tokens. This is not presented as a formal cost analysis, but operators should pay attention anyway.
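To turn the reported token range into a budget line, the arithmetic is simple, although two inputs are not in the paper: the split between input and output tokens, and the price per million tokens. The sketch below treats both as placeholders; `run_cost_usd`, `PRICE_IN`, `PRICE_OUT`, and `OUTPUT_SHARE` are hypothetical names, and the values should be replaced with your provider's current rates and your own measured split.

```python
def run_cost_usd(total_tokens: int, output_share: float,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimate one run's cost from total tokens and an assumed input/output split."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    output_tokens = total_tokens * output_share
    input_tokens = total_tokens - output_tokens
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


# Placeholder prices (USD per million tokens) and split; none come from the paper.
PRICE_IN, PRICE_OUT, OUTPUT_SHARE = 2.50, 10.00, 0.2

low = run_cost_usd(150_000, OUTPUT_SHARE, PRICE_IN, PRICE_OUT)   # lower end of reported range
high = run_cost_usd(250_000, OUTPUT_SHARE, PRICE_IN, PRICE_OUT)  # upper end of reported range
```

The per-run figure matters less than what multiplies it: reruns to handle nondeterminism, any added revision passes, and the number of problems pushed through the workflow.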

Agentic workflows are often sold conceptually as “just let agents collaborate.” In practice, collaboration means repeated prompts, tool calls, summaries, routing decisions, and output generation. The bill grows because the workflow is the product. Every meeting has minutes. Every minute has tokens. Every token has a price. Congratulations: the enterprise meeting has been successfully digitised.

The implication is not that such systems are too expensive. The implication is that cost control must be designed into the workflow early. A production system would need limits on graph iterations, role participation policies, retrieval rules, summary compression, caching, and escalation criteria. It would also need to decide when a cheap single-agent pass is sufficient and when a full TRIZ-agent workflow is justified.
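A limit on graph iterations and tokens can be expressed as a small budget object that the orchestration loop consults before each call. This is a sketch, not a feature of the paper's system: `RunBudget` is a hypothetical name, and the caps simply reuse the upper ends of the reported ranges as illustrative values.

```python
from dataclasses import dataclass


@dataclass
class RunBudget:
    max_node_calls: int
    max_tokens: int
    node_calls: int = 0
    tokens: int = 0

    def charge(self, tokens_used: int) -> None:
        """Record one graph node call and the tokens it consumed."""
        self.node_calls += 1
        self.tokens += tokens_used

    def exhausted(self) -> bool:
        """True once either cap has been reached."""
        return self.node_calls >= self.max_node_calls or self.tokens >= self.max_tokens


# Illustrative caps borrowed from the upper end of the reported ranges.
budget = RunBudget(max_node_calls=80, max_tokens=250_000)

while not budget.exhausted():
    budget.charge(tokens_used=3_000)  # stand-in for one agent or tool call
```

In a real workflow, hitting the cap would more sensibly trigger an escalation to a human than a silent stop.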

For a high-value engineering issue, 250,000 tokens may be trivial if it saves expert time or reveals a better design path. For casual ideation, it may be theatrical overkill. The paper does not resolve that ROI question. It gives enough operational detail to make the question unavoidable.

The current system has no rehearsal loop

The paper’s most serious limitation is the absence of a feedback loop.

The Project Manager moves the workflow forward, but it does not direct the team to revisit earlier steps, refine weak analysis, challenge assumptions, or try alternative contradiction framings. Once a step is documented, the system proceeds. The authors explicitly identify this as a substantial limitation because real problem-solving is iterative.

This limitation changes how the system should be used.

Without feedback, the workflow behaves like a first-pass structured ideation engine. It can generate a coherent chain, but it cannot reliably repair its own weak links. If function analysis misses worker-system interaction, later steps may inherit that blind spot. If the wrong agents are involved in solution generation, the final proposals narrow prematurely.
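What a feedback loop could look like is sketched below. It is not part of TRIZ Agents, which has no such loop; `run_step_with_review`, `draft`, and `review` are hypothetical names, and the toy functions replace what would be agent calls. A step document is drafted, a reviewer either accepts it or returns a critique, and the next draft sees every critique so far.

```python
from typing import Callable, List, Tuple

Review = Tuple[bool, str]  # (accepted, critique)


def run_step_with_review(draft: Callable[[List[str]], str],
                         review: Callable[[str], Review],
                         max_revisions: int = 2) -> Tuple[str, List[str]]:
    """Draft a step document, then revise it until accepted or the cap is reached."""
    critiques: List[str] = []
    doc = draft(critiques)
    for _ in range(max_revisions):
        accepted, critique = review(doc)
        if accepted:
            break
        critiques.append(critique)
        doc = draft(critiques)  # the next draft sees every critique so far
    # If the cap is reached, the latest draft is returned without a final review.
    return doc, critiques


# Toy stand-ins: the reviewer insists on the worker-system link.
def toy_draft(critiques: List[str]) -> str:
    return "function analysis" + (" + worker-system interaction" if critiques else "")


def toy_review(doc: str) -> Review:
    ok = "worker-system" in doc
    return ok, ("" if ok else "harmful worker-system link is missing")


doc, critiques = run_step_with_review(toy_draft, toy_review)
```

The revision cap is deliberate. An unbounded review loop would fix the blind-spot problem by creating a cost problem.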

The paper also notes context-window pressure. The six-step workflow already uses documentation summaries to carry information forward. Larger workflows could exceed manageable context, creating a need for long-term memory or better retrieval of prior step artefacts.

Finally, the results are nondeterministic. The authors generalise from the majority of runs and compile two selected runs for assessment. That is acceptable for an exploratory prototype, but it is not the same as robust evaluation. There are no ablations showing whether the Project Manager, RAG tool, contradiction matrix, or specific agent roles each improve outcomes. There is no multi-case benchmark. There is no blinded expert scoring of solution quality. The paper is useful, but it is not a certification exam for AI-driven innovation.

That boundary should be kept clean. A system can be promising and not yet production-safe. Adults can hold both thoughts at once.

What should be built next

The next version of TRIZ Agents, or any enterprise cousin of it, should not merely add more agents. More agents can create more noise unless orchestration improves.

The immediate design priorities are straightforward:

Needed improvement	Why it matters
Explicit feedback loops	Weak steps need revision before they contaminate later reasoning
Role coverage checks	Relevant domain agents should not be silently excluded from solution stages
Mandatory tool-use policies	RAG and TRIZ tools should be used when methodological grounding is required
Critique agents or review phases	Outputs need adversarial checking, not just polite agreement
Multi-case evaluation	One crane case cannot establish general reliability
Cost controls	Iteration and token budgets must be managed as product constraints
Expert validation workflow	Engineering outputs need human review before design decisions

The interesting point is that these improvements are not glamorous. They are governance features. In agentic AI, governance is often treated as the thing added after the demo. This paper suggests it belongs inside the architecture. The “intelligence” of the system is partly the intelligence of its routing, review, memory, and stopping rules.

That is a useful corrective to agent hype. The future of AI-assisted innovation may not look like a swarm of brilliant synthetic inventors. It may look like a disciplined workflow engine that knows which expert to call, which tool to use, when to challenge an answer, and when to stop talking.

Frankly, that would already be progress.

The real makeover is methodological, not magical

TRIZ Agents gives TRIZ an AI makeover, but the makeover is not that a machine suddenly becomes an inventor. The better reading is that a structured innovation method becomes executable as an agent workflow.

That distinction matters.

If the goal is autonomous engineering creativity, the paper is early and incomplete. The system misses important interactions, skips useful tools, depends heavily on prompts, lacks feedback, and has only one main case comparison. It is not ready to replace an R&D team, and anyone trying to use it that way deserves the procurement meeting they will eventually get.

If the goal is structured ideation support, the paper is much more persuasive. It shows that agent teams can follow a TRIZ workflow, produce inspectable intermediate artefacts, recover meaningful contradictions, and generate at least one solution direction that overlaps with a human TRIZ case study.

The practical future is therefore not “AI invents for us.” It is “AI helps us think in a disciplined sequence before the expensive humans enter the room.” That may sound less spectacular. It is also the version more likely to work.

Cognaptus: Automate the Present, Incubate the Future.

¹ Kamil Szczepanik and Jarosław A. Chudziak, “TRIZ Agents: A Multi-Agent LLM Approach for TRIZ-Based Innovation,” arXiv:2506.18783, 2025, https://arxiv.org/abs/2506.18783.
