TL;DR for operators
Most enterprise RAG systems still behave like diligent interns with a search box: they retrieve a handful of plausible snippets, hand them to a language model, and hope the synthesis does not quietly forget half the question. That works for narrow Q&A. It fails when the user asks for a relationship chain, a complete list, or a decision-ready map of who did what, funded by whom, connected to which topic.
INRAExplorer, introduced in a paper by Jean Lelong, Adnane Errazine, and Annabelle Blangero, is not trying to win a leaderboard for “best RAG answer.” It is more interesting than that, which is inconvenient for people who prefer simple rankings.[1] It is a concrete architecture for agentic, knowledge-graph-enhanced RAG over INRAE’s scientific corpus. The system combines an LLM agent, a Neo4j knowledge graph, hybrid dense/BM25 publication search, thesaurus-linked concepts, and specialised tools such as expert identification.
The practical message is straightforward: serious enterprise retrieval is moving from top-k snippet selection to governed knowledge operations. That means curated entity graphs, controlled vocabularies, reusable tools, explicit graph traversal, and answer synthesis that can show the path from query to evidence.
The boundary is just as important. INRAExplorer is mainly an architecture and application paper. Its scenarios illustrate multi-hop reasoning and modular tool use; they are not a rigorous empirical benchmark. So the right business takeaway is not “this system proves GraphRAG wins.” The right takeaway is: “this is what your RAG architecture starts to look like when users stop asking toy questions.”
The real problem is not retrieval. It is question shape.
Classical RAG is built around a comforting assumption: the answer is probably hiding inside a few relevant chunks. Retrieve the top matches, stuff them into the context window, let the model compose an answer, and call it grounded. For many tasks, that is perfectly serviceable. It can answer policy lookups, summarise a document, or explain a known concept without embarrassing everyone in the room.
But many institutional questions are not shaped like snippets. They are shaped like paths.
A user does not merely ask, “What papers discuss climate change adaptation?” They ask something closer to: which INRAE authors published on climate change adaptation, which projects funded those publications, and what other topics those projects connect to. That is not a semantic similarity problem. It is a traversal problem.
The difference matters. A vector search can find text that sounds relevant to “climate change adaptation strategies.” It cannot, by itself, guarantee that the system has found all matching publications, connected them to the correct authors, traced funding relationships, and then expanded into project-related topics. The query requires structured entities and relationships: author → publication → project → concept. A bag of snippets is a poor substitute for a map.
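A minimal sketch makes the contrast concrete. It follows the author → publication → project → concept chain over a toy in-memory graph. Every node name and relationship label here is invented for illustration; none of it comes from the INRAE graph, and a real system would run this against a graph database rather than a Python dictionary.

```python
from collections import defaultdict

# Toy edges, all hypothetical: (source, relation, target)
EDGES = [
    ("author:A. Martin", "AUTHORED", "pub:P1"),
    ("author:B. Durand", "AUTHORED", "pub:P1"),
    ("author:B. Durand", "AUTHORED", "pub:P2"),
    ("pub:P1", "FUNDED_BY", "project:ADAPT"),
    ("pub:P2", "FUNDED_BY", "project:SOILS"),
    ("project:ADAPT", "RELATES_TO", "concept:drought"),
    ("project:ADAPT", "RELATES_TO", "concept:crop rotation"),
    ("project:SOILS", "RELATES_TO", "concept:soil carbon"),
]


def build_index(edges):
    """Index edges as source -> relation -> set of targets."""
    index = defaultdict(lambda: defaultdict(set))
    for source, relation, target in edges:
        index[source][relation].add(target)
    return index


def follow(index, start_nodes, relations):
    """Follow a fixed chain of relations and return every complete path.

    Paths that dead-end before the last relation are dropped.
    """
    paths = [(node,) for node in sorted(start_nodes)]
    for relation in relations:
        paths = [
            path + (target,)
            for path in paths
            for target in sorted(index[path[-1]][relation])
        ]
    return paths


INDEX = build_index(EDGES)
paths = follow(INDEX, {"author:B. Durand"}, ["AUTHORED", "FUNDED_BY", "RELATES_TO"])
for path in paths:
    print(" -> ".join(path))
```

Each returned path keeps its intermediate hops, so the answer can show which publication and which project connect an author to a concept. A top-k snippet list carries no such guarantee.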
INRAExplorer starts from that diagnosis. The paper positions conventional top-k RAG as useful but structurally limited for exhaustive, relational, multi-hop queries. Its proposed answer is not just “add a graph” in the vague conference-demo sense. It gives the language model agent several ways to interact with a hybrid knowledge base, then lets the agent plan, query, traverse, and synthesise.
That is the mechanism worth understanding.
INRAExplorer turns RAG into a tool-using knowledge operation
The system is built around a hybrid knowledge base covering INRAE scientific output from January 2019 to August 2024, restricted to Open Access documents. The authors construct it from publication metadata joined from HAL and OpenAIRE, with additional enrichment and validation from sources such as BBI, ScanR, project information, and dataset repositories. Full-text Open Access PDFs are processed with GROBID to extract structured sections such as titles, abstracts, keywords, introductions, and conclusions.
That ingestion pipeline matters because agentic RAG only looks magical when the underlying data work has already been done. The agent can reason over structure only because someone first created structure. Tiresome, yes. Also known as “the part of AI strategy people try to skip.”
The resulting knowledge graph contains 417,030 nodes and more than one million relationships. The paper reports the node distribution as follows:
| Node type | Count | Share | Operational role |
|---|---|---|---|
| Author | 233,728 | 56.0% | Researchers and publication authors |
| Keyword | 96,588 | 23.2% | Author-declared publication keywords |
| Publication | 38,791 | 9.3% | Scientific articles and related outputs |
| Software | 21,617 | 5.2% | Software developed or used in research |
| Concept | 13,591 | 3.3% | Controlled concepts from the INRAE thesaurus |
| Journal | 5,563 | 1.3% | Publication venues |
| Project | 3,999 | 1.0% | Funded research projects |
| Domain | 2,595 | 0.6% | Higher-level thesaurus domains |
| ResearchUnit | 299 | 0.1% | INRAE laboratories and units |
| Dataset | 240 | 0.1% | Research datasets |
| Region | 19 | 0.0% | Regions linked to research units |
The important detail is not simply graph size. It is graph shape. The system does not only index publications. It models authors, publications, keywords, concepts, journals, projects, research units, datasets, software, domains, and regions as entities that can be traversed.
The INRAE thesaurus is especially important. Domain terms and more specific concept terms are integrated into the graph, and publications are linked to concepts through exact matches in selected text sections. This gives the system a controlled vocabulary for domain exploration. In enterprise terms, this is the difference between asking an LLM to “understand our jargon” and actually giving it a structured semantic layer.
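Exact-match linking is simple enough to sketch. The version below assumes case-insensitive, whole-phrase matching and a particular choice of sections; the article only states that matches are exact and restricted to selected text sections, so those details, along with the concept identifiers and sample publication, are illustrative assumptions.

```python
import re

# Hypothetical thesaurus: concept id -> preferred label
THESAURUS = {
    "c:zoonosis": "zoonosis",
    "c:climate-adaptation": "climate change adaptation",
    "c:soil": "soil",
}

# Assumption: which sections count as "selected" is a design choice
SELECTED_SECTIONS = ("title", "abstract", "keywords")


def link_concepts(publication, thesaurus=THESAURUS, sections=SELECTED_SECTIONS):
    """Return ids of concepts whose label appears verbatim in the selected sections."""
    text = " ".join(publication.get(section, "") for section in sections)
    linked = set()
    for concept_id, label in thesaurus.items():
        # Word boundaries keep "soil" from matching inside "soiled"
        pattern = r"\b" + re.escape(label) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            linked.add(concept_id)
    return linked


pub = {
    "title": "Climate change adaptation in vineyards",
    "abstract": "We study soil management under drought.",
    "body": "Zoonosis appears only here, and the body is not a selected section.",
}
links = link_concepts(pub)
print(sorted(links))
```

Exact matching favours precision over recall: it misses synonyms and inflected forms, which is one reason a separate concept and keyword search tool is useful alongside it.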
Alongside the graph, INRAExplorer uses a vector database for publication chunks. It computes dense vectors using Jina v3 embeddings and sparse vectors using BM25, then applies reranking to refine results. This is not a rejection of classical retrieval. It is a containment strategy. Vector and keyword search become entry-point tools rather than the entire architecture.
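The article does not say how the dense and sparse result lists are combined before reranking. As a stand-in, the sketch below uses reciprocal rank fusion, a common technique for merging ranked lists. This is an assumption, not the authors' method, and the document ids are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense (embedding) search and a sparse (BM25) search
dense_hits = ["p3", "p1", "p7"]
sparse_hits = ["p1", "p9", "p3"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents found by both retrievers rise to the top, and the fused list is what a reranker would then refine.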
The agent does not just answer. It chooses how to look.
At the centre of INRAExplorer is an LLM-based agent using the open-weight model deepseek-r1-0528. The agent’s job is to understand the query, decompose it when necessary, choose tools, execute retrieval steps, and synthesise the gathered evidence into a coherent response.
The toolkit is deliberately modular:
| Tool | What it does | Why it matters |
|---|---|---|
| SearchGraph | Sends Cypher queries to the Neo4j knowledge graph | Enables structured retrieval, exhaustive lists, and relationship traversal |
| SearchPublications | Performs hybrid semantic and keyword search over publication texts | Finds relevant entry points into the corpus before graph expansion |
| SearchConceptsKeywords | Searches thesaurus concepts and author keywords | Bridges user language with controlled domain vocabulary |
| IdentifyExperts | Encapsulates a predefined expert-ranking workflow | Makes a complex task more reproducible and less dependent on improvised agent behaviour |
This design is the paper’s main contribution. INRAExplorer is not merely retrieving more information. It separates different retrieval behaviours into tools with different responsibilities.
That separation is what makes the architecture operationally interesting. In many enterprise RAG systems, the model receives a user query, a search tool, and a vague instruction to “answer accurately.” INRAExplorer instead gives the model distinct capabilities: find documents, map concepts, query structured relationships, or run a controlled expert-identification process.
This is a quiet but important shift. The agent is not the knowledge base. It is the orchestrator of knowledge operations. The knowledge graph stores relationships. Hybrid search finds textual entry points. The thesaurus aligns vocabulary. Specialised tools enforce repeatable workflows. The LLM plans and synthesises, but it is not asked to hallucinate institutional structure from prose.
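The orchestration pattern can be sketched as a tool registry plus a dispatcher that records every step. The tool names match the article; the function bodies are stubs returning canned data, and the fixed plan stands in for what an LLM would produce. None of this is the authors' implementation.

```python
from typing import Any, Callable, Dict, List, Tuple


def search_publications(query: str) -> List[str]:
    # Stub: a real tool would run hybrid search over publication chunks
    return ["pub:P1", "pub:P2"]


def search_graph(cypher: str) -> List[Dict[str, str]]:
    # Stub: a real tool would send the Cypher query to the graph database
    return [{"project": "ADAPT"}]


# Only registered tools can be called; the agent cannot invent new ones
TOOLS: Dict[str, Callable[[str], Any]] = {
    "SearchPublications": search_publications,
    "SearchGraph": search_graph,
}


def run_plan(plan: List[Tuple[str, str]]) -> List[Dict[str, Any]]:
    """Execute (tool_name, argument) steps in order and keep a provenance trace."""
    trace = []
    for tool_name, argument in plan:
        if tool_name not in TOOLS:
            raise KeyError(f"Unknown tool: {tool_name}")
        output = TOOLS[tool_name](argument)
        trace.append({"tool": tool_name, "input": argument, "output": output})
    return trace


# Hypothetical plan, as an LLM planner might emit it
plan = [
    ("SearchPublications", "climate change adaptation strategies"),
    ("SearchGraph", "MATCH (p:Publication)-[:FUNDED_BY]->(proj:Project) RETURN proj"),
]
trace = run_plan(plan)
for step in trace:
    print(step["tool"], "->", step["output"])
```

The trace is the useful part: it records which tool produced which evidence, which is what lets a final answer show its path from query to result.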
That is how agentic RAG becomes less theatrical and more useful.
Multi-hop reasoning is graph traversal with a narrator
The paper’s first illustrative scenario asks the system to find INRAE authors who published on “climate change adaptation strategies,” identify the projects funding those publications, and list other key topics connected to those projects.
This scenario is best read as an implementation demonstration, not as quantitative evidence. It shows how the architecture handles a query that requires multiple relationship hops. It does not prove that the system is better than all alternatives across a benchmark suite.
Still, the scenario is valuable because it reveals the operating logic.
The agent first uses SearchPublications or SearchConceptsKeywords to find relevant publications and concepts. This grounds the search in the corpus. Then it uses SearchGraph to connect the publications to Project nodes through funding relationships. Finally, it traverses from projects to related Concept nodes, broadening the answer from “papers about X” to “the institutional research network surrounding X.”
The final answer can show a chain: authors published relevant papers; those papers were funded by certain projects; those projects connect to additional research topics. The structure of the answer mirrors the structure of the knowledge graph.
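The graph step in that chain might look like the parameterised Cypher query below, built in Python. The node labels (Author, Publication, Project, Concept) match the node types reported for the graph. The relationship types and property names are guesses for illustration, since the article does not list the schema.

```python
from typing import Dict, List, Tuple

# Relationship types and property names below are hypothetical
FUNDING_TRACE_QUERY = """
MATCH (a:Author)-[:AUTHORED]->(p:Publication)-[:FUNDED_BY]->(proj:Project)
WHERE p.id IN $publication_ids
OPTIONAL MATCH (proj)-[:HAS_CONCEPT]->(c:Concept)
RETURN a.name AS author,
       p.title AS publication,
       proj.name AS project,
       collect(DISTINCT c.label) AS related_concepts
ORDER BY author, publication
""".strip()


def build_funding_trace(publication_ids: List[str]) -> Tuple[str, Dict[str, List[str]]]:
    """Return the query text and its parameters, ready for a graph driver.

    Passing ids as a parameter, rather than formatting them into the string,
    keeps the query text constant and avoids injection through search results.
    """
    if not publication_ids:
        raise ValueError("Need at least one publication id from the search step")
    return FUNDING_TRACE_QUERY, {"publication_ids": list(publication_ids)}


# Ids as they might come back from the publication search step (made up)
query, params = build_funding_trace(["pub:P1", "pub:P2"])
print(query)
print(params)
```

With the official Neo4j Python driver, a pair like this could be passed to a session's `run` method. The `OPTIONAL MATCH` keeps projects in the result even when they have no linked concepts, which matters when the goal is an exhaustive list.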
This is where classical RAG looks thin. It can often produce a plausible paragraph about climate adaptation. It may even cite some relevant documents. But plausibility is not the same as exhaustive relational retrieval. If the business task is “prepare a funding intelligence map,” “identify internal expertise,” or “trace compliance dependencies,” then a fluent paragraph is not enough. The answer has to preserve the links.
In business language, INRAExplorer’s multi-hop scenario supports a shift from answer generation to knowledge navigation. The output is useful because it contains not just what was found, but how the parts connect.
Expert identification is where modularity becomes governance
The second scenario focuses on identifying leading INRAE experts on a topic such as zoonoses. Again, this is an illustrative capability rather than a formal evaluation result. Its purpose is to show how high-level tools can encapsulate domain logic.
Without a specialised tool, the agent would need to invent a plan: search for papers, identify authors, judge relevance, maybe consider citations, maybe consider recency, maybe forget half of that because the prompt got long. Charming, in the way a spreadsheet maintained by seven departments is charming.
INRAExplorer instead uses an IdentifyExperts tool. This tool runs a predefined workflow: it searches for relevant publications, extracts authors through the graph, and calculates a composite expertise score using factors such as article relevance, number of articles in the top 10% of results, total relevant publications, citation counts, period of activity, and recency of latest publication. The agent then presents the structured result.
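A composite score of this kind can be sketched in a few lines. The six factors mirror the ones listed above. The weights, the normalisation functions, and the sample authors are all hypothetical, chosen only to show how such a formula can be written down where people can read and argue with it.

```python
import math
from dataclasses import dataclass


@dataclass
class AuthorEvidence:
    name: str
    mean_relevance: float       # average relevance of matched articles, assumed in [0, 1]
    top_decile_articles: int    # articles in the top 10% of results
    relevant_publications: int  # total relevant publications
    citations: int
    first_year: int
    last_year: int


# Hypothetical weights; they sum to 1 so the score stays in [0, 1]
WEIGHTS = {
    "relevance": 0.30,
    "top_decile": 0.20,
    "volume": 0.15,
    "citations": 0.15,
    "activity": 0.10,
    "recency": 0.10,
}


def squash(value: float, scale: float) -> float:
    """Map a non-negative count to [0, 1) with diminishing returns."""
    return 1.0 - math.exp(-value / scale)


def expertise_score(e: AuthorEvidence, current_year: int) -> float:
    features = {
        "relevance": e.mean_relevance,
        "top_decile": squash(e.top_decile_articles, 3),
        "volume": squash(e.relevant_publications, 10),
        "citations": squash(e.citations, 200),
        "activity": squash(e.last_year - e.first_year, 5),
        "recency": math.exp(-(current_year - e.last_year) / 3),
    }
    return sum(WEIGHTS[key] * features[key] for key in WEIGHTS)


candidates = [
    AuthorEvidence("Author Y", 0.5, 0, 2, 10, 2019, 2020),
    AuthorEvidence("Author X", 0.9, 4, 12, 300, 2019, 2024),
]
ranked = sorted(candidates, key=lambda e: expertise_score(e, 2024), reverse=True)
for e in ranked:
    print(f"{e.name}: {expertise_score(e, 2024):.3f}")
```

Because the weights sit in one dictionary, changing the institution's definition of expertise is a reviewable configuration change rather than a prompt rewrite.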
The business value is not that this scoring formula is universally correct. It is not. Expert ranking is context-sensitive, politically delicate, and easy to abuse if treated as objective truth. The value is that the scoring logic is explicit enough to be inspected, adjusted, and reused.
That is the governance lesson. Organisations should not ask general-purpose agents to improvise sensitive institutional judgments every time. They should wrap recurring, high-stakes workflows in tools whose assumptions are visible. The LLM can still orchestrate and explain. But the scoring rules, entity definitions, and retrieval paths should be controlled by the system, not rediscovered by the model on a Tuesday afternoon.
What the paper shows, what Cognaptus infers, and what remains unproven
The paper’s evidence is architectural and illustrative. That does not make it weak. It makes it a specific kind of contribution. The mistake would be to read it as a benchmark paper and then either overpraise it for results it does not claim or dismiss it for not doing a job it did not set out to do.
| Element in the paper | Likely purpose | What it supports | What it does not prove |
|---|---|---|---|
| Knowledge graph construction and node distribution | Implementation detail and system foundation | The system operates over a large, heterogeneous, structured corpus | That larger graphs automatically produce better answers |
| Hybrid dense/BM25 publication search | Retrieval architecture detail | The system uses both semantic and keyword entry points | That this hybrid setup is optimal for all corpora |
| SearchGraph with Cypher over Neo4j | Core mechanism | The agent can retrieve structured relationships and exhaustive entity sets | That the agent always writes perfect graph queries |
| Climate adaptation multi-hop scenario | Illustrative demonstration | The architecture can express author → publication → project → concept reasoning | Benchmark-level superiority over classical RAG |
| IdentifyExperts workflow | Modular tool demonstration | Recurring domain tasks can be wrapped in controlled, repeatable tools | That the specific expertise score is universally valid |
| Future evaluation discussion | Boundary-setting | The authors recognise the need for domain-specific gold standards | That such evaluation already exists |
This distinction matters for operators. If you are deciding whether to deploy an INRAExplorer-style architecture, the paper gives you design evidence, not procurement evidence. It tells you what components need to exist and how they can interact. It does not tell you expected cost reduction, error rate, user adoption, latency under load, or maintenance burden.
Cognaptus’ inference is that the architecture is directionally aligned with where enterprise RAG has to go. Once users ask multi-step institutional questions, snippet retrieval becomes a bottleneck. A graph-backed, tool-orchestrated system gives the organisation a better substrate for completeness, traceability, and control.
But that inference has boundaries. The paper does not establish that every enterprise needs a knowledge graph. It does not show that an agentic system is always worth the complexity. It does not compare against carefully engineered non-agentic baselines. And it does not solve the hardest operational problem: keeping the graph accurate as the organisation changes.
The ROI is not “better answers.” It is cheaper structured diagnosis.
The tempting sales pitch is that agentic GraphRAG produces better answers. That may be true, but it is too vague to be useful. Better for whom? On which questions? At what cost? With what failure modes? Hand-waving here is how pilots become expensive theatre.
The sharper business value is cheaper structured diagnosis.
In a research institution, structured diagnosis might mean finding who works on a topic, which projects fund related work, what datasets or software exist, and how research units connect. In a bank, the same pattern might map products, controls, policies, incidents, owners, and regulatory obligations. In a manufacturer, it might connect defects, suppliers, parts, plants, engineering changes, and warranty claims.
The pattern is stable: users ask questions that cut across entities and relationships. Classical RAG retrieves text. The organisation needs a navigable institutional model.
INRAExplorer points toward four operational design principles:
| Technical design choice | Operational consequence | ROI relevance |
|---|---|---|
| Curated entity graph | Queries can traverse people, documents, projects, topics, and assets | Reduces manual cross-referencing |
| Hybrid search as entry point | Users can begin with natural language even when exact entities are unknown | Lowers query friction |
| Agentic tool orchestration | Complex questions can be decomposed into retrieval steps | Supports higher-value analytical workflows |
| Specialised domain tools | Repeated tasks become controlled and inspectable | Improves consistency and governance |
The ROI case, then, is not merely “the chatbot is smarter.” It is that analysts spend less time assembling relationship maps by hand, and managers receive answers that are structured enough to support action.
That case still needs measurement. A serious deployment would track task completion time, recall of known entities, analyst correction rates, provenance quality, user trust, latency, and maintenance cost. INRAExplorer does not provide those metrics. It provides a plausible architecture for pursuing them.
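Of those measurements, recall of known entities is the easiest to start with. The sketch below compares a system's returned entities against a gold list compiled by domain experts. The entity names and the gold list are invented for illustration.

```python
def entity_recall(returned, gold):
    """Share of gold-standard entities that the system returned."""
    gold = set(gold)
    if not gold:
        raise ValueError("Gold standard must not be empty")
    return len(gold & set(returned)) / len(gold)


def entity_precision(returned, gold):
    """Share of returned entities that are in the gold standard."""
    returned = set(returned)
    if not returned:
        return 0.0
    return len(returned & set(gold)) / len(returned)


# Hypothetical example: experts listed four projects; the system found two of them
gold_projects = {"ADAPT", "SOILS", "VINE", "HYDRO"}
system_projects = {"ADAPT", "SOILS", "OTHER"}

recall = entity_recall(system_projects, gold_projects)
precision = entity_precision(system_projects, gold_projects)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

For exhaustive-list questions, recall is the number to watch: a fluent answer that names half the relevant projects is still half an answer.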
The hard part is not the agent. It is the institutional substrate.
There is a useful irony in agentic RAG. The visible novelty is the agent. The durable value often comes from everything around it.
INRAExplorer depends on merged metadata sources, deduplication, full-text extraction, thesaurus integration, entity modelling, relationship creation, graph storage, vector indexing, reranking, and tool design. The LLM sits on top of this infrastructure. It looks like the star because it speaks. The graph does the quiet work because graphs, regrettably, do not give keynote demos.
For businesses, this should reset expectations. If an organisation wants INRAExplorer-style capabilities, it cannot simply plug an LLM into SharePoint and declare itself agentic. It needs to decide what entities matter, which relationships are trusted, how vocabularies are governed, which workflows deserve specialised tools, and how results will be evaluated.
That is not a reason to avoid the architecture. It is a reason to budget for the work that makes it real.
The best starting point is not “build the enterprise brain.” That phrase should be placed gently in a drawer and left there. The better starting point is a narrow, relationship-heavy workflow: expert finding, compliance mapping, project intelligence, customer issue diagnosis, supplier risk tracing, or research portfolio analysis. Build the graph around that workflow. Add hybrid retrieval for entry points. Add tools for the recurring tasks. Then evaluate against domain-specific gold standards.
That is how agentic RAG becomes an operating system for knowledge work rather than a prettier search bar.
The paper’s limitation is also its strategic honesty
The authors explicitly identify evaluation as future work. They argue that standard benchmarks do not capture the complexity of the scientific multi-hop queries relevant to their use case, and that meaningful assessment will require domain experts, realistic tasks, gold standards, and success criteria.
That is the correct limitation. It is also the right research direction.
Generic RAG benchmarks often reward systems for retrieving and summarising answer-bearing text. INRAExplorer is aimed at something messier: exhaustive retrieval, relationship traversal, modular task execution, and structured synthesis over real institutional data. Evaluating that requires knowing what counts as complete, correct, useful, and auditable in the domain.
The paper also mentions future work on specialising the core agent model, including approaches inspired by Reinforcement Learning from Verifiable Feedback. That is an exploratory extension, not a result. The idea is plausible: if the system can define verifiable feedback over graph-navigation tasks, smaller models might learn to use tools more reliably. But the paper does not demonstrate this yet.
So the disciplined reading is simple: INRAExplorer gives a credible architecture and concrete scenarios; it does not yet give a mature evaluation framework. For business adoption, that means the design is interesting, but any deployment should include its own domain evaluation before anyone starts printing “AI transformation” on lanyards.
From snippets to synthesis means changing the unit of retrieval
The most useful idea in INRAExplorer is not that it combines RAG and a knowledge graph. Many systems now claim some version of that. The useful idea is that the unit of retrieval changes.
Classical RAG retrieves chunks. INRAExplorer retrieves and composes relationships. A publication is not just a piece of text. It is linked to authors, journals, concepts, datasets, software, projects, research units, and domains. A user query can therefore be answered by moving through the institutional structure, not merely by scanning prose.
That is the difference between a document assistant and a knowledge assistant.
The paper’s scenarios are modest in the right way. They do not pretend to solve enterprise AI. They show what becomes possible when the system has a graph to traverse, tools to choose, and a language model to coordinate the process. The architecture is not magic. It is a disciplined assembly of retrieval, structure, orchestration, and synthesis.
For operators, that is the point. The next useful generation of RAG will not be defined by larger context windows alone. It will be defined by whether systems can answer questions whose shape resembles the organisation itself: tangled, relational, incomplete, and inconveniently resistant to top-k snippets.
INRAExplorer is a reminder that in serious knowledge work, the answer is rarely sitting in one paragraph. It is usually hiding across a path.
Cognaptus: Automate the Present, Incubate the Future.
[1] Jean Lelong, Adnane Errazine, and Annabelle Blangero, “Agentic RAG with Knowledge Graphs for Complex Multi-Hop Reasoning in Real-World Applications,” arXiv:2507.16507, 2025. https://arxiv.org/abs/2507.16507