Shaking the Stack: Teaching Seismology to Talk Back

Simulation software has a talent for hiding intelligence inside inconvenience.

A mature physics code may contain decades of numerical insight, community testing, and domain expertise. Then it asks the user to prove loyalty by editing parameter files, remembering command sequences, managing mesh directories, choosing execution binaries, checking output folders, and pretending that none of this is a productivity tax. This is not because scientists enjoy suffering. Mostly. It is because high-performance scientific software often grows around capability first and usability later.

The paper behind today’s article, Seismology Modeling Agent: A Smart Assistant for Geophysical Researchers, is about that gap.¹ It introduces a Model Context Protocol, or MCP, server suite for SPECFEM, the widely used open-source seismic wave simulation family. The authors implement separate MCP servers for SPECFEM2D, SPECFEM3D_Cartesian, and SPECFEM3D_GLOBE, then show how an LLM agent can use those tools to generate configuration files, run meshers and solvers, and produce visualizations from natural-language instructions.

The tempting headline is “AI does seismology.” That is also the sloppy headline. The paper does not show an autonomous geophysicist discovering new Earth structure. It shows something more immediately useful: an agent-facing service layer that converts a file-driven scientific workflow into a structured, callable, inspectable set of operations.

That difference matters. One is science fiction with prettier diagrams. The other is a practical modernization strategy for organizations sitting on powerful legacy tools that only a few specialists know how to operate without injury.

The real contribution is the control layer, not the chat box

SPECFEM is not a toy program waiting for a chatbot costume. It is a family of spectral-element simulation tools used for seismic wave propagation across local, regional, and global scales. The paper reviews the usual strengths: geometric flexibility, high numerical accuracy, support for complex topography and material heterogeneity, and compatibility with parallel computation. It also reviews the operational burden: users must configure many text files, coordinate meshing and solver executables, manage MPI or GPU settings, and post-process large output datasets.

That is the key business and engineering problem. The bottleneck is not only “users do not understand seismology.” The bottleneck is that a valuable computational asset is exposed through a workflow that demands too much low-level procedural memory.

A graphical user interface can hide some of this. But the authors correctly point out that a GUI mainly changes the interaction surface. It can turn command lines into buttons and text files into forms, but it does not necessarily understand the user’s scientific intent. A researcher still has to translate “simulate wave propagation through a basin with complex topography” into the right files, parameter dependencies, source definitions, station layouts, mesh choices, boundary settings, and execution sequence.

The MCP layer changes the abstraction. It exposes SPECFEM operations as tools that an LLM agent can discover and call. The agent does not magically understand wave physics because it is charming in a sidebar. It gains operational reach because the underlying software is decomposed into machine-readable actions.

The paper’s architecture has three main layers:

Layer	What it does	Why it matters
SPECFEM core software	Performs the actual seismic wave simulations using existing compiled SPECFEM executables	The numerical engine remains the trusted scientific code, not the LLM
SPECFEM MCP servers	Translate structured tool calls into file generation, process execution, MPI/GPU settings, and output handling	This is the main modernization layer
LLM agent interface	Interprets user intent, plans tool calls, invokes MCP tools, and reports results	Natural language becomes a control interface for an existing simulation stack

This structure is more conservative than the AI branding may suggest. The authors do not rewrite SPECFEM. They wrap it. The MCP servers prepare files in expected directories, invoke compiled executables, and read outputs for visualization. That is precisely why the work is practically interesting. Modernization does not always mean replacing the trusted machine. Sometimes it means building a better steering wheel.
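The "wrap, don't rewrite" idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names `build_command` and `run_step`, the result fields, and the use of `mpirun -np` as the launcher are assumptions, and a stand-in Python process replaces the real SPECFEM binaries so the sketch runs anywhere.

```python
import subprocess
import sys


def build_command(executable, nproc=1, launcher="mpirun"):
    """Return the argument list for a serial or MPI-launched run."""
    if nproc < 1:
        raise ValueError("nproc must be at least 1")
    if nproc == 1:
        return [executable]
    return [launcher, "-np", str(nproc), executable]


def run_step(command, workdir):
    """Run one step and return a structured result instead of raw terminal text."""
    completed = subprocess.run(command, cwd=workdir, capture_output=True, text=True)
    return {
        "command": command,
        "ok": completed.returncode == 0,
        "returncode": completed.returncode,
        # Keep only the tail so a long solver log does not flood the agent.
        "stdout_tail": completed.stdout[-2000:],
        "stderr_tail": completed.stderr[-2000:],
    }


# Stand-in process (hypothetical) so the sketch runs without SPECFEM installed.
demo = run_step([sys.executable, "-c", "print('mesher finished')"], workdir=".")
print(demo["ok"], demo["stdout_tail"].strip())
```

The design choice worth noticing is the return value. The compiled code still does the physics; the wrapper only turns "a process ran in a directory" into something an agent can inspect and report.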

MCP turns “what I want” into “what the software can execute”

The paper’s SPECFEM MCP servers expose toolsets for each member of the SPECFEM family. For SPECFEM2D, the tools include generating the main parameter file, source file, stations file, interface file, running the mesher, running the solver, and visualizing results. For SPECFEM3D_Cartesian, the suite expands to include moment tensor and force source files, mesh generation, mesh decomposition for parallel processing, database generation, solver execution, and visualization. SPECFEM3D_GLOBE receives its own global-simulation version of this tool structure.

This is the mechanism worth slowing down for. The LLM does not directly “operate SPECFEM” in some vague anthropomorphic sense. It interacts with tools that have names, schemas, inputs, outputs, and execution logic. Each tool module exposes a definition and a JSON schema to the agent, then implements a handler that performs the corresponding operation. Some handlers use templates to generate configuration files. Others wrap SPECFEM executables through subprocess calls while managing paths and parallelization.
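A rough sketch of what "a definition, a JSON schema, and a handler" can look like follows. The tool name `generate_source_file`, its three fields, and the template text are hypothetical and deliberately simplified; a real SPECFEM source file carries more parameters, and the paper's servers may structure this differently. The validator checks only required keys and numeric types, which is far less than full JSON Schema validation.

```python
from string import Template

# Hypothetical tool definition: name, description, and fields are illustrative.
SOURCE_TOOL = {
    "name": "generate_source_file",
    "description": "Render a simplified source description from a template.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "xs": {"type": "number"},
            "zs": {"type": "number"},
            "f0": {"type": "number"},
        },
        "required": ["xs", "zs", "f0"],
    },
}

# Simplified template, not the real SPECFEM file layout.
SOURCE_TEMPLATE = Template("xs = $xs\nzs = $zs\nf0 = $f0\n")


def validate(args, schema):
    """Check required keys and numeric types only (not full JSON Schema)."""
    errors = []
    for key in schema["required"]:
        if key not in args:
            errors.append(f"missing: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unknown: {key}")
        elif spec["type"] == "number":
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number:
                errors.append(f"not a number: {key}")
    return errors


def handle_generate_source(args):
    """Handler: validate the structured request, then fill the template."""
    errors = validate(args, SOURCE_TOOL["inputSchema"])
    if errors:
        return {"ok": False, "errors": errors}
    return {"ok": True, "content": SOURCE_TEMPLATE.substitute(args)}


print(handle_generate_source({"xs": 1000.0, "zs": -500.0, "f0": 10.0})["content"])
print(handle_generate_source({"xs": 1000.0, "f0": "ten"}))
```

The second call is the interesting one: a malformed request is rejected with named errors before any file exists, which is the point of putting a schema between the agent and the parameter file.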

So the agentic workflow has a specific shape:

1. The user gives a high-level scientific instruction.
2. The agent identifies intent and discovers available MCP tools.
3. It plans a sequence of tool calls.
4. It sends structured requests to the appropriate SPECFEM MCP server.
5. The server generates files, runs binaries, or visualizes outputs.
6. The agent summarizes the execution and returns intermediate or final results to the user.
7. The user can intervene, confirm, refine, or correct the workflow.
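The shape of that loop can be sketched as a plan of tool calls dispatched through a registry, with a confirmation hook standing in for the human. Everything here is an assumption made for illustration: the `run_plan` function, the stub handlers, the tool names, and the status strings are not taken from the paper, and a real agent would plan dynamically rather than follow a fixed list.

```python
def run_plan(plan, registry, confirm=lambda step: True):
    """Execute tool calls in order; stop at the first failure or refusal."""
    log = []
    for step in plan:
        handler = registry.get(step["tool"])
        if handler is None:
            log.append({"tool": step["tool"], "status": "unknown_tool"})
            break
        if not confirm(step):
            # The human declined this step, so nothing downstream runs.
            log.append({"tool": step["tool"], "status": "rejected_by_user"})
            break
        result = handler(**step["args"])
        status = "ok" if result.get("ok") else "failed"
        log.append({"tool": step["tool"], "status": status, "result": result})
        if status == "failed":
            break
    return log


def stub(name):
    """Hypothetical handler that always succeeds; stands in for a real tool."""
    def handler(**args):
        return {"ok": True, "tool": name, "args": args}
    return handler


registry = {name: stub(name) for name in ("generate_par_file", "run_mesher", "run_solver")}
plan = [
    {"tool": "generate_par_file", "args": {"nproc": 4}},
    {"tool": "run_mesher", "args": {}},
    {"tool": "run_solver", "args": {}},
]

# The user approves file generation and meshing but holds back the solver run.
log = run_plan(plan, registry, confirm=lambda step: step["tool"] != "run_solver")
print([(entry["tool"], entry["status"]) for entry in log])
```

The log is the part that matters for governance. Each step leaves a record of what was attempted and why the sequence stopped, which is more than a shell history usually offers.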

This is not merely “natural language in, simulation out.” It is a controlled translation pipeline from intent to executable scientific workflow. That pipeline is also where most enterprise relevance sits. Companies do not need every LLM system to become a genius. They need useful software to become reachable by more people without destroying governance, reproducibility, or domain control.

The five cases are stress tests of the wrapper, not five independent discoveries

The paper validates the workflow through five case studies. They range from a teaching-level 2D seismic lens experiment to a global SPECFEM3D_GLOBE simulation of the 2011 Tohoku earthquake scenario. Read casually, the section can feel like a tour of seismology demos. Read mechanically, it is an escalation test of the MCP interface.

Each case probes a different operational capability.

Case	Likely purpose in the paper	What it supports	What it does not prove
1. Teaching-level 2D seismic lens effect	Main evidence for intent-to-workflow execution in a simple controlled setting	The agent can build paired models, run SPECFEM2D, and produce wavefield/seismogram evidence consistent with the requested comparison	It does not prove autonomous scientific reasoning beyond the configured task
2. Complex 2D exploration-style model	Main evidence for handling richer geometry, stratigraphy, attenuation, and receiver layouts	The agent can translate a detailed high-level description into a multi-layer model and acquisition setup with light user guidance	It does not quantify setup-time savings or error reduction
3. User-provided salt dome mesh	Robustness-style workflow variation	The agent can start from external mesh files, infer geometry constraints, assign materials, and complete the missing simulation setup	It does not prove arbitrary mesh understanding across all external formats
4. Campi Flegrei 3D regional simulation	Scale and realism extension	The agent can parse a dense technical specification and configure SPECFEM3D_Cartesian for a real-world volcanic setting	It does not show autonomous parameter selection; the user supplies rigorous specifications
5. Global SPECFEM3D_GLOBE simulation	Highest-scale demonstration	The agent can configure a planetary-scale run with full physical complexities and GPU execution from a comprehensive instruction	It does not benchmark scientific accuracy against observed earthquake data

This table is important because the evidence is case-based, not benchmark-based. The authors show successful execution across increasingly complex scenarios. They do not provide a controlled productivity experiment comparing expert users, novice users, GUI workflows, scripts, and MCP agents. They also do not provide an ablation isolating which part of the agent stack contributes most: tool schemas, templating, the Cline interface, model choice, or human-in-the-loop correction.

That is not a fatal flaw. It simply tells us how to interpret the paper. The work is a systems demonstration, not a statistical productivity study.

Case 1 shows the agent can build a clean pedagogical comparison

The first case asks the agent to design and execute a comparative SPECFEM2D experiment demonstrating the seismic lens effect. The user asks for two models: one with flat layering and one with a buried anticline or ridge, while holding material properties and acquisition layout constant. The purpose is to isolate the effect of geometry.

This is a sensible first test because it checks the basic intent-to-execution loop. The agent must generate the relevant SPECFEM2D files, run the mesher and solver, and organize visualizations. The paper reports that the resulting wavefield snapshots show a forward-bulging wavefront above the high-velocity ridge, while surface seismograms show earlier arrivals in the central receiver region for the buried-ridge model.
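Why the central receivers see earlier arrivals can be checked with back-of-envelope arithmetic. The sketch below uses a straight vertical ray and made-up thicknesses and velocities; none of these numbers come from the paper, and a spectral-element simulation accounts for far more than this. It only shows the direction of the effect: replacing slow material with fast material along the path shortens the travel time.

```python
def vertical_travel_time(layers):
    """Sum thickness / velocity over (thickness_m, velocity_m_per_s) pairs."""
    return sum(thickness / velocity for thickness, velocity in layers)


# Hypothetical columns under a central receiver, 3000 m thick in both models.
flat = [(2000.0, 2000.0), (1000.0, 4000.0)]   # flat layering
ridge = [(1500.0, 2000.0), (1500.0, 4000.0)]  # ridge lifts the fast layer by 500 m

t_flat = vertical_travel_time(flat)
t_ridge = vertical_travel_time(ridge)
print(t_flat, t_ridge, t_flat - t_ridge)  # 1.25 1.125 0.125
```

With these illustrative values the ridge column arrives 0.125 s earlier, which is the same qualitative signature the paper reports in the seismograms above the ridge.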

The scientific interpretation is not the shocking part. A high-velocity structure accelerating wave propagation is not exactly gossip from Olympus. The operational significance is that the system creates a controlled comparative experiment from a high-level instruction and produces outputs that align with the intended teaching objective.

For business readers, this maps neatly to training and onboarding. Many technical organizations have internal tools whose first use case is not replacing senior experts, but compressing the distance between “I understand the concept” and “I can run the software without breaking six hidden assumptions.” A teaching-level simulation assistant is not glamorous. It is merely useful. A tragedy, apparently.

Case 2 tests whether richer intent survives contact with geometry

The second case moves from a clean teaching example to a more complex 2D exploration-style model. The user asks for undulating stratigraphic layers, rugged surface topography, depth-increasing elastic and attenuation parameters, a localized reservoir anomaly, and a mixed receiver system including a dense surface line and two deviated VSP arrays.

Here the challenge is dependency management. Geometry affects mesh construction. Material assignments affect configuration files. Receiver placement must avoid inappropriate intersections. Boundary settings and GPU execution must be configured consistently. The paper reports that the agent generates the required model, assigns the requested properties, runs the simulation across eight GPUs, and produces wavefield snapshots and multicomponent seismograms. The two VSP arrays show different responses depending on whether they approach or move away from the anomaly.

This case supports a stronger claim than Case 1: the MCP-agent workflow can coordinate several interdependent modeling choices under human refinement. The phrase “light user guidance” matters. The user still adjusts interface geometry and station distributions. That is not a weakness. For scientific computing, it may be the correct division of labor.

The assistant handles procedural assembly. The researcher keeps control over scientific intent and critical design choices. That is the grown-up version of human-in-the-loop AI: not a decorative approval button at the end, but intervention points where domain judgment actually matters.

Case 3 is the most business-relevant test: partial workflows are normal workflows

The third case may be the most transferable beyond seismology. Instead of asking the agent to build everything from scratch, the user provides an external CUBIT mesh for a salt dome model. The agent must inspect the mesh files, understand topology and material indices, map physical properties to those indices, configure fluid-solid coupling and absorbing boundaries, then complete the forward simulation.

This reflects real enterprise workflow better than the clean “start from zero” demo. In practice, companies rarely have a blank slate. They have partial files, legacy projects, existing models, inconsistent naming habits, and a folder called final_final_really_final_v3. The useful assistant is not one that only performs in a newly paved sandbox. It must attach itself to work already in progress.

The paper reports that the agent successfully completes the simulation using the external mesh. The resulting seismograms show a zero horizontal component at the sea-surface array in the acoustic fluid layer, while seabed and subsurface arrays record significant energy on both components. The authors interpret this as evidence that acoustic-elastic coupling was implemented correctly.

This is a stronger physical consistency check than simply saying “the solver ran.” A simulation that finishes can still be wrong in deeply professional ways. Here, the output contains a recognizable physical signature tied to the configuration. Still, the boundary should be kept clear: the case shows success for this mesh and setup, not universal robust parsing of all mesh formats or all industrial model archives.
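A check like this can be made explicit rather than left to eyeballing plots. The sketch below is one assumed way to do it: the function names, the station labels, the tiny synthetic traces, and the relative tolerance are all hypothetical, and the paper does not describe an automated test of this kind. It flags any station in the fluid-layer array whose horizontal component is not negligible relative to the largest vertical amplitude in the run.

```python
def peak(trace):
    """Largest absolute amplitude in a trace."""
    return max(abs(value) for value in trace)


def fluid_horizontal_failures(traces, fluid_stations, rel_tol=1e-6):
    """Return fluid-layer stations whose horizontal peak exceeds the tolerance."""
    reference = max(peak(t["z"]) for t in traces.values())
    return [
        name
        for name in fluid_stations
        if peak(traces[name]["x"]) > rel_tol * reference
    ]


# Synthetic stand-in data: one sea-surface station, one seabed station.
traces = {
    "SURF01": {"x": [0.0, 0.0, 0.0], "z": [0.0, 0.4, -0.2]},
    "BED01": {"x": [0.0, 0.3, -0.1], "z": [0.0, 0.5, -0.3]},
}

print(fluid_horizontal_failures(traces, ["SURF01"]))  # []
print(fluid_horizontal_failures(traces, ["BED01"]))   # ['BED01']
```

An empty list supports the expected signature for this setup. It does not certify the simulation; it only turns one physical expectation into a repeatable test.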

For business practice, this is the pattern to copy: use agents to complete, validate, and document partial workflows. The ROI is not only “fewer clicks.” It is fewer dropped handoffs between specialists.

Cases 4 and 5 show scale, but scale is not the same as autonomy

The fourth case uses SPECFEM3D_Cartesian for a 3D forward simulation of the Campi Flegrei volcanic region in Italy. The user provides a dense technical specification, including the projection system, topography, attenuation model, C-PML boundaries, timestep, duration, source, station network, and required configuration files. The agent’s job is not to invent the experiment. It must parse and execute the specification.

That distinction is editorially important. In this case, the agent is closer to a disciplined technical operator than a scientific designer. It converts a dense experimental design into an executable simulation environment, generates files, runs the simulation, reads output directories, and visualizes velocity seismograms. The paper reports successful completion and station responses differing by position relative to the shallow volcanic source.

The fifth case extends the same idea to SPECFEM3D_GLOBE. The user specifies a global simulation using a single NVIDIA A800 GPU, the s362ani 3D mantle model, crustal/topographic datasets available locally, and the 2011 Tohoku earthquake scenario. The agent retrieves earthquake parameters, helps refine configuration through dialogue, generates a GPU-enabled Par_file, creates CMTSOLUTION and STATIONS files, runs meshing and forward simulation, scans outputs, and produces station signal plots.

This is impressive as workflow coverage. It also creates a risk of overreading. A global simulation that executes successfully is evidence that the interface can reach a large and complex SPECFEM workflow. It is not proof that the agent can decide which Earth model is scientifically appropriate, validate observations against real station data, or choose an inversion strategy. The authors’ outlook gestures toward autonomous geophysical laboratories. The actual evidence remains a forward-simulation assistant with structured tool access and human supervision.

That is still a meaningful contribution. Not every useful system has to arrive wearing the lab coat of full autonomy.

The hidden business lesson: wrap the asset before replacing the asset

The business relevance of this paper reaches beyond geophysics. Many organizations own or depend on mature software that is powerful, trusted, and painful. Engineering simulation suites, compliance engines, actuarial models, risk systems, logistics optimizers, geospatial pipelines, internal data tools: the list is long and mostly allergic to elegant UX.

The reflexive AI product move is to build a chatbot in front of the system. The better move is what this paper demonstrates: build a structured service layer that decomposes the real workflow into callable tools.

A chatbot alone gives the user conversation. A tool layer gives the agent agency under constraints.

Technical move	Operational consequence	ROI relevance
Expose legacy operations as MCP tools	The agent can discover available actions and call them with structured inputs	Reduces dependence on hidden procedural knowledge
Use templates for configuration generation	Repeated file creation becomes standardized	Lowers configuration error risk
Preserve the original solver	Scientific or business logic remains in the trusted core engine	Avoids expensive and risky replacement
Allow human-in-the-loop refinement	Experts can intervene at design-critical moments	Keeps domain judgment where stakes are high
Return paths, outputs, and visualizations	Results become easier to inspect and communicate	Improves handoff and reproducibility

The inference for business use is straightforward but bounded. If a company has an existing technical workflow that is valuable but hard to operate, an agent interface can create value without replacing the underlying system. The path is not “LLM replaces expert.” The path is “expert intent travels through a safer, more structured execution channel.”

That matters most when three conditions hold:

First, the underlying tool is already trusted. SPECFEM is worth wrapping because its numerical core is mature. Wrapping a weak tool merely automates mediocrity. A thrilling genre, but crowded.

Second, the workflow has repeatable procedural steps. File generation, command execution, output scanning, visualization, and validation checks are good candidates for toolization.

Third, the cost of user error is high enough to justify formalization. In scientific computing, a wrong parameter can waste compute time or invalidate results. In business workflows, the equivalent may be a wrong regulatory form, wrong pricing scenario, wrong risk assumption, or wrong deployment flag.

What the paper directly shows, and what Cognaptus infers

The paper directly shows that MCP servers can be built for SPECFEM2D, SPECFEM3D_Cartesian, and SPECFEM3D_GLOBE; that an LLM agent can use those servers to orchestrate complete forward-simulation workflows; and that the resulting case studies produce outputs consistent with their requested setups. It also shows that the workflow can operate with both autonomous execution and multi-turn human guidance.

Cognaptus infers a broader modernization pattern: MCP-like interfaces can turn mature technical software into agent-controllable infrastructure. The strongest business use is not full replacement of expert labor, but reduction of operational friction around expert tools.

What remains uncertain is the measurable size of that benefit. The paper does not report time saved, error-rate reduction, novice-versus-expert performance, cost per simulation, or controlled comparisons against shell scripts, notebooks, GUIs, or workflow managers. It also does not deeply evaluate safety controls, permissioning, audit logs, or failure recovery under adversarial or messy real-world conditions.

Those omissions do not erase the contribution. They define the next evaluation layer.

For business deployment, a serious version of this system would need additional controls:

Deployment question	Why it matters
Who is allowed to run which tool?	Simulation workflows may consume expensive compute or modify important project files
Which parameters require expert approval?	Some choices are operational, others are scientific or safety-critical
How are generated files versioned?	Reproducibility depends on knowing exactly what the agent changed
How are failed runs diagnosed?	Agents must not silently “fix” problems by changing assumptions
What outputs count as validation?	A pretty waveform is not automatically a correct simulation
How are model/tool calls logged?	Auditability matters when decisions affect engineering, safety, or cost

This is where the paper’s human-in-the-loop design is useful but incomplete. Human oversight is a design principle, not a governance system. The next step is to specify where oversight occurs, what must be approved, what can be automated, and what evidence must be attached to each result.
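Two of those questions, who may run which tool and how calls are logged, can be sketched together. This is an assumed design, not something the paper implements: the role names, the permission table, the `audited_call` wrapper, and the log fields are hypothetical. Hashing the generated output is one simple way to record exactly what the agent produced.

```python
import hashlib

# Hypothetical permission table: which role may call which tool.
PERMISSIONS = {
    "student": {"generate_par_file", "visualize"},
    "researcher": {"generate_par_file", "run_solver", "visualize"},
}


def audited_call(role, tool, args, handler, audit_log):
    """Check permission, run the handler, and record what happened."""
    entry = {"role": role, "tool": tool, "args": args}
    if tool not in PERMISSIONS.get(role, set()):
        entry["status"] = "denied"
        audit_log.append(entry)
        return None
    output = handler(**args)
    entry["status"] = "ok"
    # Fingerprint of the generated text, so later readers can verify it.
    entry["output_sha256"] = hashlib.sha256(output.encode("utf-8")).hexdigest()
    audit_log.append(entry)
    return output


def generate_par_file(nproc):
    """Stand-in generator returning simplified text, not a real Par_file."""
    return f"NPROC = {nproc}\n"


audit_log = []
audited_call("student", "run_solver", {}, lambda: "", audit_log)
audited_call("student", "generate_par_file", {"nproc": 4}, generate_par_file, audit_log)
print([(entry["tool"], entry["status"]) for entry in audit_log])
```

The denied call is logged as carefully as the successful one. An audit trail that records only successes describes a system nobody needs to audit.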

The misconception to avoid: agent execution is not scientific judgment

The paper’s strongest claim is about access and orchestration. It lowers the barrier between scientific intent and executable simulation workflow. It does not eliminate the need for geophysical judgment.

That distinction should shape how business leaders read it. The lesson is not that LLMs can now own high-stakes technical decisions. The lesson is that LLM agents become useful when expert software is exposed through structured, limited, inspectable actions.

In other words, the magic is not in the model saying “I understand earthquakes.” The magic is in the system saying, “Here are the valid operations, here are their schemas, here are the files generated, here is the solver output, and here is where the human can intervene.”

That is a much less theatrical claim. Naturally, it is also the one more likely to survive contact with production.

From seismic stacks to enterprise stacks

The title of the paper places it inside computational seismology. Fair enough. But the architecture belongs to a broader class of AI transformation projects: making old but valuable systems conversational without making them unserious.

A bank might wrap risk models. A manufacturer might wrap finite-element workflows. A logistics company might wrap routing simulations. A government agency might wrap geospatial analysis pipelines. A research lab might wrap experimental design tools. In each case, the useful question is not “Can we attach ChatGPT to this?” That question is too easy and usually leads to a demo.

The better questions are:

What are the real atomic operations in the workflow?
Which files, parameters, and execution steps must be controlled?
Which decisions can be delegated, and which must remain with experts?
What evidence should be returned after each step?
How do we make the workflow reproducible after the conversation disappears?

The SPECFEM MCP paper is valuable because it answers those questions in one demanding scientific setting. It shows that an agent interface can be built around a serious simulation stack without rewriting the stack or pretending the LLM is the solver.

That is the pattern worth stealing.

Conclusion: the assistant is useful because the stack learned to answer back

The paper’s contribution is not that seismology suddenly became conversational. The contribution is that the SPECFEM workflow was decomposed into structured tools that an agent can invoke, while the trusted numerical engine remains intact.

The five case studies matter because they escalate the same mechanism across a simple 2D teaching experiment, a richer 2D exploration model, external mesh integration, a 3D regional volcanic simulation, and global earthquake-scale forward modeling. Together, they support the claim that MCP can act as a modernization layer for legacy scientific software.

For businesses, the message is precise. Do not start by asking whether an LLM can replace your technical experts. Start by asking which expert tools are trapped behind brittle workflows, hidden file conventions, and manual execution chains. Then wrap those tools so intent can move through the system cleanly, with validation and human control where they belong.

The stack does not need to become sentient. It only needs to talk back in a structured way.

Cognaptus: Automate the Present, Incubate the Future.

¹ Yukun Ren et al., “Seismology modeling agent: A smart assistant for geophysical researchers,” arXiv:2512.14429, 2025, https://arxiv.org/abs/2512.14429.
