Scientific visualization has long been caught in a bind: the more complex the dataset, the more domain-specific the visualization, and the harder it is to automate. From MRI scans to hurricane simulations, modern scientific data is massive, high-dimensional, and notoriously messy. While dashboards and 2D plots have benefitted from LLM-driven automation, 3D volumetric visualization—especially in high-performance computing (HPC) settings—has remained stubbornly manual.

VizGenie changes that.

Developed at Los Alamos National Laboratory, VizGenie is a hybrid agentic system that doesn’t just automate visualization tasks—it refines itself through them. It blends traditional visualization tools (like VTK) with dynamically generated Python modules and augments this with vision-language models fine-tuned on domain-specific images. The result is a system that can act on requests like “highlight the tissue boundaries” and actually improve its responses over time.

The Core Loop: Visualize, Validate, Improve

At the heart of VizGenie is a self-improving visualization pipeline (a code sketch follows the list):

  1. User Prompt: A natural language command (e.g., “show the impact plume”)
  2. Agentic Decomposition: An LLM agent either selects a pre-built tool or dynamically generates new code
  3. Code Execution and Validation: The generated VTK script is executed; if it fails, the code is iteratively repaired until it runs
  4. Feature-aware Captioning: The output image is passed through a fine-tuned vision model to generate a semantic caption
  5. Caching and Self-Refinement: Useful code and feature mappings are stored for future reuse; new images are added to expand the model’s understanding
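
A minimal Python sketch of this cached-or-generated, execute-and-repair cycle is shown below. The class, helper, and method names (VizLoop, run_script, generate_code, repair_code, caption, store) are illustrative assumptions, not VizGenie’s actual API.

```python
# Minimal sketch of the visualize-validate-improve loop.
# Helper and method names are hypothetical, not VizGenie's real interfaces.
import subprocess
import tempfile


def run_script(code: str):
    """Run a generated VTK/Python script in a subprocess; assume it writes output.png."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run(["python", path], capture_output=True, text=True)
    return proc.returncode == 0, "output.png", proc.stderr


class VizLoop:
    def __init__(self, llm, vision_model, cache):
        self.llm = llm               # planning / code-generating LLM client
        self.vision = vision_model   # fine-tuned captioning model
        self.cache = cache           # maps prompts to previously validated scripts

    def handle(self, prompt: str, max_retries: int = 3):
        # 1-2) Reuse a cached tool if one matches; otherwise generate new code.
        script = self.cache.get(prompt) or self.llm.generate_code(prompt)

        # 3) Execute and iteratively repair on failure.
        for _ in range(max_retries):
            ok, image_path, error = run_script(script)
            if ok:
                break
            script = self.llm.repair_code(script, error)
        else:
            raise RuntimeError("generated script never validated")

        # 4) Feature-aware captioning of the rendered image.
        caption = self.vision.caption(image_path)

        # 5) Cache the working script and the new image/caption pair for reuse
        #    and for later rounds of vision-model fine-tuning.
        self.cache.store(prompt, script, image_path, caption)
        return image_path, caption
```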

It’s a clever hybrid: stable for well-trodden tasks, flexible for novel ones, and most importantly—always getting smarter.

Not Just Rendering, But Reasoning

Where most systems stop at image generation, VizGenie goes further: it can talk about what it sees.

Using a fine-tuned LLaMA-Vision model (trained via LoRA on domain-specific datasets like asteroid simulations and CT scans), the system can respond to follow-up questions such as these (a query sketch follows the list):

  • “What structures are visible in this slice?”
  • “At what scalar values do the modes occur in the histogram?”
  • “What is the optimal isovalue to highlight the skull?”
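
A sketch of how such a follow-up question could be posed to a LoRA-adapted vision-language model is below, assuming a Llama 3.2 Vision checkpoint served through Hugging Face transformers and peft. The model ID, adapter path, and image file are placeholders; the paper’s exact serving stack may differ.

```python
# Hypothetical VQA call against a LoRA-adapted vision-language model.
# Model ID, adapter path, and file names are placeholders.
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, MllamaForConditionalGeneration

BASE = "meta-llama/Llama-3.2-11B-Vision-Instruct"
base = MllamaForConditionalGeneration.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "path/to/scivis-lora-adapter")  # hypothetical adapter
processor = AutoProcessor.from_pretrained(BASE)

image = Image.open("ct_slice.png")  # a rendering produced earlier in the pipeline
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "What structures are visible in this slice?"},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```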

This visual question-answering (VQA) capability isn’t static. The system generates hundreds of images across varying angles and parameters to continually fine-tune its understanding. For example, expanding the visual dataset from 150 to 300 images improved both vocabulary richness and mean caption stability (semantic similarity between captions for the same scene), a significant gain for scientific communication:

| Visual Dataset Size | Vocabulary Richness | Caption Stability (mean) |
|---------------------|---------------------|--------------------------|
| 150 images          | 1,606 words         | 0.585                    |
| 300 images          | 2,946 words         | 0.717                    |
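
The “caption stability” column can be read as an average semantic similarity between captions generated for the same scene under varied camera angles and parameters. Below is a rough sketch of one way to compute such a score with sentence-transformers embeddings and mean pairwise cosine similarity; the exact metric in the paper may differ, and the example captions are invented.

```python
# Rough sketch of a caption-stability score: mean pairwise cosine similarity
# between embeddings of captions describing the same scene. The metric
# definition and the example captions are assumptions for illustration.
from itertools import combinations

from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def caption_stability(captions):
    emb = embedder.encode(captions, convert_to_tensor=True)
    pairs = list(combinations(range(len(captions)), 2))
    sims = [util.cos_sim(emb[i], emb[j]).item() for i, j in pairs]
    return sum(sims) / len(sims)

print(caption_stability([
    "Impact-generated water splash with a rising central plume",
    "Rising water plume ejected after the asteroid impact",
    "Water column and splash produced by the impact event",
]))
```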

RAG, ReAct, and the Right Model for the Job

VizGenie isn’t just one model; it’s an ensemble of specialized agents (see the routing sketch after this list):

  • GPT-4o handles general planning and user interaction
  • o3-mini generates creative VTK code with higher temperature
  • GPT-4o-mini performs fast, deterministic code edits
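
A minimal sketch of this kind of per-task routing against the OpenAI API is shown below; the task names, prompts, and sampling settings are assumptions based on the description above, not VizGenie’s actual configuration.

```python
# Hypothetical per-task model routing. Task names and sampling settings
# are illustrative only.
from openai import OpenAI

client = OpenAI()

ROUTES = {
    "plan": {"model": "gpt-4o", "temperature": 0.3},       # planning & user interaction
    "generate": {"model": "o3-mini"},                      # exploratory VTK code generation
    "edit": {"model": "gpt-4o-mini", "temperature": 0.0},  # fast, deterministic code edits
}

def ask(task: str, prompt: str) -> str:
    route = ROUTES[task]
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}], **route
    )
    return response.choices[0].message.content

plan = ask("plan", "The user wants to 'show the impact plume'. Outline the steps.")
```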

And when ambiguity creeps in, VizGenie uses Retrieval-Augmented Generation (RAG) to consult past datasets and documents (e.g., contest PDFs, README files) before guessing. This drastically improves both reproducibility and scientific accuracy.

In one experiment, RAG helped the agent correctly list all vector and scalar fields in an ionization dataset, a question the plain LLM got wrong.
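
The sketch below shows a bare-bones version of that retrieval step: snippets from README files or contest PDFs are embedded, the most relevant ones are pulled in, and only then is the question answered. The embedding model, chunking, prompt template, and example snippet text are assumptions, not the paper’s pipeline.

```python
# Bare-bones retrieval-augmented prompting. The embedding model, chunk
# contents, and prompt template are placeholders for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(chunks):
    """chunks: text snippets extracted from READMEs, contest PDFs, etc."""
    return chunks, embedder.encode(chunks, normalize_embeddings=True)

def retrieve(question, index, k=3):
    chunks, vectors = index
    q = embedder.encode([question], normalize_embeddings=True)[0]
    ranked = np.argsort(vectors @ q)[::-1][:k]
    return [chunks[i] for i in ranked]

index = build_index([
    "README excerpt: the dataset provides several scalar fields per timestep ...",
    "README excerpt: a velocity vector field is stored with three components ...",
])
question = "List all scalar and vector fields in the dataset."
context = "\n".join(retrieve(question, index))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```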

From Skull to Splash: A Multiverse of Datasets

VizGenie was tested on:

  • CT scan data (e.g., skull cross-sections)
  • Hurricane Isabel pressure fields
  • Astrophysical turbulence simulations
  • Asteroid-water impact ensembles

Each use case demonstrated not just visualization quality but iterative refinement (a representative generated script follows the list):

  • Colormap changes? Done.
  • Opacity remapping? Incrementally edited.
  • Ambiguous features? Clarified via VQA.
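
Below is the kind of VTK script the system generates and then nudges: a volume rendering with an explicit colormap and opacity transfer function that an incremental edit can retarget. The file name, scalar range, and transfer-function breakpoints are made-up values for a CT-style volume, not the paper’s settings.

```python
# Illustrative VTK volume-rendering script of the sort the agent generates.
# File name, scalar range, and breakpoints are placeholder values.
import vtk

reader = vtk.vtkXMLImageDataReader()
reader.SetFileName("ct_skull.vti")  # hypothetical CT volume

# Colormap: low densities in blue, bone-like densities in white.
color = vtk.vtkColorTransferFunction()
color.AddRGBPoint(0.0, 0.0, 0.0, 1.0)
color.AddRGBPoint(1150.0, 1.0, 1.0, 1.0)

# Opacity remapping: hide soft tissue, emphasize bone.
opacity = vtk.vtkPiecewiseFunction()
opacity.AddPoint(0.0, 0.0)
opacity.AddPoint(500.0, 0.0)
opacity.AddPoint(1150.0, 0.85)

prop = vtk.vtkVolumeProperty()
prop.SetColor(color)
prop.SetScalarOpacity(opacity)
prop.ShadeOn()

mapper = vtk.vtkGPUVolumeRayCastMapper()
mapper.SetInputConnection(reader.GetOutputPort())

volume = vtk.vtkVolume()
volume.SetMapper(mapper)
volume.SetProperty(prop)

renderer = vtk.vtkRenderer()
renderer.AddVolume(volume)
window = vtk.vtkRenderWindow()
window.SetOffScreenRendering(True)
window.AddRenderer(renderer)
window.Render()

# Save the frame so the captioning model can inspect it.
to_image = vtk.vtkWindowToImageFilter()
to_image.SetInput(window)
writer = vtk.vtkPNGWriter()
writer.SetFileName("skull_render.png")
writer.SetInputConnection(to_image.GetOutputPort())
writer.Write()
```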

Perhaps the most dramatic example was when an out-of-the-box vision model described a water plume as a “cartoon gorilla in motion.” After fine-tuning, it correctly identified the event as “impact-generated water splash with rising plume.”

Limitations and Future Trajectory

VizGenie’s greatest strength—its ensemble architecture—is also its Achilles heel. Multiple LLM calls introduce latency. Dynamic code generation, though powerful, still fails occasionally. And while its vision models have improved, truly mastering scientific jargon remains a challenge.

Yet the roadmap is compelling:

  • Automate fine-tuning via self-supervised loops
  • Add support for non-VTK formats (e.g., integrate with ParaView or VisIt)
  • Expand RAG context with vector database embeddings
  • Employ diffusion models for novel visualization synthesis

Why It Matters

VizGenie isn’t just making visualization easier—it’s redefining it as a learning process.

It treats every visualization as an opportunity to expand its vocabulary, improve its reasoning, and grow its toolkit. In doing so, it closes the gap between domain experts who know what to ask and engineers who know how to make it visual. VizGenie is both.

For anyone working at the frontier of simulation, physics, climate modeling, or biomedicine, VizGenie offers a glimpse of what intelligent tooling looks like: collaborative, iterative, and always improving.


Cognaptus: Automate the Present, Incubate the Future.