
Seeing is Retraining: How VizGenie Turns Visualization into a Self-Improving AI Loop

Scientific visualization has long been caught in a bind: the more complex the dataset, the more domain-specific the visualization, and the harder it is to automate. From MRI scans to hurricane simulations, modern scientific data is massive, high-dimensional, and notoriously messy. While dashboards and 2D plots have benefitted from LLM-driven automation, 3D volumetric visualization—especially in high-performance computing (HPC) settings—has remained stubbornly manual. VizGenie changes that. Developed at Los Alamos National Laboratory, VizGenie is a hybrid agentic system that doesn’t just automate visualization tasks—it refines itself through them. It blends traditional visualization tools (like VTK) with dynamically generated Python modules and augments this with vision-language models fine-tuned on domain-specific images. The result: a system that can answer questions like “highlight the tissue boundaries” and actually improve its answers over time. ...
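As a rough illustration of the kind of VTK module such a system might generate for a request like "highlight the tissue boundaries", the sketch below renders a volume with a gradient-opacity transfer function, which emphasizes voxels where intensities change sharply. This is a hedged sketch, not VizGenie's actual output: the file name, scalar range, and opacity values are illustrative assumptions.

```python
import vtk

# Illustrative input volume and scalar range (assumptions, not real VizGenie data).
reader = vtk.vtkXMLImageDataReader()
reader.SetFileName("mri_volume.vti")

color = vtk.vtkColorTransferFunction()
color.AddRGBPoint(0, 0.0, 0.0, 0.0)
color.AddRGBPoint(255, 1.0, 1.0, 1.0)

scalar_opacity = vtk.vtkPiecewiseFunction()
scalar_opacity.AddPoint(0, 0.0)
scalar_opacity.AddPoint(255, 0.2)

# Gradient opacity: make voxels near sharp intensity transitions (boundaries) stand out.
gradient_opacity = vtk.vtkPiecewiseFunction()
gradient_opacity.AddPoint(0, 0.0)
gradient_opacity.AddPoint(90, 0.8)

volume_property = vtk.vtkVolumeProperty()
volume_property.SetColor(color)
volume_property.SetScalarOpacity(scalar_opacity)
volume_property.SetGradientOpacity(gradient_opacity)
volume_property.ShadeOn()

mapper = vtk.vtkSmartVolumeMapper()
mapper.SetInputConnection(reader.GetOutputPort())

volume = vtk.vtkVolume()
volume.SetMapper(mapper)
volume.SetProperty(volume_property)

renderer = vtk.vtkRenderer()
renderer.AddVolume(volume)
window = vtk.vtkRenderWindow()
window.AddRenderer(renderer)
window.SetOffScreenRendering(1)
window.Render()

# Save the frame so a vision-language model can be asked about the result.
to_image = vtk.vtkWindowToImageFilter()
to_image.SetInput(window)
writer = vtk.vtkPNGWriter()
writer.SetFileName("boundaries.png")
writer.SetInputConnection(to_image.GetOutputPort())
writer.Write()
```

In the loop the post describes, a render like this would be inspected by the fine-tuned vision-language model, and its feedback would steer the next round of generated code.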

August 2, 2025 · 4 min · Zelina

Prompt Without Words: Distilling GPT Semantics for Smarter Vision Models

When it comes to prompting vision-language models, most methods rely on textual descriptions extracted from large language models like GPT. But those descriptions—“fluffy fur, friendly eyes, golden color”—are often verbose, ambiguous, or flat-out unreliable. What if we could skip that noisy middle step entirely? That’s the premise behind DeMul (Description-free Multi-prompt Learning), a new method presented at ICLR 2025 that quietly delivers a major leap in few-shot image classification. Instead of generating descriptions for each class, DeMul directly distills the semantic knowledge of GPT embeddings into learnable prompt vectors. The result is simpler, more robust, and strikingly effective. ...
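A conceptual sketch of the description-free idea in PyTorch: several learnable prompt vectors per class are trained with a standard few-shot classification loss, plus a distillation term pulling each class's prompts toward a precomputed GPT embedding of the class name. The module name, prompt aggregation, temperature, and loss weighting are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPromptClassifier(nn.Module):
    """Sketch: a bank of learnable prompt vectors per class, scored against
    (assumed L2-normalized) CLIP image embeddings."""

    def __init__(self, num_classes, prompts_per_class=4, dim=512):
        super().__init__()
        self.prompts = nn.Parameter(
            torch.randn(num_classes, prompts_per_class, dim) * 0.02
        )

    def forward(self, image_features):
        # image_features: (batch, dim)
        prompts = F.normalize(self.prompts, dim=-1)           # (C, P, D)
        sims = torch.einsum("bd,cpd->bcp", image_features, prompts)
        # Average over each class's prompts (aggregation scheme is an assumption).
        return sims.mean(dim=-1)                               # (B, C)

def training_losses(model, image_features, labels, gpt_class_embeddings, alpha=0.5):
    """gpt_class_embeddings: (C, D) precomputed GPT embeddings of the class names."""
    logits = model(image_features) / 0.07                      # temperature assumed
    cls_loss = F.cross_entropy(logits, labels)
    # Distillation: pull each class's prompts toward its GPT embedding (cosine loss).
    prompts = F.normalize(model.prompts, dim=-1)               # (C, P, D)
    targets = F.normalize(gpt_class_embeddings, dim=-1).unsqueeze(1)  # (C, 1, D)
    distill_loss = (1 - (prompts * targets).sum(-1)).mean()
    return cls_loss + alpha * distill_loss
```

The point of the sketch is the absence of any generated class descriptions: the only GPT signal is the embedding itself, distilled directly into the prompt vectors.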

July 13, 2025 · 3 min · Zelina