
Prompt Without Words: Distilling GPT Semantics for Smarter Vision Models
When it comes to prompting vision-language models, most methods rely on textual descriptions generated by large language models like GPT. But those descriptions ("fluffy fur, friendly eyes, golden color") are often verbose, ambiguous, or flat-out unreliable. What if we could skip that noisy middle step entirely? That's the premise behind DeMul (Description-free Multi-prompt Learning), a method presented at ICLR 2025 that quietly delivers a major leap in few-shot image classification. Instead of generating a description for each class, DeMul distills the semantic knowledge of GPT embeddings directly into learnable prompt vectors. The result is simpler, more robust, and strikingly effective. ...
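To make the core idea concrete, here is a minimal PyTorch sketch of what "distilling GPT embeddings into learnable prompts" could look like. Everything in it is illustrative rather than DeMul's actual code: the stub text encoder stands in for CLIP's, and the shapes, loss weighting, and names (`DescriptionFreePrompts`, `distillation_loss`, `gpt_embeds`) are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for CLIP's text encoder: mean-pools prompt tokens and
# projects them into the shared image-text embedding space.
class TextEncoderStub(nn.Module):
    def __init__(self, token_dim=512, embed_dim=512):
        super().__init__()
        self.proj = nn.Linear(token_dim, embed_dim)

    def forward(self, prompt_tokens):                 # (N, P, L, D)
        return self.proj(prompt_tokens.mean(dim=2))   # (N, P, E)

class DescriptionFreePrompts(nn.Module):
    """Several learnable prompt vectors per class; no text anywhere."""
    def __init__(self, num_classes, prompts_per_class=4,
                 prompt_len=16, token_dim=512, embed_dim=512):
        super().__init__()
        self.prompts = nn.Parameter(
            0.02 * torch.randn(num_classes, prompts_per_class,
                               prompt_len, token_dim))
        self.encoder = TextEncoderStub(token_dim, embed_dim)

    def forward(self):
        # Class embeddings produced directly from the learned prompts.
        return F.normalize(self.encoder(self.prompts), dim=-1)  # (N, P, E)

def distillation_loss(prompt_embeds, gpt_embeds):
    # Pull each learned prompt toward its class's GPT embedding
    # (here assumed to be one precomputed vector per class) by
    # minimizing cosine distance.
    target = F.normalize(gpt_embeds, dim=-1).unsqueeze(1)   # (N, 1, E)
    return (1.0 - (prompt_embeds * target).sum(-1)).mean()

def classification_loss(prompt_embeds, image_embeds, labels, temp=0.07):
    # Average each class's prompts into one classifier weight, then
    # score images against all classes with standard CLIP-style logits.
    class_weights = F.normalize(prompt_embeds.mean(dim=1), dim=-1)  # (N, E)
    logits = image_embeds @ class_weights.t() / temp
    return F.cross_entropy(logits, labels)

# Usage sketch with random tensors standing in for real features.
model = DescriptionFreePrompts(num_classes=10)
gpt_embeds = torch.randn(10, 512)                    # precomputed per class
images = F.normalize(torch.randn(32, 512), dim=-1)   # CLIP image features
labels = torch.randint(0, 10, (32,))

prompt_embeds = model()
loss = classification_loss(prompt_embeds, images, labels) \
     + 0.5 * distillation_loss(prompt_embeds, gpt_embeds)
loss.backward()
```

The point the sketch tries to capture is that no class description is ever generated, tokenized, or read: a GPT embedding per class serves directly as the distillation target, so the noisy natural-language middle step disappears from the pipeline.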