Integrated Gradients

Fine-tuning is the hammer; steering is the scalpel. In an era where models are increasingly opaque and high-stakes, we need tools that guide behavior without overhauling the entire architecture. That’s precisely what GRAINS (Gradient-based Attribution for Inference-Time Steering) delivers: a powerful, interpretable, and modular way to shift the behavior of LLMs and VLMs by leveraging the most fundamental unit of influence—the token. The Problem with Global Steering Traditional inference-time steering approaches often rely on global intervention vectors: a blunt, one-size-fits-all shift in hidden activations derived from paired desirable and undesirable examples. But these methods are insensitive to which specific tokens caused bad behavior. It’s like adjusting a recipe because the dish tastes bad—without checking if the salt or the sugar was at fault. ...