Language-Fidelity

TL;DR for operators AdaMame is a paper about a very practical failure: a model can answer a user in one language while doing its reasoning in another. That is not just inelegant. It is a product, trust, and governance problem wearing a linguistics hat.1 The paper’s useful move is to stop treating multilingual reasoning as a translation issue. The authors train for language fidelity directly. First, they supervised fine-tune models on 30,000 naturally occurring reasoning traces across five languages. Then they run reinforcement learning with AdaMame-GRPO, a GRPO variant that gives extra reward when a correct rollout reasons in the query language. The extra reward grows during training, so the model first explores useful reasoning languages and later converges toward the user’s language. ...