Instruction Following

Failure logs are usually treated as evidence for the prosecution. A model is asked to produce a concise compliance summary with three bullet points, mention two risks, avoid prohibited claims, and end with a recommendation. It produces three bullets, correctly identifies the risks, avoids the prohibited claims—and forgets the recommendation. Under a strict binary reward, the response receives a zero. Under a partial-credit reward, it might receive 0.75. The first signal says nothing useful happened. The second says something useful happened, but not precisely what. ...

Instruction Following

Skill Issue or System Design? How LLMs Actually Follow Instructions

Replay the Losses, Win the Game: When Failed Instructions Become Your Best Training Data