GitHub Issues

TL;DR for operators

Coding assistants look much better on clean, self-contained questions than on messy software support conversations. That is the inconvenient point of CodeAssistBench (CAB), a benchmark that turns resolved GitHub issues into multi-turn, project-grounded conversations in which a model must behave like a maintainer, not a code-snippet vending machine.[1] ...