Skill Issue or System Design? How LLMs Actually Follow Instructions
The checklist problem that exposes the model
Checklist tasks look boring. That is exactly why they are useful. Ask an LLM to write a formal email under 50 words, include one required term, avoid another term, and return the result as JSON. None of this sounds intellectually difficult. No theorem proving. No multimodal reasoning. No dramatic benchmark leaderboard screenshot. Just instructions. ...
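Part of what makes a checklist task useful is that most of its constraints can be checked by a few lines of code, so a slip is unambiguous. The sketch below is illustrative only. It assumes the model was asked to reply with an object of the form {"email": "..."}, and it reads "under 50 words" as strictly fewer than 50 whitespace-separated tokens. The key name `email`, the function `check_response`, and the example terms are hypothetical and not taken from any benchmark. Formality is left out because it has no simple mechanical test.

```python
import json
import re


def check_response(raw: str, required: str, forbidden: str,
                   max_words: int = 50) -> dict[str, bool]:
    """Return a pass/fail flag for each mechanically checkable constraint.

    Hypothetical contract: the model was asked to reply with {"email": "..."}.
    "Formal tone" is not checked; it has no simple mechanical test.
    """
    results = {
        "json_ok": False,  # True only if the reply parses AND has the requested shape
        "under_limit": False,
        "has_required": False,
        "avoids_forbidden": False,
    }
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return results  # nothing else can be checked without a parsed body
    if not isinstance(payload, dict) or not isinstance(payload.get("email"), str):
        return results  # valid JSON, but not the requested shape

    results["json_ok"] = True
    body = payload["email"]

    # Assumption: "under 50 words" means strictly fewer than max_words tokens.
    results["under_limit"] = len(body.split()) < max_words

    def mentions(term: str) -> bool:
        # Case-insensitive whole-word match, so "report" does not match "reporter".
        return re.search(rf"\b{re.escape(term)}\b", body, re.IGNORECASE) is not None

    results["has_required"] = mentions(required)
    results["avoids_forbidden"] = not mentions(forbidden)
    return results


response = '{"email": "Dear Ms. Lee, please find the quarterly report attached. Kind regards, Sam"}'
print(check_response(response, required="report", forbidden="urgent"))
# {'json_ok': True, 'under_limit': True, 'has_required': True, 'avoids_forbidden': True}
```

Each constraint gets its own flag rather than a single score, so a reply that respects the word limit but leaks the forbidden term shows exactly which instruction slipped.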