Definition Alignment

TL;DR for operators LLM annotation is not governed by the prompt as cleanly as procurement decks would prefer. The paper behind this article shows that models bring their own internal concept boundary to definition-driven classification tasks, and that boundary can dominate the user’s intended definition even when the prompt looks explicit.1 The practical result is simple: before using an LLM as an annotator, judge, moderator, reviewer, triage engine, or rubric scorer, test whether its internal understanding of the label matches your operational definition. The paper introduces Definition-Specific Familiarity (DSF) as a lightweight proxy for that fit. DSF is positively associated with model accuracy after controlling for dataset difficulty, while three text memorization metrics are not. ...