Cover image

Anchors Aweigh? Why Small LLMs Refuse to Flip Their Own Semantics

A label looks harmless until you ask it to lie. Tell a model that a glowing movie review should be labeled POS, and few-shot prompting behaves like a useful intern: it studies the examples, picks up the pattern, and usually gets better. Tell the same model that a glowing review should now be labeled NEG, and the intern becomes less useful. It does not smoothly learn your private code. It does not politely invert its semantic universe. It mostly produces a muddle. ...

November 30, 2025 · 15 min · Zelina