Semantic Robustness

TL;DR for operators AI systems do not merely fail by giving the wrong answer. They also fail by changing the kind of action they take when the meaning has not changed, or by spreading an update into places where it was never supposed to go. That is the shared lesson from two recent papers that, at first glance, live in different neighborhoods. One studies code-mixed hate moderation and shows that clean-English-tuned workflows can route the same underlying content differently when it appears as Tamil-English code-mix.1 The other studies multimodal knowledge editing and proposes a method for updating model knowledge so corrections generalize to related queries without disturbing visually or semantically nearby but unrelated facts.2 ...