Parallel Worlds of Moderation: Simulating Online Civility with LLMs
Moderation teams live inside an annoying counterfactual. A user posts something toxic. The platform sends a warning, hides the post, suspends the account, or does nothing. A week later, the team can measure what happened. What it cannot observe is the parallel platform where the same thread unfolded, with the same user, the same sequence of replies, and the same ambient mood, but under a different moderation decision. ...