The Conscience Plug-in: Teaching AI Right from Wrong on Demand

🧠 From Freud to Fine-Tuning: What Is a Superego for AI?

As AI agents gain the ability to plan, act, and adapt in open-ended environments, ensuring they behave in accordance with human expectations becomes an urgent challenge. Traditional approaches like Reinforcement Learning from Human Feedback (RLHF) or static safety filters offer partial solutions, but they falter in complex, multi-jurisdictional, or evolving ethical contexts. Enter the idea of a Superego layer—not a psychoanalytical metaphor, but a modular, programmable conscience that governs AI behavior. Proposed by Nell Watson et al., this approach frames moral reasoning and legal compliance not as traits baked into the LLM itself, but as a runtime overlay—a supervisory mechanism that monitors, evaluates, and modulates outputs according to a predefined value system. ...

June 18, 2025 · 4 min · Zelina

Scaling Trust, Not Just Models: Why AI Safety Must Be Quantitative

As artificial intelligence surges toward superhuman capabilities, one truth becomes unavoidable: the strength of our oversight must grow just as fast as the intelligence of the systems we deploy. Simply hoping that “better AI will supervise even better AI” is not a strategy — it’s wishful thinking. Recent research from MIT and collaborators proposes a bold new way to think about this challenge: Nested Scalable Oversight (NSO) — a method to recursively layer weaker systems to oversee stronger ones [1]. One of the key contributors, Max Tegmark, is a physicist and cosmologist at MIT renowned for his work on AI safety, the mathematical structure of reality, and existential risk analysis. Tegmark is also the founder of the Future of Life Institute, an organization dedicated to mitigating risks from transformative technologies. ...

April 29, 2025 · 6 min