RoboSafe: When Robots Need a Conscience (That Actually Runs)
A robot does not need evil intent to become dangerous. It only needs a bad next action. “Turn on the microwave” sounds ordinary until the microwave contains a fork. “Pick up the knife” may be harmless in a cooking task until the next move is to swing it around. “Turn on the stove” may be safe for one step and unsafe three steps later if the agent forgets to turn it off. Physical risk is annoyingly literal that way. It does not wait for a model to finish reflecting on its values. ...