RL Grows a Third Dimension: Why Text-to-3D Finally Needs Reasoning
A chair is not a picture of a chair. That sounds obvious until a text-to-3D system forgets the backrest from one angle, gives the chair three legs from another, paints the seat correctly, and somehow convinces a weak evaluator that the job is mostly done. In 2D generation, a model can often survive by producing a plausible view. In 3D generation, every view is a witness. Geometry, texture, object parts, and spatial relationships all have to agree. Annoying, yes. Also the entire point. ...