SceneMaker: When 3D Scene Generation Stops Guessing
A chair behind a table is not half a chair A single image can be a very rude input. It shows the front of a room, hides the back of objects, compresses depth into pixels, and then asks a model to produce a coherent 3D scene. The model must decide what the hidden side of a chair looks like, how large the chair is, whether it sits behind the table or intersects with it, and where everything belongs in 3D space. Naturally, when the result looks wrong, we often blame “weak 3D generation.” ...