Crowds, Codes, and Consensus: When AI Learns the Language of Science
A lab has data. Lots of data. Spectra, simulations, microscopy images, code outputs, experimental notes, model prompts, maybe three versions of a spreadsheet called final_final_revised.xlsx, because civilization remains fragile. Then someone asks a simple question: what does this variable mean? That is when the machinery slows down. The word looked obvious when one team wrote it. It becomes less obvious when another team tries to reuse it. It becomes actively annoying when a model retrieves the wrong dataset because two groups used the same term differently, or different terms for the same concept. At that point, metadata stops being administrative wallpaper and becomes infrastructure. ...