When Text Doesn’t Help: Rethinking Multimodality in Forecasting

The Multimodal Mirage

In recent years, there’s been growing enthusiasm around combining unstructured text with time series data. The promise? Textual context (say, clinical notes, weather reports, or market news) might inject rich insights into otherwise pattern-driven numerical streams. With powerful vision-language and text-generation models dominating headlines, it’s only natural to wonder: could Large Language Models (LLMs) revolutionize time series forecasting too?

A new paper from AWS researchers provides the first large-scale empirical answer. The verdict? The benefits of multimodality are far from guaranteed. In fact, across 14 datasets spanning domains from agriculture to healthcare, incorporating text often fails to outperform well-tuned unimodal baselines. Multimodal forecasting, it turns out, is more of a conditional advantage than a universal one. ...

June 30, 2025 · 3 min · Zelina