Opening — Why this matters now
Railway delays are one of those problems everyone experiences and almost no one truly understands. Passengers blame weather. Operators blame operations. Data scientists blame missing variables. Everyone is partially correct.
What has quietly shifted in recent years is not the weather itself, but our ability to observe it alongside operations—continuously, spatially, and at scale. As rail systems push toward AI‑assisted scheduling, predictive maintenance, and real‑time disruption management, delay prediction without weather is no longer just incomplete—it is structurally misleading.
This paper arrives at the right moment: not with another model, but with something more foundational—a properly engineered dataset that finally lets weather and rail operations talk to each other.
Background — Context and prior art
Railway analytics has never suffered from a lack of data. Timetables, GPS traces, rolling stock logs, and station events are widely available across Europe and Asia. What has been missing is integration.
Most public railway datasets focus on what trains did, not under what conditions they did it. Weather, despite being repeatedly cited as a major delay driver—especially in Nordic and alpine regions—has typically been left out or reduced to a handful of coarse indicators.
Previous studies that attempted weather-aware delay prediction often relied on limited variables (temperature, rainfall) or narrow geographies. Others acknowledged weather as important future work, a polite academic way of saying “we know this matters, but the data is painful.”
The Finnish context makes this omission particularly costly. With a largely single-track network and temperatures ranging from −40°C to +30°C, small disruptions propagate fast. Weather is not noise here; it is structure.
Analysis — What the paper actually does
Instead of proposing a new neural architecture, the authors do something more operationally valuable: they build infrastructure.
Two data worlds, finally aligned
The dataset fuses:
- Railway operations data from Finland’s Digitraffic system (2018–2024), covering schedules, arrivals, departures, delays, cancellations, and stop-level granularity.
- Meteorological observations from 209 Finnish Meteorological Institute stations, with up to 13 weather variables at 1‑minute or 10‑minute resolution.
The key is not collection, but alignment.
Spatial–temporal matching that respects reality
Each train stop is matched to the nearest weather station using Haversine distance, then aligned in time using nearest-neighbor matching within tolerance windows. When a weather station lacks a specific sensor (snow depth, visibility, precipitation), a 50 km spatial fallback retrieves the closest valid measurement.
This is not mathematically elegant. It is operationally sane.
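The matching logic can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' pipeline: the station dictionary layout (`lat`, `lon`, `series`), the function names, and the tolerance value are all hypothetical, and the 50 km cap is applied only to the fallback search, which is one reading of the description above.

```python
import math
from bisect import bisect_left

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def station_for(stop_lat, stop_lon, stations, variable, fallback_km=50.0):
    """Pick the station that supplies `variable` for a train stop.

    The nearest station wins if it reports the variable. Otherwise the
    closest station within `fallback_km` that does report it is used.
    Returns None when no station qualifies.
    """
    def dist(s):
        return haversine_km(stop_lat, stop_lon, s["lat"], s["lon"])

    ranked = sorted(stations, key=dist)
    if not ranked:
        return None
    if variable in ranked[0]["series"]:
        return ranked[0]
    for s in ranked[1:]:
        if dist(s) > fallback_km:
            break  # ranked by distance, so nothing closer remains
        if variable in s["series"]:
            return s
    return None


def nearest_in_time(series, t, tolerance_s):
    """Nearest observation to time `t` within `tolerance_s` seconds.

    `series` is a list of (unix_timestamp, value) pairs sorted by time.
    Returns None when the closest observation falls outside the window.
    """
    times = [ts for ts, _ in series]
    i = bisect_left(times, t)
    candidates = series[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    ts, value = min(candidates, key=lambda p: abs(p[0] - t))
    return value if abs(ts - t) <= tolerance_s else None
```

Resolving the station per variable, rather than once per stop, is what lets a stop take temperature from its nearest station and snow depth from a slightly more distant one.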
Feature engineering that understands time
Rather than treating hours, weekdays, and months as linear numbers, the dataset applies cyclical sine–cosine encoding, preserving temporal continuity (midnight is close to 23:00, February is not “far” from January).
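A minimal sketch of sine–cosine encoding shows why this works. The function names are illustrative and not taken from the dataset; months are shifted to 0–11 here so that every cycle starts at zero, which is a choice made for this example.

```python
import math


def cyclical(value, period):
    """Map a periodic value (hour of day, weekday, month) onto the unit circle."""
    angle = 2 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def gap(a, b, period):
    """Euclidean distance between two encoded values."""
    (s1, c1), (s2, c2) = cyclical(a, period), cyclical(b, period)
    return math.hypot(s1 - s2, c1 - c2)


# 23:00 and midnight are one step apart, exactly like midnight and 01:00.
late_to_midnight = gap(23, 0, 24)
midnight_to_one = gap(0, 1, 24)

# December (11) and January (0) are adjacent once months are 0-based.
dec_to_jan = gap(11, 0, 12)
jan_to_feb = gap(0, 1, 12)
```

With a plain integer encoding, 23 and 0 would sit 23 units apart; on the circle they are neighbours, and noon is the farthest point from midnight.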
Weather variables are robust-scaled using interquartile ranges, acknowledging that meteorological data is inherently outlier-prone.
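As a rough sketch of what robust scaling does: each value is divided by the interquartile range after centering. Centering on the median is an assumption here (it is the default behaviour of scikit-learn's `RobustScaler`); the text above only states that interquartile ranges are used. The function name and sample values are illustrative.

```python
import statistics


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Degenerate spread (e.g. mostly-zero precipitation): center only.
        return [v - median for v in values]
    return [(v - median) / iqr for v in values]


# A single extreme reading does not move the median or the quartiles,
# so the other values keep the same scaled positions.
calm = robust_scale([1, 2, 3, 4, 5])
gusty = robust_scale([1, 2, 3, 4, 100])
```

Compared with mean and standard deviation scaling, one extreme gust or snowfall reading stays extreme after scaling but does not compress the rest of the distribution.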
Delay is not one thing—and the dataset proves it
The most underappreciated contribution is how the paper treats delay itself.
Three targets are provided:
| Target | What it captures | Why it matters |
|---|---|---|
| Cumulative delay | End-to-end passenger experience | Useful for passenger information systems |
| Origin-offset delay | Removes initial lateness | Cleaner operational signal |
| Station-specific delay | Isolates incremental delay per segment | Critical for diagnosing causes |
Most delays, it turns out, are inherited—not created locally. Without this distinction, models learn propagation, not causality.
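The relationship between the three targets can be illustrated with a short sketch. The formulas below are one reading of the table, not definitions quoted from the paper: origin-offset delay is taken as cumulative delay minus the delay at the first stop, and station-specific delay as the change in cumulative delay since the previous stop. The function name and the example journey are hypothetical.

```python
def delay_targets(cumulative):
    """Derive all three targets from per-stop cumulative delays.

    `cumulative` lists the delay in minutes at each stop, in route order,
    with the origin first. Assumed definitions (see lead-in):
      origin_offset    = cumulative[i] - cumulative[0]
      station_specific = cumulative[i] - cumulative[i - 1]
    The origin's own increment is measured against zero delay.
    """
    rows = []
    previous = 0.0
    for delay in cumulative:
        rows.append({
            "cumulative": delay,
            "origin_offset": delay - cumulative[0],
            "station_specific": delay - previous,
        })
        previous = delay
    return rows


# A train that departs 3 minutes late, loses 5 more minutes at the third
# stop, and recovers 1 minute before the fourth.
journey = delay_targets([3, 4, 9, 8])
increments = [row["station_specific"] for row in journey]
```

In this toy journey the train is 8 to 9 minutes late at the last two stops, yet only the third stop added meaningful delay; the rest was inherited.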
Findings — What the data reveals
The descriptive analysis alone justifies the dataset.
Seasonal reality, quantified
- Winter months regularly exceed 25% delay incidence.
- Summer performs better overall, but June spikes—likely due to traffic volume rather than weather.
Weekly structure
- Fridays are consistently the worst-performing day.
- Saturdays are the most reliable—an operational pattern, not a meteorological one.
Delay severity distribution
Medium delays (10–15 minutes) dominate. Extreme delays exist, but they are not the norm—important when designing loss functions and evaluation metrics.
Baseline ML result (and why it matters)
A straightforward XGBoost regression, evaluated on each of the three delay targets, achieves:
| Target | MAE (minutes) |
|---|---|
| Station-specific delay | 2.73 |
| Cumulative delay | 4.21 |
| Origin-offset delay | 4.81 |
The lesson is not that XGBoost is impressive. It is that the right target formulation makes prediction easier.
Implications — What this enables next
This dataset quietly unlocks several directions that were previously speculative:
- Weather impact attribution: With station-level delay increments, weather variables can finally be analyzed causally rather than correlatively.
- Graph and sequence models: Delay propagation is inherently sequential and networked. Transformers and GNNs now have the data structure they require.
- Infrastructure vulnerability mapping: Repeated weather-sensitive delay hotspots point directly to physical bottlenecks, not just operational ones.
- Operational realism in AI systems: Models that ignore weather tend to look accurate until winter arrives. This dataset removes that illusion.
Conclusion — Less mystery, more mechanics
This work does not promise perfect delay prediction. It does something better: it removes a long-standing excuse.
By integrating operational railway data with high-resolution meteorological observations—carefully, transparently, and at national scale—the authors provide a foundation that future models can actually stand on.
In an industry where “weather disruption” is often treated as an act of God, this dataset reframes it as a measurable, modelable, and ultimately manageable variable.
Cognaptus: Automate the Present, Incubate the Future.