Opening — Why this matters now
Foundation models are no longer confined to text. They’ve begun crawling out of the linguistic sandbox and into the physical world—literally. As cities digitize and mobility data proliferates, a new question surfaces: Can we build a GPT-style foundation model that actually understands movement?
A recent tutorial paper from SIGSPATIAL ’25 attempts exactly that, showing how to assemble a trajectory-focused foundation model from scratch using a simplified GPT-2 backbone. It’s refreshingly honest, decidedly hands-on, and quietly important: trajectory models are the next frontier for location‑aware services, logistics, smart cities, and any business that relies on forecasting movement.
If LLMs taught machines how to write, trajectory models may teach them how to navigate.
Background — Context and prior art
Before this line of research, mobility modeling largely relied on narrower techniques: similarity metrics, handcrafted pattern matching, or bespoke trajectory prediction architectures. Useful, yes; scalable, no.
Then came the GPT wave. Researchers saw its generality and asked, “What if we steal that architecture?” TrajFM and TrajGPT represent early but promising attempts to port transformer thinking into mobility modeling, while TimesFM does the same for general time series.
But until now, nobody had bothered to document how one actually adapts a language model to something that isn’t language.
That is the gap this tutorial fills.
Analysis — What the paper actually does
At the heart of the paper is an educational walkthrough: modify GPT‑2 so it can digest and predict sequences of (latitude, longitude, time). This requires more than duct tape.
1. Tokenization is dead
You can’t tokenize GPS points the way you tokenize English. Coordinates are continuous and noisy, and they don’t come from a finite vocabulary. So the authors drop tokenization entirely and replace it with:
- Normalized coordinates
- Time decomposed into features (hour, minute, weekday, etc.)
- A projection layer that maps these values into embedding space
Think of it as teaching GPT‑2 to read numbers instead of words.
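As a rough illustration of the three steps above, here is a minimal pure-Python sketch. The bounding box, the choice of time features, the embedding width, and all function names are assumptions made for this example, not details taken from the tutorial. The random matrix only stands in for what would be a learned linear layer.

```python
import random
from datetime import datetime, timezone

# Hypothetical bounding box for the study area: (min_lat, max_lat, min_lon, max_lon).
BBOX = (39.8, 40.1, 116.2, 116.6)
EMBED_DIM = 8  # illustrative width; real models use far larger embeddings


def point_features(lat, lon, unix_ts, bbox=BBOX):
    """Turn one GPS fix into a small vector of values scaled to [0, 1]."""
    min_lat, max_lat, min_lon, max_lon = bbox
    t = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return [
        (lat - min_lat) / (max_lat - min_lat),  # normalized latitude
        (lon - min_lon) / (max_lon - min_lon),  # normalized longitude
        t.hour / 23.0,                          # hour of day
        t.minute / 59.0,                        # minute of hour
        t.weekday() / 6.0,                      # Monday = 0, Sunday = 6
    ]


def make_projection(in_dim, out_dim, seed=0):
    """Random weights and zero bias, standing in for a learned linear layer."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def project(features, weights, bias):
    """Linear map: embedding = W @ features + b."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


feats = point_features(39.95, 116.4, 1_700_000_000)
W, b = make_projection(len(feats), EMBED_DIM)
embedding = project(feats, W, b)  # one vector per trajectory point
```

Each trajectory point becomes one embedding vector, playing the role that a token embedding plays in a language model.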
2. LLM positional encoding isn’t enough
Trajectory data has its own rhythm. The tutorial replaces GPT‑2’s learned positional embeddings with the fixed sinusoidal encoding from the original Transformer, and later compares this to rotary position embeddings (RoPE), which state‑of‑the‑art models like TrajFM use for long‑range spatial correlation.
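For reference, the sinusoidal scheme from the original Transformer can be sketched in a few lines. This follows the standard formula (sine on even dimensions, cosine on odd ones, with a base of 10000). The function name and the tiny sizes are assumptions for illustration, and the tutorial's own code may organize this differently.

```python
import math


def sinusoidal_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of fixed sinusoidal position encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):
            # i is the even dimension index, so the exponent is i / d_model.
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))  # even dimension
            row.append(math.cos(angle))  # odd dimension
        table.append(row)
    return table


pe = sinusoidal_encoding(seq_len=4, d_model=6)
```

The table is added element-wise to the point embeddings before the first transformer block. Because it is computed rather than learned, it adds no parameters.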
3. Simplified architecture for mobility scale
Most mobility datasets are tiny compared to text corpora. So the tutorial trims GPT‑2 down to two transformer blocks and uses streaming dataloaders to survive memory constraints.
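The streaming idea can be illustrated with a plain Python generator that parses and batches points lazily, so that only one batch is in memory at a time. The `lat,lon,unix_ts` line layout and the function names are hypothetical. The tutorial presumably uses a framework-level dataloader, which this sketch does not attempt to reproduce.

```python
import io
from itertools import islice


def parse_line(line):
    """Parse 'lat,lon,unix_ts' into a tuple of floats (hypothetical file layout)."""
    lat, lon, ts = line.strip().split(",")
    return float(lat), float(lon), float(ts)


def stream_batches(lines, batch_size):
    """Yield lists of parsed points lazily instead of loading the whole file."""
    points = (parse_line(line) for line in lines if line.strip())
    while True:
        batch = list(islice(points, batch_size))
        if not batch:
            return
        yield batch


# A small in-memory stand-in for a large trajectory file on disk.
fake_file = io.StringIO("39.90,116.40,0\n39.91,116.41,60\n39.92,116.42,120\n")
batches = list(stream_batches(fake_file, batch_size=2))
```

Passing an open file handle instead of the `StringIO` object works the same way, since both are iterated line by line.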
4. Delta encoding for stable prediction
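Instead of predicting absolute positions, the model predicts the change from one point to the next. Deltas are small and centered near zero, which gives the model an easier and more stable regression target than raw coordinates. Absolute positions are recovered by accumulating the predicted deltas from the last known point, which fits naturally with autoregressive, step-by-step prediction.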
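A small sketch makes the round trip concrete: convert a path to deltas for training, then accumulate deltas to get positions back. The function names and the sample coordinates (in normalized units) are assumptions for illustration only.

```python
def to_deltas(points):
    """Convert absolute (lat, lon) points into step-to-step differences."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(points, points[1:])]


def from_deltas(start, deltas):
    """Rebuild absolute positions by accumulating deltas from a known start."""
    path = [start]
    for dlat, dlon in deltas:
        lat, lon = path[-1]
        path.append((lat + dlat, lon + dlon))
    return path


path = [(0.25, 0.5), (0.5, 0.75), (0.75, 0.5)]
deltas = to_deltas(path)                 # training targets
rebuilt = from_deltas(path[0], deltas)   # decoding back to positions
```

One trade-off to keep in mind: because decoding is a running sum, small errors in predicted deltas accumulate over long rollouts.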
5. Masking beyond next-step prediction
To support imputation (filling in missing trajectory segments), the authors introduce masked modeling: part of the sequence is hidden and the model is trained to reconstruct it. It’s the mobility equivalent of BERT’s “mask 15% of tokens” strategy.
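Here is one way the masking step could look, as a sketch rather than the authors' implementation. It hides individual points at random; masking contiguous spans would be closer to imputing whole missing segments. The 15% ratio simply follows the BERT analogy, and the `MASK` placeholder and function name are hypothetical.

```python
import random

MASK = None  # placeholder; a real model would substitute a learned mask embedding


def mask_points(points, mask_ratio=0.15, seed=0):
    """Hide a random subset of points and return (corrupted sequence, targets)."""
    rng = random.Random(seed)
    n_mask = min(len(points), max(1, round(len(points) * mask_ratio)))
    masked_idx = set(rng.sample(range(len(points)), n_mask))
    corrupted = [MASK if i in masked_idx else p for i, p in enumerate(points)]
    targets = {i: points[i] for i in masked_idx}  # what the model must reconstruct
    return corrupted, targets


trajectory = [(0.01 * i, 0.02 * i) for i in range(20)]
corrupted, targets = mask_points(trajectory)
```

During training, the loss would be computed only at the positions listed in `targets`, while the unmasked points serve as context.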
The result: a fully functional, pedagogical trajectory foundation model that mirrors the architectural spirit of more sophisticated models like TrajFM and TrajGPT.
Findings — The evolving design space
To position their prototype within the literature, the authors compare three major model families: TrajFM, TrajGPT, and TimesFM.
Here’s a compact comparison:
| Model | Input Representation | Positional Encoding | Output Type | Distinctive Feature |
|---|---|---|---|---|
| Tutorial Model | Continuous coords + time features | Sinusoidal | Δ-coordinate predictions | Minimal, educational GPT‑2 adaptation |
| TrajFM | Coordinates + POI semantics + temporal Fourier features | RoPE | Continuous coords | Advanced masking + POI embeddings |
| TrajGPT | Space2Vec + Time2Vec + region embeddings | Standard Transformer | Region + time + GMM duration | Multi‑head prediction with categorical outputs |
| TimesFM | Time‑series patches | Sinusoidal | Future sequence patches | Patching for extreme sequence compression |
Each model represents a different bet on how to “translate” movement into something a transformer can reason about.
Implications — Why businesses should care
Trajectory foundation models are not academic curiosities. They are the skeleton key for:
- Urban mobility analytics — Predicting flows, congestion, and demand.
- Logistics optimization — Smarter routing, faster delivery windows.
- Autonomous systems — Drones, AVs, and robots that understand real-world movement patterns.
- Telecom & retail — Foot‑traffic forecasting for store placement and network planning.
- Risk & insurance — Modeling driver behavior at scale.
The core value proposition: when you build a model that generalizes across regions, tasks, and datasets, you shrink cost while expanding applicability. That’s exactly what trajectory foundation models aim for.
And as with earlier foundation-model waves, early movers stand to build the strongest moat.
Conclusion — From waypoints to world models
This tutorial may be modest in length, but its implications are not. GPT‑style models are drifting into the physical world, and mobility is one of the most promising domains for real commercial impact. The authors’ step-by-step reconstruction of a trajectory-focused foundation model demystifies an area that has long been opaque.
If “LLM for movement” sounds esoteric today, give it a few years. Movement is data—and data becomes models.
Cognaptus: Automate the Present, Incubate the Future.