Opening — Why this matters now

Foundation models are no longer confined to text. They’ve begun crawling out of the linguistic sandbox and into the physical world—literally. As cities digitize and mobility data proliferates, a new question surfaces: Can we build a GPT-style foundation model that actually understands movement?

A recent tutorial paper from SIGSPATIAL ’25 attempts exactly that, showing how to assemble a trajectory-focused foundation model from scratch using a simplified GPT-2 backbone. It’s refreshingly honest, decidedly hands-on, and quietly important: trajectory models are the next frontier for location‑aware services, logistics, smart cities, and any business that relies on forecasting movement.

If LLMs taught machines how to write, trajectory models may teach them how to navigate.

Background — Context and prior art

Before this line of research, mobility modeling largely relied on narrower techniques: similarity metrics, handcrafted pattern matching, or bespoke trajectory prediction architectures. Useful, yes; scalable, no.

Then came the GPT wave. Researchers saw its generality and asked, “What if we steal that architecture?” TrajFM and TrajGPT represent early but promising attempts to port transformer thinking into mobility modeling, while TimesFM does the same for general time series.

But until now, few resources documented, step by step, how one actually adapts a language model to something that isn’t language.

That is the gap this tutorial fills.

Analysis — What the paper actually does

At the heart of the paper is an educational walkthrough: modify GPT‑2 so it can digest and predict sequences of (latitude, longitude, time). This requires more than duct tape.

1. Tokenization is dead

You can’t tokenize GPS points the way you tokenize English. Coordinates are continuous and noisy, and they don’t come from a finite vocabulary. So the authors drop tokenization entirely and replace it with:

  • Normalized coordinates
  • Time decomposed into features (hour, minute, weekday, etc.)
  • A projection layer that maps these values into embedding space

Think of it as teaching GPT‑2 to read numbers instead of words.
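
The three ingredients above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors’ code: the bounding box, the choice of time features, the `point_features` helper, and the `LinearProjection` class are all assumptions made here, and a real model would use a trainable layer (for example a linear layer in a deep learning framework) rather than fixed random weights.

```python
import math
import random
from datetime import datetime, timezone

# Hypothetical bounding box used for min-max normalization (an assumption).
LAT_MIN, LAT_MAX = 39.0, 41.0
LON_MIN, LON_MAX = 115.0, 118.0


def point_features(lat, lon, unix_ts):
    """Turn one GPS fix into a small vector of continuous features in [0, 1]."""
    t = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return [
        (lat - LAT_MIN) / (LAT_MAX - LAT_MIN),  # normalized latitude
        (lon - LON_MIN) / (LON_MAX - LON_MIN),  # normalized longitude
        t.hour / 23.0,                          # hour of day
        t.minute / 59.0,                        # minute of hour
        t.weekday() / 6.0,                      # Monday=0 ... Sunday=6
    ]


class LinearProjection:
    """Dense layer y = Wx + b, standing in for a learned embedding projection."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x):
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


# One GPS fix becomes a 5-dimensional feature vector, then a 16-dimensional embedding.
features = point_features(39.9042, 116.4074, 1_700_000_000)
embedding = LinearProjection(in_dim=5, out_dim=16)(features)
```

The point of the sketch is the shape of the pipeline: every trajectory point becomes a fixed-width vector that can sit where a token embedding would normally go.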

2. LLM positional encoding isn’t enough

Trajectory data has its own rhythm. The tutorial replaces GPT‑2’s learned positional embeddings with the fixed sinusoidal encoding from the original Transformer, and later compares this to rotary position embeddings (RoPE), which state‑of‑the‑art models like TrajFM use to capture long‑range spatial correlation.
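
For reference, the sinusoidal scheme from the original Transformer can be written out directly. The sketch below follows the standard formula (sine on even dimensions, cosine on odd ones); the function name `sinusoidal_encoding` and its defaults are chosen here for illustration and are not taken from the tutorial.

```python
import math


def sinusoidal_encoding(seq_len, d_model, base=10000.0):
    """Build a seq_len x d_model table of fixed positional encodings.

    PE[pos][2k]   = sin(pos / base^(2k / d_model))
    PE[pos][2k+1] = cos(pos / base^(2k / d_model))
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(seq_len):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):  # i is the even index 2k
            angle = pos / (base ** (i / d_model))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table


# Encodings for a 4-step trajectory with 8-dimensional embeddings.
pe = sinusoidal_encoding(seq_len=4, d_model=8)
```

Each row is added to the embedding of the trajectory point at that step. Because the table is computed rather than learned, it works for any sequence length without retraining.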

3. Simplified architecture for mobility scale

Most mobility datasets are tiny compared to text corpora. So the tutorial trims GPT‑2 down to two transformer blocks and uses streaming dataloaders to survive memory constraints.
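
A streaming loader is simple to picture: read trajectories lazily and hand them out in small batches, so the full dataset never has to sit in memory. The sketch below is one way to do that with the standard library. The CSV layout (`traj_id,lat,lon,unix_ts`, with rows grouped by trajectory) and the helper names are assumptions for illustration; the tutorial’s actual loader may look different.

```python
import csv
import io
from itertools import groupby, islice


def stream_trajectories(lines):
    """Yield (traj_id, points) one trajectory at a time from CSV lines.

    Assumes rows for the same trajectory are contiguous in the input.
    """
    reader = csv.reader(lines)
    for traj_id, rows in groupby(reader, key=lambda r: r[0]):
        points = [(float(lat), float(lon), int(ts)) for _, lat, lon, ts in rows]
        yield traj_id, points


def batched(iterable, batch_size):
    """Group any iterator into lists of at most batch_size items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# In practice `lines` would be an open file handle; a small in-memory sample is used here.
sample = io.StringIO(
    "a,39.90,116.40,0\n"
    "a,39.91,116.41,60\n"
    "b,39.80,116.30,0\n"
    "b,39.81,116.31,60\n"
    "c,39.70,116.20,0\n"
)
batches = list(batched(stream_trajectories(sample), batch_size=2))
```

Only one batch is materialized at a time inside the generators; the `list(...)` call at the end is there purely to inspect the result.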

4. Delta encoding for stable prediction

Instead of predicting absolute positions, which the tutorial treats as unstable, the model predicts step‑to‑step changes (deltas). This improves stability and fits autoregressive decoding: each predicted delta is added to the last known position to produce the next one.
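
Delta encoding itself is a two-way transform: differences going in, a running sum coming out. A minimal sketch, with helper names (`to_deltas`, `from_deltas`) invented here for illustration:

```python
def to_deltas(points):
    """Convert absolute (lat, lon) points into step-to-step differences."""
    return [
        (lat2 - lat1, lon2 - lon1)
        for (lat1, lon1), (lat2, lon2) in zip(points, points[1:])
    ]


def from_deltas(start, deltas):
    """Rebuild absolute positions by accumulating deltas from a start point."""
    lat, lon = start
    path = [start]
    for dlat, dlon in deltas:
        lat, lon = lat + dlat, lon + dlon
        path.append((lat, lon))
    return path


track = [(0.0, 0.0), (0.5, 0.25), (1.0, 1.0)]
deltas = to_deltas(track)
rebuilt = from_deltas(track[0], deltas)
```

At inference time the same accumulation applies to predicted deltas, which also means prediction errors add up along the rollout. That trade-off is worth keeping in mind for long horizons.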

5. Masking beyond next-step prediction

To support imputation—filling in missing trajectory segments—the authors introduce masked modeling. It’s the mobility equivalent of BERT’s “mask 15% of tokens” strategy.
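
As a rough picture of what masking looks like on trajectory data, the sketch below hides a random subset of points and keeps the originals as training targets. The 15% ratio is borrowed from the BERT analogy, and the placeholder value and function name are assumptions; the tutorial’s exact masking scheme (for example, whether it hides contiguous segments) is not specified here.

```python
import random

# Hypothetical placeholder written over hidden points (an assumption).
MASK_VALUE = (0.0, 0.0)


def mask_points(points, mask_ratio=0.15, seed=None):
    """Hide a random subset of points; return (masked inputs, targets by index)."""
    if not points:
        return [], {}
    rng = random.Random(seed)
    n_mask = max(1, round(len(points) * mask_ratio))  # hide at least one point
    masked_idx = sorted(rng.sample(range(len(points)), n_mask))
    hidden = set(masked_idx)
    inputs = [MASK_VALUE if i in hidden else p for i, p in enumerate(points)]
    targets = {i: points[i] for i in masked_idx}
    return inputs, targets


track = [(float(i + 1), float(i + 1)) for i in range(20)]
inputs, targets = mask_points(track, mask_ratio=0.15, seed=42)
```

During training, the loss would be computed only at the indices in `targets`, so the model learns to reconstruct the hidden points from the visible context on both sides.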

The result: a fully functional, pedagogical trajectory foundation model that mirrors the architectural spirit of more sophisticated models like TrajFM and TrajGPT.

Findings — The evolving design space

To position their prototype within the literature, the authors compare three major model families: TrajFM, TrajGPT, and TimesFM.

Here’s a compact comparison:

| Model | Input Representation | Positional Encoding | Output Type | Distinctive Feature |
|---|---|---|---|---|
| Tutorial Model | Continuous coords + time features | Sinusoidal | Δ-coordinate predictions | Minimal, educational GPT‑2 adaptation |
| TrajFM | Coordinates + POI semantics + temporal Fourier features | RoPE | Continuous coords | Advanced masking + POI embeddings |
| TrajGPT | Space2Vec + Time2Vec + region embeddings | Standard Transformer | Region + time + GMM duration | Multi‑head prediction with categorical outputs |
| TimesFM | Time‑series patches | Sinusoidal | Future sequence patches | Patching for extreme sequence compression |

Each model represents a different bet on how to “translate” movement into something a transformer can reason about.

Implications — Why businesses should care

Trajectory foundation models are not academic curiosities. They are the skeleton key for:

  • Urban mobility analytics — Predicting flows, congestion, and demand.
  • Logistics optimization — Smarter routing, faster delivery windows.
  • Autonomous systems — Drones, AVs, and robots that understand real-world movement patterns.
  • Telecom & retail — Foot‑traffic forecasting for store placement and network planning.
  • Risk & insurance — Modeling driver behavior at scale.

The core value proposition: when you build a model that generalizes across regions, tasks, and datasets, you shrink cost while expanding applicability. That’s exactly what trajectory foundation models aim for.

And as with every foundation-model wave, early movers gain the strongest moat.

Conclusion — From waypoints to world models

This tutorial may be modest in length, but its implications are not. GPT‑style models are drifting into the physical world, and mobility is one of the most promising domains for real commercial impact. The authors’ step-by-step reconstruction of a trajectory-focused foundation model demystifies an area that has long been opaque.

If “LLM for movement” sounds esoteric today, give it a few years. Movement is data—and data becomes models.

Cognaptus: Automate the Present, Incubate the Future.