
OpenSeeker: Breaking the Search Monopoly (One Dataset at a Time)

Search is now where many AI demos go to become either useful products or expensive browser cosplay. A model that answers from memory can look impressive for five minutes. A model that can search, compare, verify, follow clues, abandon bad paths, and synthesize a final answer is much harder to fake. That is why “deep research” has become one of the more important capability battles in AI. It is also why the battle has been awkwardly closed. Many labs release weights, leaderboards, and cinematic launch posts. Far fewer release the thing that actually teaches the agent how to search: the training data. ...

March 17, 2026 · 18 min · Zelina

Who Owns Your Words? Copyright, LLMs, and the Quiet Arms Race Over Training Data

The new copyright question is not “did the model copy me?” but “how would I know?” A writer uploads a chapter. A publisher uploads a manuscript. A compliance team uploads a protected document. The question is simple enough to ask in one sentence: did this material end up inside a large language model’s training data? ...

November 26, 2025 · 17 min · Zelina

What LLMs Remember—and Why: Unpacking the Entropy-Memorization Law

TL;DR for operators: Memorization audits usually start with the wrong question: “Which individual text snippets look memorized?” This paper suggests a better first diagnostic: group many snippets by how closely the model reproduces them, then measure the entropy of the token distribution inside each group. The result is an empirical pattern the authors call Entropy–Memorization Linearity. In plain English: when training examples are pooled by edit-distance score, their set-level entropy forms a strong linear relationship with how closely the model reproduces them. Since the paper’s “memorization score” is an edit distance, lower score means stronger verbatim reproduction; higher score means the generated continuation is farther from the ground truth. ...

July 13, 2025 · 15 min · Zelina