
Confidence Is Not Truth, But It Can Steer: When LLMs Learn When to Stop

Opening — Why this matters now: Large Language Models are no longer compute-bound at training time. They are inference-bound at deployment time. The last year has made this painfully clear. Frontier reasoning models increasingly win benchmarks not by being smarter, but by thinking more: longer chains-of-thought, more samples, more retries, more votes. The result is an arms race in test-time scaling—512 samples here, best-of-20 there—where accuracy inches upward while token bills explode. ...
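The stopping idea in the title can be sketched in a few lines. The sketch below is my illustration, not the paper's algorithm: keep drawing samples until self-reported confidence clears a threshold or the budget runs out, with `generate_with_confidence` as a hypothetical stand-in for whatever scorer the model exposes.

```python
import random

def generate_with_confidence(prompt: str) -> tuple[str, float]:
    # Hypothetical stand-in: one model sample plus a confidence in [0, 1].
    # A real system would call the LLM and read off its self-reported score.
    return f"answer-to({prompt})", random.random()

def confidence_gated_sampling(prompt: str, threshold: float = 0.9, budget: int = 20):
    # Draw up to `budget` samples, stopping early once confidence clears the
    # threshold. A sketch of the general idea, not the paper's stopping rule.
    best_answer, best_conf = None, -1.0
    for used in range(1, budget + 1):
        answer, conf = generate_with_confidence(prompt)
        if conf > best_conf:
            best_answer, best_conf = answer, conf
        if best_conf >= threshold:
            break  # confident enough: stop spending tokens
    return best_answer, best_conf, used

answer, conf, used = confidence_gated_sampling("2 + 2 = ?")
print(f"stopped after {used} samples at confidence {conf:.2f}")
```

The appeal is that easy prompts exit after a sample or two, while only the hard ones consume the full budget, which is exactly where fixed best-of-N spends most of its tokens today.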

February 10, 2026 · 4 min · Zelina

CompactRAG: When Multi-Hop Reasoning Stops Burning Tokens

Opening — Why this matters now: Multi-hop reasoning has quietly become one of the most expensive habits in modern AI systems. Every additional hop—every “and then what?”—typically triggers another retrieval, another prompt expansion, another LLM call. Accuracy improves, yes, but so does the bill. CompactRAG enters this conversation with a refreshingly unfashionable claim: most of this cost is structural, not inevitable. If you stop forcing LLMs to repeatedly reread the same knowledge, multi-hop reasoning does not have to scale linearly in tokens—or in money. ...
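The structural claim is easy to make concrete. Below is my sketch, not CompactRAG's actual pipeline: `retrieve` and `llm` are hypothetical callables, and the point is the `seen` cache, which ensures each passage enters the prompt at most once across all hops.

```python
def multi_hop_answer(question, retrieve, llm, max_hops=3):
    # `retrieve(query) -> [(passage_id, text), ...]` and `llm(prompt) -> str`
    # are hypothetical callables; the sketch is about the cache, not them.
    seen = {}  # passage_id -> text, fetched and prompted at most once
    query = question
    for _ in range(max_hops):
        for pid, text in retrieve(query):
            seen.setdefault(pid, text)       # never reread a known passage
        context = "\n".join(seen.values())
        reply = llm(f"Context:\n{context}\n\nQuestion: {question}\n"
                    "Reply with a follow-up search query, or 'FINAL: <answer>'.")
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        query = reply                        # take the next hop
    # Budget exhausted: answer from whatever context was gathered.
    return llm(f"Context:\n{context}\n\nAnswer directly: {question}")
```

Because later hops only add passages they have not already seen, prompt size grows with new knowledge rather than with hop count.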

February 8, 2026 · 3 min · Zelina

Freeze Now, Learn Faster: When Parameter Freezing Meets Pipeline Reality

Opening — Why this matters now: Training large language models has quietly shifted from an optimization problem into a scheduling problem. As model sizes balloon and GPU clusters grow deeper rather than wider, pipeline parallelism has become unavoidable. Yet most efficiency tricks—parameter freezing included—still behave as if time does not exist. This paper introduces TimelyFreeze, a system-level rethink of parameter freezing that aligns what we freeze with when computation actually happens. Instead of blindly freezing layers based on gradient statistics or heuristics, TimelyFreeze asks a more practical question: which parameters are on the critical path right now? ...
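Mechanically, "freezing" usually means switching off gradients, so a toy PyTorch sketch looks like the following. The scheduler that decides which stages sit off the critical path is the paper's actual contribution and is just an input here; the per-stage freezing below is my assumption about the mechanism, not TimelyFreeze's spec.

```python
import torch.nn as nn

def apply_freeze_plan(model: nn.Sequential, frozen_stages: set[int]) -> None:
    # Freeze whole pipeline stages by disabling their gradients. Frozen
    # stages skip backward and optimizer work, shortening the per-step
    # critical path in a pipeline-parallel run.
    for stage_idx, stage in enumerate(model):
        for param in stage.parameters():
            param.requires_grad = stage_idx not in frozen_stages

# Toy 4-stage "pipeline": freeze the two earliest stages for now.
model = nn.Sequential(*[nn.Linear(16, 16) for _ in range(4)])
apply_freeze_plan(model, frozen_stages={0, 1})
print([any(p.requires_grad for p in stage.parameters()) for stage in model])
# [False, False, True, True]
```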

February 8, 2026 · 3 min · Zelina

Ultra‑Sparse Embeddings Without Apology

Opening — Why this matters now: Embeddings have quietly become the metabolic system of modern AI. Every retrieval query, recommendation list, and ranking pipeline depends on them—yet we keep feeding these systems increasingly obese vectors. Thousands of dimensions, dense everywhere, expensive always. The paper behind CSRv2 arrives with an unfashionable claim: you can make embeddings extremely sparse and still win. ...
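For a back-of-envelope feel of what "extremely sparse" buys, here is plain top-k truncation of a dense vector. This is illustration only; a learned scheme like CSRv2 would presumably train for sparsity rather than truncate after the fact.

```python
import numpy as np

def topk_sparsify(v: np.ndarray, k: int) -> np.ndarray:
    # Keep only the k largest-magnitude coordinates; zero out the rest.
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
dense = rng.standard_normal(4096)
sparse = topk_sparsify(dense, k=32)
# Store 32 (index, value) pairs instead of 4096 floats: a far smaller
# footprint, and dot products touch only the nonzero coordinates.
print(np.count_nonzero(sparse), "nonzeros out of", dense.size)
```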

February 8, 2026 · 3 min · Zelina

Thinking Isn’t Free: Why Chain-of-Thought Hits a Hard Wall

Opening — Why this matters now: Inference-time reasoning has quietly become the dominant performance lever for frontier language models. When benchmarks get hard, we don’t retrain—we let models think longer. More tokens, more scratchpad, more compute. The industry narrative is simple: reasoning scales, so accuracy scales. This paper asks an uncomfortable question: how long must a model think, at minimum, as problems grow? And the answer, grounded in theory rather than vibes, is not encouraging. ...

February 5, 2026 · 3 min · Zelina

Thinking in Panels: Why Comics Might Beat Video for Multimodal Reasoning

Opening — Why this matters now: Multimodal reasoning has quietly hit an efficiency wall. We taught models to think step by step with text, then asked them to imagine with images, and finally to reason with videos. Each step added expressive power—and cost. Images freeze time. Videos drown signal in redundancy. Somewhere between the two, reasoning gets expensive fast. ...

February 3, 2026 · 3 min · Zelina

FadeMem: When AI Learns to Forget on Purpose

Opening — Why this matters now: The race to build smarter AI agents has mostly followed one instinct: remember more. Bigger context windows. Larger vector stores. Ever-growing retrieval pipelines. Yet as agents move from demos to long-running systems—handling days or weeks of interaction—this instinct is starting to crack. More memory does not automatically mean better reasoning. In practice, it often means clutter, contradictions, and degraded performance. Humans solved this problem long ago, not by remembering everything, but by forgetting strategically. ...
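The excerpt does not spell out FadeMem's mechanics, so the sketch below shows strategic forgetting in its most generic form: scores decay exponentially with age, reads refresh them, and anything below a floor is evicted. The class name, half-life rule, and refresh bonus are all my assumptions.

```python
import math
import time

class DecayingMemory:
    # Toy memory store with exponential forgetting (illustrative only;
    # FadeMem's actual decay and eviction policy may differ).
    def __init__(self, half_life_s: float = 3600.0, floor: float = 0.05):
        self.decay = math.log(2) / half_life_s
        self.floor = floor
        self.items = {}  # key -> (text, score, last_seen)

    def write(self, key: str, text: str) -> None:
        self.items[key] = (text, 1.0, time.time())

    def read(self, key: str):
        if key not in self.items:
            return None
        text, score, last_seen = self.items[key]
        s = score * math.exp(-self.decay * (time.time() - last_seen))
        if s < self.floor:
            del self.items[key]  # forgotten on purpose
            return None
        # Access refreshes the trace, so useful memories survive longer.
        self.items[key] = (text, min(1.0, s + 0.5), time.time())
        return text
```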

February 1, 2026 · 4 min · Zelina

Gated Sparse Attention: Speed Without the Sink

Opening — Why this matters now: Long-context language models have crossed an uncomfortable threshold. Context windows now stretch to 128K tokens and beyond, yet the core attention mechanism still scales quadratically. The result is a growing mismatch between what models can theoretically ingest and what is economically and operationally feasible. At the same time, training instability — loss spikes, attention sinks, brittle gradients — continues to haunt large-scale runs. ...
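The title names both ingredients, and the gating half is simple enough to sketch in numpy. This is a reconstruction of the general idea rather than the paper's layer: a sigmoid output gate lets a head scale its contribution toward zero instead of parking attention mass on a sink token. The sparsity half, restricting which keys each query attends to, is omitted here.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention(q, k, v, w_gate):
    # Single-head attention with a sigmoid output gate. Shapes: q, k, v are
    # (seq, d); w_gate is (d, 1). A near-zero gate gives the head an "off"
    # switch, removing the need to dump attention mass on a sink token.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)               # (seq, seq): the quadratic part
    out = softmax(scores, axis=-1) @ v          # standard attention output
    gate = 1.0 / (1.0 + np.exp(-(q @ w_gate)))  # (seq, 1), values in (0, 1)
    return gate * out

rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((8, 16))
y = gated_attention(q, k, v, w_gate=rng.standard_normal((16, 1)))
print(y.shape)  # (8, 16)
```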

January 24, 2026 · 4 min · Zelina

Skeletons in the Proof Closet: When Lean Provers Need Hints, Not More Compute

Opening — Why this matters now: Neural theorem proving has entered its industrial phase. With reinforcement learning pipelines, synthetic data factories, and search budgets that would make a chess engine blush, models like DeepSeek‑Prover‑V1.5 are widely assumed to have internalized everything there is to know about formal proof structure. This paper politely disagrees. Under tight inference budgets—no massive tree search, no thousand-sample hail‑Mary—the author shows that simple, almost embarrassingly old‑fashioned structural hints still deliver large gains. Not new models. Not more data. Just better scaffolding. ...
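"Structural hints" here are, roughly, proof skeletons: the tactic outline is supplied up front, and the prover fills only the leaf goals. A toy Lean 4 illustration of the shape of such a hint (my example, not one from the paper):

```lean
-- Toy proof skeleton (illustrative, not from the paper): the outline fixes
-- the proof's structure, and a prover only fills the `sorry` leaves.
example (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
  constructor
  · sorry   -- leaf goal `p`; e.g. `exact hp`
  · sorry   -- leaf goal `q`; e.g. `exact hq`
```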

January 23, 2026 · 4 min · Zelina

When Benchmarks Break: Why Bigger Models Keep Winning (and What That Costs You)

Opening — Why this matters now: Every few months, a new paper reassures us that bigger is better. Higher scores, broader capabilities, smoother demos. Yet operators quietly notice something else: rising inference bills, brittle behavior off-benchmark, and evaluation metrics that feel increasingly ceremonial. This paper arrives right on schedule—technically rigorous, empirically dense, and unintentionally revealing about where the industry’s incentives now point. ...

January 21, 2026 · 3 min · Zelina