
When Words Start Walking: Rethinking Semantic Search Beyond Averages

Opening — Why this matters now
Search systems have grown fluent, but not necessarily intelligent. As enterprises drown in text—contracts, filings, emails, reports—the gap between what users mean and what systems match has become painfully visible. Keyword search still dominates operational systems, while embedding-based similarity often settles for crude averages. This paper challenges that quiet compromise. ...

February 8, 2026 · 3 min · Zelina

Benchmarks Lie, Rooms Don’t: Why Embodied AI Fails the Moment It Enters Your House

Opening — Why this matters now
Embodied AI is having its deployment moment. Robots are promised for homes, agents for physical spaces, and multimodal models are marketed as finally “understanding” the real world. Yet most of these claims rest on benchmarks designed far away from kitchens, hallways, mirrors, and cluttered tables. This paper makes an uncomfortable point: if you evaluate agents inside the environments they will actually operate in, much of that apparent intelligence collapses. ...

February 7, 2026 · 4 min · Zelina

Beyond Cosine: When Order Beats Angle in Embedding Similarity

Opening — Why this matters now
Cosine similarity has enjoyed an unusually long reign. From TF‑IDF vectors to transformer embeddings, it remains the default lens through which we judge “semantic closeness.” Yet the more expressive our embedding models become, the more uncomfortable this default starts to feel. If modern representations are nonlinear, anisotropic, and structurally rich, why are we still evaluating them with a metric that only understands angles? ...

February 7, 2026 · 4 min · Zelina

First Proofs, No Training Wheels

Opening — Why this matters now
AI models are now fluent in contest math, symbolic manipulation, and polished explanations. That’s the easy part. The harder question—the one that actually matters for science—is whether these systems can do research when the answer is not already in the training set. The paper First Proof arrives as a deliberately uncomfortable experiment: ten genuine research-level mathematics questions, all solved by humans, none previously public, and all temporarily withheld from the internet. ...

February 7, 2026 · 3 min · Zelina

Hallucination-Resistant Security Planning: When LLMs Learn to Say No

Opening — Why this matters now
Security teams are being asked to do more with less, while the attack surface keeps expanding and adversaries automate faster than defenders. Large language models promise relief: summarize logs, suggest response actions, even draft incident playbooks. But there’s a catch that every practitioner already knows—LLMs are confident liars. In security operations, a hallucinated action isn’t just embarrassing; it’s operationally expensive. ...

February 7, 2026 · 4 min · Zelina

When AI Forgets on Purpose: Why Memorization Is the Real Bottleneck

Opening — Why this matters now
Large language models are getting bigger, slower, and—paradoxically—more forgetful in all the wrong places. Despite trillion‑token training runs, practitioners still complain about brittle reasoning, hallucinated facts, and sudden regressions after fine‑tuning. The paper behind this article argues that the problem is not insufficient memory, but poorly allocated memory. ...

February 7, 2026 · 3 min · Zelina

When One Heatmap Isn’t Enough: Layered XAI for Brain Tumour Detection

Opening — Why this matters now
Medical AI is no longer struggling with accuracy. In constrained tasks like MRI-based brain tumour detection, convolutional neural networks routinely cross the 90% mark. The real bottleneck has shifted elsewhere: trust. When an algorithm flags—or misses—a tumour, clinicians want to know why. And increasingly, a single colourful heatmap is not enough. ...

February 7, 2026 · 3 min · Zelina

When RAG Needs Provenance, Not Just Recall: Traceable Answers Across Fragmented Knowledge

Opening — Why this matters now
RAG is supposed to make large language models safer. Ground the model in documents, add citations, and hallucinations politely leave the room—or so the story goes. In practice, especially in expert domains, RAG often fails in a quieter, more dangerous way: it retrieves something relevant, but not the right kind of evidence. ...

February 7, 2026 · 4 min · Zelina

AgenticPay: When LLMs Start Haggling for a Living

Opening — Why this matters now
Agentic AI has moved beyond polite conversation. Increasingly, we expect language models to act: negotiate contracts, procure services, choose suppliers, and close deals on our behalf. This shift quietly transforms LLMs from passive tools into economic actors. Yet here’s the uncomfortable truth: most evaluations of LLM agents still resemble logic puzzles or toy auctions. They test reasoning, not commerce. Real markets are messy—private constraints, asymmetric incentives, multi-round bargaining, and strategic patience all matter. The paper behind AgenticPay steps directly into this gap. ...

February 6, 2026 · 4 min · Zelina

Quantum Routes, Real Gains: When Transformers Meet CVRP

Opening — Why this matters now
Routing problems are the unglamorous backbone of modern logistics. Every e‑commerce delivery, warehouse dispatch, and last‑mile optimization problem eventually collapses into some variant of the Capacitated Vehicle Routing Problem (CVRP). It is also, inconveniently, NP‑hard. Classical heuristics scale. Deep learning brings adaptability. Quantum computing promises expressivity. The uncomfortable question is whether these promises stack—or cancel each other out. ...

February 6, 2026 · 4 min · Zelina