Cognaptus Insights

Reviewer, Reviewed: When AI Starts Grading the Graders

A field deployment of AI-generated peer review at AAAI-26 shows where AI can outperform human reviewers, where it still fails, and what businesses should learn about governed second-opinion systems.

Rewarding Bad Physics Habits: What VLMs Learn When You Pay Them to Reason

A reward-ablation study on VLM physics reasoning shows why accuracy, reasoning discipline, and visual grounding must be treated as different deployment objectives, not one magical intelligence switch.

Trex Marks the Spot: When AI Starts Training AI

A mechanism-first reading of TREX, an agent system that treats LLM fine-tuning as an iterative research workflow rather than a glorified hyperparameter search.

When Maps Start Thinking: GeoAgentBench and the Audit of Spatial AI

GeoAgentBench shows why serious spatial AI must be tested by execution, parameter discipline, and final map verification—not by how convincingly an agent describes a workflow.

Benchmarking the Benchmarks: When AI Safety Metrics Stop Meaning Anything

A sharper reading of AISafetyBenchExplorer, showing why AI safety evaluation now suffers less from benchmark scarcity than from metric drift, stale infrastructure, and weak benchmark governance.

Evolve or Die Trying: When LLMs Stop Writing Code and Start Designing Algorithms

BEAM shows that useful LLM algorithm design is less about clever prompting and more about structured search, reusable memory, and evaluation that actually resembles solver construction.

From Words to Workflows: Why AI Still Struggles to Think Like an Operations Research Analyst

A close reading of Text2Model shows why LLMs can draft optimization models but still need validation layers before they can be trusted in business decision workflows.

Learning on Autopilot? Not Quite — How PAL Turns Passive Videos into Active Intelligence

A mechanism-first reading of PAL, an AI learning platform that turns lecture videos into adaptive questioning, learner-state tracking, and personalized post-lesson reinforcement.

Routing Without Running Out: How Bilevel Optimization Rewires EV Logistics

A mechanism-first reading of how bilevel optimization makes electric vehicle routing more scalable by using routing cost as a cheap but imperfect guide.

The Memory Isn’t Broken — It’s Flat: Why LLMs Need to ‘Draw’ to Remember

A mechanism-first reading of dual-trace memory encoding and why enterprise AI agents may need richer contextual traces, not just larger memory stores.