LLMs | Cognaptus

When AI Packs Too Much Hype: Reassessing LLM 'Discoveries' in Bin Packing

A warehouse manager, a cloud scheduler, and a container-ship planner all know the same unpleasant truth: fitting things into limited capacity is where tidy strategy goes to die. That is why bin packing remains such a useful test case. The problem is easy to explain and difficult to solve optimally. Items arrive. Bins have fixed capacity. The objective is to use as few bins as possible. In the online version, the system must decide where to place each item as it arrives, without seeing the future. This is not just a toy puzzle. It resembles production scheduling, memory allocation, server placement, freight consolidation, and every other operational setting where tomorrow’s workload has the bad manners not to disclose itself in advance. ...
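To make the online setting concrete, here is a minimal sketch of the classic First Fit heuristic: each item goes into the first open bin with room, and a new bin is opened only when none fits. The function name `first_fit_online`, the integer sizes, and the capacity of 10 are illustrative assumptions, not taken from the paper under discussion.

```python
from typing import List


def first_fit_online(items: List[int], capacity: int = 10) -> List[List[int]]:
    """Place each arriving item into the first bin that still has room.

    Illustrative sketch only: sizes and capacity are integers so that
    the arithmetic is exact. Decisions are made one item at a time,
    with no knowledge of later arrivals.
    """
    bins: List[List[int]] = []   # contents of each open bin
    remaining: List[int] = []    # free capacity of each open bin

    for size in items:
        if size <= 0 or size > capacity:
            raise ValueError(f"item of size {size} cannot be packed")
        for i, free in enumerate(remaining):
            if size <= free:
                bins[i].append(size)
                remaining[i] = free - size
                break
        else:
            # No existing bin can take the item, so open a new one.
            bins.append([size])
            remaining.append(capacity - size)
    return bins


if __name__ == "__main__":
    packed = first_fit_online([4, 8, 1, 4, 2, 1], capacity=10)
    print(len(packed), packed)
```

First Fit never revisits an earlier placement, which is exactly what makes the online version hard: a choice that looks fine now can waste space once later items are revealed.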

Two Minds in One Machine: How Agentic AI Splits—and Reunites—the Field

Agents have become the new office intern, software engineer, analyst, compliance assistant, and occasional disaster rehearsal all in one. Give one a goal, some tools, a memory store, and permission to act, and it begins to look less like a chatbot and more like a small operating unit. That is the sales pitch. The engineering reality is less tidy. ...

The Rise of FreePhD: How Multiagent Systems are Reimagining the Scientific Method

A broken file link is not usually where scientific revolutions begin. It is, however, where many automated workflows die. That is why the most revealing moment in the freephdlabor paper is not the grand claim about personalised AI research groups. It is the rather unromantic episode where the system tries to write a paper, discovers that the experiment data are missing because of a failed symlink, attempts workarounds, fails validation, reports the failure, gets routed back through resource preparation, rebuilds the workspace correctly, and only then proceeds to manuscript generation.¹ ...

Promptfolios: When Buffett Becomes a System Prompt

Investment firms love a house style. Conservative value. Quality growth. Distressed credit. Low-volatility income. The style is supposed to mean something more durable than a portfolio manager’s breakfast mood. The uncomfortable part is that many “styles” still live in a fog of analyst judgement, committee memory, spreadsheet folklore, and the occasional sacred quote from an investor whose annual letters have been read with the reverence normally reserved for scripture. Everyone claims discipline. Fewer can show exactly how that discipline becomes position weights. ...

Branching Out of the Box: Tree‑OPO Turns MCTS Traces into Better RL for Reasoning

A search tree is expensive to build. Once you have paid for it, using only the final answers is a little like buying an aircraft engine and admiring the packaging. That is the useful instinct behind Tree-OPO, a paper that asks whether Monte Carlo Tree Search traces from a stronger teacher model can be reused not merely as demonstrations, but as a structured curriculum for training a smaller reasoning policy.¹ The idea is not to run MCTS at inference time and call that progress. Nor is it to imitate a teacher’s logits until the student develops the personality of a photocopier. The paper’s more interesting move is subtler: take the partial reasoning states produced by search, let the student complete from those prefixes, and compute advantages in a way that respects where each prefix sits in the tree. ...

Hook, Line, and Import: How RAG Lets Attackers Snare Your Code

Imports look harmless until they become procurement. A developer asks an AI assistant for a plotting snippet. The assistant returns clean-looking Python, a few lines of explanation, and an import statement for matplotlib_safe. The name sounds prudent. Safer is good. Safer is what the security team keeps asking for, usually in meetings that could have been static analysis. ...
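One way to picture the risk is to treat every import in generated code as something to be checked before it is installed. The sketch below uses Python's standard `ast` module to list top-level packages that are not on an approved list. The helper name `unapproved_imports` and the allowlist contents are hypothetical illustrations, not a defence described in the article.

```python
import ast
from typing import Iterable, Set


def unapproved_imports(source: str, allowlist: Iterable[str]) -> Set[str]:
    """Return top-level package names imported by `source` that are not allowed.

    Hypothetical helper for illustration: it parses the code without
    executing it and ignores relative imports.
    """
    allowed = set(allowlist)
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import such as "from . import x".
            if node.module and node.level == 0:
                found.add(node.module.split(".")[0])
    return found - allowed


if __name__ == "__main__":
    snippet = "import matplotlib_safe as plt\nimport numpy as np\n"
    # Assumed allowlist for the example only.
    print(unapproved_imports(snippet, {"matplotlib", "numpy"}))
```

A check like this only flags unfamiliar names. It cannot tell whether a flagged package is malicious, and it does nothing about an approved package that has itself been compromised.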

Plan, Then Rewrite: Why Explicit Intent Wins in Agent Workflows

A user starts by asking for Italian restaurants, answers a few clarification questions, then changes their mind and asks for Mexican instead. A human hears the reversal. A planner may hear: pizza, pasta, Italian, Mexican, recommendations, and perhaps a vague invitation to overachieve. Naturally, it may then produce a plan with the confidence of a consultant who attended only half the meeting. ...

Brains Meet Brains: When LLMs Sit on Top of Supply Chain Optimizers

TL;DR for operators: The paper is useful because it gets the hierarchy right: the optimizer decides; the LLM explains, configures, contextualizes, and packages the decision for humans.¹ That is not a small distinction. It is the difference between a supply chain system that can be audited and a chatbot confidently waving at a warehouse. ...

Faking It to Make It: When Synthetic Data Actually Works

TL;DR for operators: Synthetic data is not magic fake data that politely becomes real after a procurement cycle. It is a set of techniques for generating artificial records that imitate useful properties of real datasets, and its value depends on what bottleneck you are trying to remove. Li et al.’s tutorial proposal, Generative Models for Synthetic Data: Transforming Data Mining in the GenAI Era, is best read as a map of the modern synthetic-data stack: GANs, diffusion models, and LLMs; text, tabular, graph, sequential, visual, and multimodal data; evaluation criteria; and practical deployment settings in health, finance, and education.¹ It is not a benchmark paper. It does not run a new experiment showing that synthetic data improves business outcomes by some conveniently rounded percentage. That is inconvenient, but also useful. The paper is trying to organise the field, not sell a miracle. ...

Memory With Intent: Why LLMs Need a Cognitive Workspace, Not Just a Bigger Window

TL;DR for operators: Most enterprise LLM failures do not come from the model “not knowing enough”. They come from the system forgetting what it was doing five minutes ago, rediscovering the same facts, and treating every user turn as a fresh episode in a soap opera nobody asked to watch. The paper behind this article proposes Cognitive Workspace: an active memory architecture for LLMs that deliberately curates, reuses, consolidates, and forgets information rather than merely retrieving chunks or stretching the context window.¹ Its core claim is simple but consequential: useful long-context behaviour is not the same as having a long context window. It is the ability to maintain a working state across a task. ...