RAG | Cognaptus

SD‑RAG: Don’t Trust the Model, Trust the Pipeline

A chatbot should not be the only employee in the company responsible for keeping secrets. That sounds obvious until we look at how many enterprise RAG systems are designed. A user asks a question. The system retrieves internal documents. The documents are placed into the model context. A policy instruction is added somewhere above the user prompt: do not reveal sensitive information. Then everyone hopes the model behaves. ...

Aligned or Just Agreeable? Why Accuracy Is a Terrible Proxy for AI–Human Alignment

Accuracy is comforting because it gives us a number. The model predicted the right label. The chatbot chose the same option as the survey respondent. The simulated customer picked the same product. Everyone claps, someone updates a dashboard, and the alignment problem is declared mostly solved. Unfortunately, decision-making is where accuracy goes to look respectable while quietly doing very little. ...

When Models Read Too Much: Context Windows, Capacity, and the Illusion of Infinite Attention

The demo is familiar now. Someone drops a whole contract, a whole policy manual, a whole code repository, or a month of chat history into a model and asks one neat question. The model answers fluently. The room relaxes. The slide says “1M-token context.” Procurement starts smiling. This is where the trouble begins. ...

When Memory Stops Guessing: Stitching Intent Back into Agent Memory

Memory fails in a very ordinary way. A customer asks, “Can we use the same approval condition as before?” A research agent says, “Yes.” A procurement assistant retrieves the old vendor quote. A planning copilot remembers a hotel price from yesterday’s itinerary. Everything looks semantically relevant. The words match. The entities match. The embedding score smiles politely. ...

Bubble Trouble: Why Top‑K Retrieval Keeps Letting LLMs Down

The problem is not finding documents. It is spending the prompt budget badly. Ask an enterprise RAG system for “scope of work,” and the system may look confident for exactly the wrong reason. The query sounds simple. Somewhere in the document set, there is probably a sheet, paragraph, or clause literally called “Scope of Works.” A flat top-k retriever will happily grab the highest-scoring chunks from that section, stack them into the model context, and call the job done. Very tidy. Very wrong. ...

EvoFSM: Teaching AI Agents to Evolve Without Losing Their Minds

Workflow is the unglamorous part of agentic AI. Which is precisely why it matters. A research agent can have a strong language model, a decent search tool, and an impressive ability to produce paragraphs that sound like a McKinsey intern who drank too much espresso. Yet when the task becomes long, ambiguous, and evidence-heavy, the same agent often fails for a boring reason: it does the right actions in the wrong order, repeats the same weak search, summarizes too early, forgets to verify a source, or changes its own instructions so enthusiastically that it becomes a different employee halfway through the job. ...

Trading Without Cheating: Teaching LLMs to Reason When Markets Lie

Trade has a special talent for humiliating clean theories. A model reads a market brief. It sees earnings beats, sales guidance, analyst upgrades, and a few scattered corporate events. Asked to behave like a turnaround specialist, it starts building buy signals. Some recommendations are reasonable. Others quietly smuggle in missing assumptions: maybe the company has new management; maybe the earnings beat reflects restructuring; maybe debt reduction is happening somewhere behind the curtain. Very elegant. Also, very convenient. ...

MAGMA Gets a Memory: Why Flat Retrieval Is No Longer Enough

Memory is where many impressive agents quietly become mediocre employees. They can answer the last question. They can summarize the last document. They can sound very confident about a customer, a project, or a workflow they saw three weeks ago. Then someone asks, “Why did we make that decision?”, “When did the requirement change?”, or “Was that the same client who objected last time?” Suddenly the agent rummages through its past like a consultant searching Slack at 1:43 a.m. Technically alive. Not exactly organized. ...

EverMemOS: When Memory Stops Being a Junk Drawer

Memory sounds simple until the assistant has to remember two incompatible things at once. A customer loves craft beer. The same customer is temporarily taking antibiotics. A flat memory system retrieves “likes IPA” and recommends a variety pack, because apparently “memory” means grabbing the loudest sticky note from a drawer and pretending it is wisdom. A more useful assistant retrieves the preference, the medical constraint, the timing, and the relation among them. It recommends a mocktail and quietly avoids turning personalization into negligence. ...

When LLMs Stop Guessing and Start Complying: Agentic Neuro-Symbolic Programming

The problem is not that LLMs cannot write code. It is that they write the wrong kind too confidently. A familiar scene: someone gives an LLM a task, receives a block of code that looks elegant, runs it, and discovers that it has invented an API, misunderstood the library, or solved a neighboring problem with excellent grammar. This is annoying when the target is ordinary Python. It is worse when the target is a specialized framework where the code is supposed to encode logic, constraints, and domain structure. ...