AI Automation

Memory Diet for AI Agents: Distilling Conversations Without Forgetting

Memory has become the awkward invoice attached to every serious AI agent demo. A short chatbot can survive on vibes. A long-running coding assistant cannot. After a few weeks of debugging sessions, architecture debates, config changes, rejected fixes, and “remember we tried this already?” moments, the agent’s past becomes valuable. It also becomes inconveniently large. The obvious solution is to stuff more transcript into the prompt. The obvious solution is usually how software gets expensive before it gets useful. ...

Same Question, Different Words — Why LLM Agents Lose Their Minds

Users do not ask questions in benchmark format. They ask in fragments, emails, forms, meeting notes, support tickets, spreadsheet comments, and occasionally in the sort of sentence that makes a compliance officer stare silently at the ceiling. A business AI agent does not receive one clean canonical prompt. It receives the same task wearing many costumes. ...

Agents in Lab Coats: When LLMs Try to Become Data Scientists

Spreadsheet first. Not the model. Not the agent. Not the impressive diagram with seven tiny boxes labeled “planner,” “executor,” “critic,” “memory,” “tool user,” “reflection,” and, inevitably, “orchestrator.” In most companies, data science automation begins with something less glamorous: a messy spreadsheet, a half-documented database table, a recurring report, a manager asking why last month’s number changed, and one unlucky analyst trying to remember whether “customer_id” means account, user, buyer, household, or whatever the CRM vendor believed in 2019. ...

Don’t Prompt Harder — Engineer Smarter: Inside CEDAR’s Agentic Data Scientist

Dataset. That is where many “AI data scientist” demos quietly stop being impressive. A tidy CSV, a small notebook, a polite prompt, and a model that produces a confident answer: this is enough for a video clip. It is not enough for data science. Real data science is not a single question answered by a single model response. It is a sequence of choices: load this file, inspect these columns, define this metric, split the data this way, train this baseline, handle this error, explain this plot, revise the next step. ...

From PDE to Pipeline: When LLMs Become Numerical Architects

Simulation has an awkward little secret: the hard part is often not writing code. It is choosing the right numerical method before the code exists. Anyone can ask an LLM to produce a solver for an advection equation, a heat equation, or a Navier–Stokes toy problem. The result may even run. That is not the same as being numerically sane. A PDE solver can be syntactically valid, computationally impressive, and mathematically ridiculous at the same time. In scientific computing, this is not a charming personality flaw. It is how bad answers acquire nice plots. ...

From Pixels to Patterns: Teaching LLMs to Read Physics

Logs are useful until they become a landfill. Every serious automation system eventually produces the same awkward artifact: a long trace of what happened. A machine moved here. A sensor changed there. An object collided, rolled, paused, reversed, bounced, touched something else, and then the system reached—or failed to reach—the desired state. In principle, this trace contains the answer. In practice, it is the kind of answer that makes a language model stare at 5,000 tokens of coordinates and politely hallucinate a story. ...

Click Like a Human: Why Avenir-Web Is a Quiet Breakthrough in Web Agents

Click. That is where most web-agent demos become either impressive or mildly tragic. The model reads the instruction, understands the goal, produces a confident plan, and then clicks the wrong thing. Or it clicks the right thing before a modal appears. Or it scrolls, forgets why it scrolled, repeats an action, and quietly turns a three-step workflow into interpretive dance. ...

When LLMs Read the Room: Predictive Process Monitoring Without the Data Buffet

Back office teams rarely suffer from a shortage of opinions. They suffer from a shortage of completed cases. A bank wants to know whether a loan application will require costly rework. A hospital wants to know whether an emergency-department case will need laboratory processing. An operations manager wants to know how long a running case will take before it becomes tomorrow’s apology email. Predictive Process Monitoring, or PPM, is supposed to help with exactly this kind of question. It looks at event logs and predicts what will happen next: total completion time, future activities, process outcomes, delays, exceptions. ...

When Graphs Stop Guessing: Teaching Models to Rewrite Their Own Meaning

Customer networks are messy. Product graphs are messy. Fraud rings are messy. Supply-chain graphs are messy. The usual engineering reflex is also messy: when the graph model disappoints, add another architecture, another positional encoding, another “graph-aware” module, another clever acronym to the pile. The paper “Semantic Refinement with LLMs for Graph Representations” suggests a quieter alternative: before changing the model, change what the model is asked to read.[1] ...

Ports, But Make Them Agentic: When LLMs Start Running the Yard

Ports are already full of automation. Cranes move containers, AGVs follow routes, software coordinates flows, dashboards blink reassuringly at managers who are paid to pretend that blinking equals control. Then one terminal changes its layout, closes a road, adds a vehicle restriction, or introduces a new safety corridor. Suddenly the “automated” dispatching system needs engineers, operations researchers, domain experts, test scripts, model reformulation, solver debugging, and several meetings where everyone discovers that “just adjust the rule” was not, in fact, just. ...