Human-in-the-Loop

From Permit Ping-Pong to Governed Case Flow

A mid-sized municipal permit office redesigns fragmented intake, regulatory search, departmental routing, clarification, inspection scheduling, and decision documentation around six specialized agents while preserving human authority over interpretation and adjudication.

Thirteen Buckets and a Warning Light

TL;DR for operators: Most organizations do not have a feedback shortage. They have a conversion problem. Comments arrive in multiple languages, contain personal information, mix several complaints in one paragraph, and rarely align themselves politely with the categories used in management reports. Human teams then compress this material into a few operational labels, often under time pressure and with rules inherited from whichever spreadsheet survived the last reorganization. ...

From Memory-Based Coordination to Controlled Case Orchestration

A metropolitan funeral home redesigns a fragile, coordinator-dependent case process around specialized AI agents while retaining human control over every family, legal, ceremonial, and religious decision.

From Inbox Chasing to Controlled Port-Call Orchestration

A medium-sized ship agency redesigned its port-call workflow around a shared event record and six specialist agents, shifting coordinators from manual information chasing to evidence-based exception management without automating regulatory, financial, provider, or safety-critical authority.

Judge, Jury, and Calibration: Why AI Evaluation Needs Anchors

TL;DR for operators: AI is becoming very good at producing judgement-shaped output. That is not the same thing as judgement. Two recent papers make the same operational point from different sides: one shows how AI can estimate educational item difficulty before response data are available; the other shows how LLM-generated peer reviews can look serious while diverging from human reviewing behaviour.¹ ² ...

Feedback, Not Freefall: Why LLM Writing Tools Need a Teacher in the Loop

Feedback is expensive. Anyone who has managed a classroom, a content team, a training programme, or a junior analyst cohort knows the pattern. The first draft is rarely the problem. The problem is the second draft, because the second draft requires specific feedback, delivered in language the learner can act on, without exhausting the person giving it. Multiply that by thirty students, ten assignments, uneven ability levels, and a calendar that refuses to become more generous. Suddenly “just give everyone personalised feedback” becomes one of those ideas beloved by people who do not have to do it. ...

Synthesize, but Verify: The Data Flywheel Behind Useful AI Automation

Opening: Why this matters now. The easiest AI demo in the world is a model producing something plausible. A product description. A support reply. A defect image. A peer-review report. A compliance explanation. A benchmark answer. The output looks competent enough to be shown in a slide deck, which is often where corporate AI strategy goes to enjoy a short but well-lit life. ...

From Gate Noise to Turnaround Intelligence: AI Agents for Airline Ground Operations

A regional airline or ground-handling team moved from scattered radio, chat, and checklist updates to a human-reviewed AI coordination layer that tracks turnaround state, detects exceptions, drafts delay explanations, and improves passenger communication.

When the Judge Needs Judging: LLM Evaluators Under Cross-Examination

The dashboard says the judge is fine. The document disagrees. Judge is an easy word to trust. It suggests robes, procedure, and someone in the room who is supposed to be less confused than everyone else. In AI evaluation, the word has become dangerously comfortable. Product teams now use LLMs to score summaries, rank chatbot answers, approve RAG outputs, compare model releases, and decide whether another model’s response is “good enough.” The attraction is obvious: human review is expensive, slow, and occasionally insists on context. An LLM judge is fast, scalable, and does not ask why the evaluation rubric was written five minutes before the sprint review. ...

From Scattered Site Logs to Safety Intelligence: AI Mining Site Safety & Reporting Agent

A remote-site mining operator redesigned its safety reporting workflow from manual record chasing into an agent-assisted process that consolidates field evidence, surfaces risks, drafts reports, and preserves human approval for safety-critical decisions.