Formal Methods

Typechecked and Still Wrong

TL;DR for operators
The useful lesson from this paper is not “AI can formalize mathematics better.” That is the shiny wrapper. The operational lesson is nastier and more important: an AI-generated formal artifact can pass syntactic checks, be provable, and still fail to represent the original human intent. The type checker is not a mind reader. It is a very disciplined bureaucrat. ...
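The "disciplined bureaucrat" point can be made concrete with a minimal Lean 4 sketch. It is an illustration, not an example from the paper: the informal claim "there is no largest prime" is mis-formalized by dropping the primality condition. The names `IsPrime`, `intendedStatement`, and `no_largest_prime_wrong` are hypothetical, and the hand-rolled `IsPrime` stands in for a library definition so that the snippet needs only core Lean.

```lean
-- Hypothetical hand-rolled primality predicate (core Lean only, no Mathlib).
def IsPrime (p : Nat) : Prop :=
  2 ≤ p ∧ ∀ m : Nat, m ∣ p → m = 1 ∨ m = p

-- The intended statement: for every n there is a prime larger than n.
-- Stated only as a Prop here; actually proving it takes real work.
def intendedStatement : Prop :=
  ∀ n : Nat, ∃ p : Nat, p > n ∧ IsPrime p

-- A plausible mis-formalization that forgot `IsPrime`.
-- It typechecks and is proved in one line, but it only says
-- "there is a larger number", not "there is a larger prime".
theorem no_largest_prime_wrong : ∀ n : Nat, ∃ p : Nat, p > n :=
  fun n => ⟨n + 1, Nat.lt_succ_self n⟩
```

The checker accepts both declarations without complaint. Nothing in the typing rules knows that the second one was supposed to mention primes; only a human comparing it against the informal claim can notice.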

The Proof Is in the Instance: Why AI Safety Can’t Be Fully Verified

The verifier that cannot know everything
Verification sounds like the sensible adult in the AI safety room. The model may hallucinate, the benchmark may flatter, the demo may sparkle under conference lighting, but the verifier is supposed to be the hard stop: a formal mechanism that checks whether an AI system’s behavior satisfies a specified policy. ...

From One Shot to Many: Why AI Should Stop Guessing and Start Exploring

One answer is tidy. One answer is easy to grade. One answer also happens to be a strangely fragile way to use AI. That is not just a philosophical complaint about creativity, brainstorming, or whether a chatbot sounds confident enough while being quietly wrong. It becomes a technical problem when AI systems generate artifacts that other systems must consume: code, formal specifications, compliance rules, database transformations, contracts, workflows, or mathematical statements. In those settings, the generated object is not merely a sentence. It is an interface. ...

When Temperature Rises, Who’s to Blame? — Causation in Hybrid Worlds

Temperature is a patient witness. A valve ruptures. A cooling system fails. A technician records a radiation reading. Minutes later, the core temperature crosses a danger threshold. The incident report now asks the question every system audit eventually asks, usually after everyone has already chosen a favorite suspect: Who caused the temperature rise? ...

When Goals Collide: Synthesizing the Best Possible Outcome

A robot does not always get the luxury of a clean task list. Reach the loading bay. Avoid blocked corridors. Preserve battery. Pick up two packages. Respect a safety boundary. Finish before the door closes. Then the environment, as environments enjoy doing, changes the rules halfway through. A corridor shuts. A resource disappears. One goal now interferes with another. ...

AI Writes the Rules: When Formal Logic Teaches Language Discipline

A requirement can survive three meetings, two approvals, and a legal review while still meaning different things to everyone who reads it. That is not usually because anyone is careless. Natural language is simply very good at sounding settled before its meaning is settled. Words such as “after,” “until,” “immediately,” and “within” feel precise in conversation. In software requirements, they can quietly conceal incompatible assumptions about timing, cancellation, and acceptable system behavior. ...
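The hidden assumptions behind a word like "until" can be shown in a few lines. This is a minimal Python sketch under stated assumptions: traces are finite lists of states (temporal logics such as LTL are usually defined over infinite traces), and the requirement "the alarm stays on until it is acknowledged" is a hypothetical example, as are all function and field names. The strong reading demands that the acknowledgement eventually happens; the weak reading does not.

```python
from typing import Callable, Sequence

State = dict[str, bool]
Pred = Callable[[State], bool]


def strong_until(trace: Sequence[State], p: Pred, q: Pred) -> bool:
    """p U q on a finite trace: q must eventually hold, and p must hold at every step before it."""
    for state in trace:
        if q(state):
            return True
        if not p(state):
            return False
    return False  # q never occurred, so the obligation was never discharged


def weak_until(trace: Sequence[State], p: Pred, q: Pred) -> bool:
    """p W q: like p U q, but also satisfied if p holds throughout and q never occurs."""
    for state in trace:
        if q(state):
            return True
        if not p(state):
            return False
    return True  # p held for the whole trace; q was never required


def alarm_on(s: State) -> bool:
    return s["alarm"]


def acknowledged(s: State) -> bool:
    return s["ack"]


# Hypothetical trace: the alarm stays on and nobody ever acknowledges it.
never_acked = [{"alarm": True, "ack": False}] * 3

print(strong_until(never_acked, alarm_on, acknowledged))  # False
print(weak_until(never_acked, alarm_on, acknowledged))    # True
```

The two readings agree when the acknowledgement arrives, and they agree when the alarm switches off early. They diverge only on the trace where nobody responds, which is exactly the case a requirements meeting tends not to discuss.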

TOGGLE or Die Trying: Giving LLM Compression a Spine

Compression needs a rulebook, not just a diet plan
Compression is the least glamorous part of the LLM business until the bill arrives. A model works beautifully in a cloud demo. Then someone asks whether it can run on a device with limited memory, limited energy, limited connectivity, and limited patience. Suddenly the elegant system becomes a logistics problem. Quantize it. Prune it. Shrink it. Hope it still speaks like the original model and not like a sleep-deprived intern summarizing a legal contract from memory. ...

When AI Reviews AI: Turning Foundation Models into Safety Inspectors

Inspection is not glamorous. It is not the robot demo, not the dashboard, not the moment a prototype obediently follows a traffic cone across a test track. Inspection is the slow, expensive discipline of asking whether the thing that worked once will behave acceptably when the weather changes, the path bends, the sensor gets confused, or the requirement was written by a tired engineer using the phrase “successfully complete” as if English were a formal language. ...

The Latent Truth: Why Prototype Explanations Need a Reality Check

Audit starts with a simple request: show me why. For prototype-based neural networks, that request has always had a pleasantly visual answer. The model points to a learned prototype from training data and says, in effect, “this part of the image looks like that part of an example I already know.” This is the interpretability sales pitch in its most charming form. No opaque wall of logits. No post-hoc heatmap pretending to be a confession. Just a case-based explanation: this resembles that. ...

Logic With a View: When Standpoints Meet Non‑Monotonicity

Decisions Rarely Fail Because Everyone Disagrees
Businesses are quite used to disagreement. Risk says no, growth says yes, legal says “only if we phrase it carefully,” and compliance brings a spreadsheet that somehow makes everyone sad. The hard part is not that these groups disagree. The hard part is that they often disagree using partly shared language. “Eligible,” “material,” “reasonable,” “high risk,” “recommended,” and “approved” may look like one vocabulary. In practice, they are local dialects wearing corporate badges. ...