Choosing Wisely: How MACHOP Turns Logic Puzzles into Preference Machines

A schedule looks reasonable until someone asks why.

Why did this nurse get the night shift? Why was this invoice routed for manual review? Why did the configuration engine reject one product bundle and approve another? In many operational systems, the answer is not a single rule. It is a chain of constraints: availability, capacity, dependencies, exclusions, thresholds, and the occasional policy clause someone wrote in 2017 and nobody wants to touch.

That is why explanations in constraint systems are not just decorative transparency. They are how users decide whether automation deserves to keep its hands on the steering wheel.

The paper behind MACHOP starts from a modest setting: Sudoku and logic-grid puzzles. Sensible readers may therefore be tempted to file it under “cute benchmark, limited relevance.” That would be premature. Logic puzzles are not the business application. They are the laboratory cage. The useful question is what happens when an AI system must choose which explanation to show a human, knowing that many logically valid explanations exist and only some of them will feel understandable.¹

The answer is not “choose the shortest one.” That is the trap. Short explanations can be elegant; they can also be cryptic little bricks. MACHOP’s contribution is to treat explanation selection as a preference-learning problem: show users pairs of explanation steps, learn what they prefer, and use that learned objective to generate better future explanations.

The paper’s business relevance sits inside the mechanism, not merely inside the benchmark scores. Ordinary preference elicitation struggles because it asks bad comparison questions, learns from features on unstable scales, and sometimes makes users wait while the solver thinks very deeply about a puzzle square. MACHOP is interesting because it attacks those three mundane problems. As usual, the mundane problems are where products go to die.

The shortest explanation is not automatically the best explanation

In constraint programming, a system works with variables, domains, and constraints. A Sudoku cell, a nurse assignment, a delivery slot, or a product configuration can all be framed this way. A solution satisfies the constraints. An explanation step justifies why a particular fact follows from the current known facts and constraints.

For example, a Sudoku system may explain why a cell must contain a 6. One explanation might use four facts and three constraints. Another might use eight facts and one constraint. The first is shorter. The second may be easier for a human to follow because it relies on a more familiar pattern.

This is the paper’s central correction. Cardinality-minimal explanations, called smallest explanation steps or SES in the paper, are reasonable baselines. They are not a theory of human comprehension.

Earlier work in step-wise explanations tried to handle this by defining a linear objective function over explanation features. That objective might assign weights to the number of facts, the type of constraints used, whether constraints are adjacent to the explained fact, and so on. Then the system can search for the “optimal” explanation step under that objective.
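A minimal sketch of such an objective follows. It assumes each candidate step is summarised by two hypothetical cost features (facts used, constraints used) and that a lower weighted cost is better. It reuses the Sudoku example above. The four-fact, three-constraint step is the smallest, but a user who finds constraints costly to follow would rank the eight-fact, one-constraint step first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExplanationStep:
    """A candidate explanation step reduced to hypothetical cost features."""
    name: str
    features: tuple  # (number of facts used, number of constraints used)


def cost(weights, step):
    """Linear objective: weighted sum of the step's features (lower is better)."""
    return sum(w * f for w, f in zip(weights, step.features))


def best_step(weights, candidates):
    """Return the candidate step that minimises the linear objective."""
    return min(candidates, key=lambda s: cost(weights, s))


a = ExplanationStep("A", (4, 3))  # 4 facts + 3 constraints = 7 items
b = ExplanationStep("B", (8, 1))  # 8 facts + 1 constraint = 9 items

# Unit weights reduce the objective to explanation size, so the smallest step A wins.
print(best_step((1, 1), [a, b]).name)  # A
# If each constraint costs this user three times as much as a fact, B wins (11 < 13).
print(best_step((1, 3), [a, b]).name)  # B
```

The whole debate is about where the weight vector comes from. Changing it from `(1, 1)` to `(1, 3)` flips which explanation counts as “optimal.”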

The difficulty, obviously, is that someone has to define the weights. And “someone” usually means a modeller, researcher, domain expert, or product manager pretending to know how users reason. A noble tradition. Not a reliable one.

MACHOP keeps the linear objective but learns its weights through pairwise comparisons. Instead of asking a user to write a scoring formula, the system asks: which of these two explanations do you prefer? Over many comparisons, the system estimates the user’s preference weights.
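The learning step can be sketched with a perceptron-style rule. This is an illustration under assumptions, not the paper’s exact update. Features are treated as costs (lower is better), and the learning rate `lr` is a hypothetical parameter. Each answer nudges the weights so that the preferred step becomes cheaper relative to the rejected one.

```python
def cost(weights, features):
    """Weighted cost of an explanation step (lower is better)."""
    return sum(w * f for w, f in zip(weights, features))


def update_weights(weights, preferred, rejected, lr=0.25):
    """Perceptron-style update from one pairwise answer.

    Moving the weights along (rejected - preferred) raises
    cost(rejected) - cost(preferred) by lr * ||rejected - preferred||^2,
    so the learned objective leans towards the user's choice.
    """
    return [w + lr * (r - p) for w, p, r in zip(weights, preferred, rejected)]


w = [1.0, 1.0]                                 # start by counting facts and constraints equally
preferred, rejected = (8, 1), (4, 3)
print(cost(w, preferred) < cost(w, rejected))  # False: 9 vs 7 before the answer
w = update_weights(w, preferred, rejected)
print(w)                                       # [0.0, 1.5]
print(cost(w, preferred) < cost(w, rejected))  # True: 1.5 vs 4.5 after one answer
```

A single answer can move the weights a long way, and this sketch has no safeguards, such as keeping weights non-negative. It only shows the direction in which an answer pushes the objective.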

That makes the method an adaptation of Constructive Preference Elicitation, or CPE, to step-wise explanations. The word “constructive” matters: the system is not merely ranking a fixed catalogue of explanations. It generates candidate explanation steps under constraints, asks for feedback, updates the weights, and repeats.

The mechanism is therefore a loop:

Select a puzzle or constraint-state instance.
Generate two candidate explanation steps.
Ask the user which one is preferable, or whether there is no preference.
Update the learned weights.
Use the new weights to generate better future explanations.
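The five steps can be sketched as a loop. This is a simplification under explicit assumptions:

- Each instance offers a small, pre-enumerated list of candidate steps. A constructive system would generate them with a solver instead.
- The “user” is a hypothetical simulator with hidden weights.
- The query is simply the two cheapest candidates under the current weights.
- A “no preference” answer skips the update.
- Weights change only when the user overrules the current favourite, as in the classic perceptron. Whether MACHOP uses the same convention is not assumed.

```python
def cost(weights, features):
    return sum(w * f for w, f in zip(weights, features))


def simulated_user(hidden_weights, a, b):
    """Hypothetical user: returns 0 or 1 for the preferred step, or None for no preference."""
    ca, cb = cost(hidden_weights, a), cost(hidden_weights, b)
    if ca == cb:
        return None
    return 0 if ca < cb else 1


def elicit(instances, hidden_weights, lr=0.25):
    weights = [1.0] * len(hidden_weights)                   # start by counting items equally
    for candidates in instances:                            # 1. select an instance
        ranked = sorted(candidates, key=lambda s: cost(weights, s))
        pair = (ranked[0], ranked[1])                       # 2. two candidate steps
        answer = simulated_user(hidden_weights, *pair)      # 3. ask the user
        if answer is None:                                  #    "no preference": nothing to learn
            continue
        if answer == 1:                                     # 4. update when the user overrules
            preferred, rejected = pair[1], pair[0]          #    the current favourite
            weights = [w + lr * (r - p) for w, p, r in zip(weights, preferred, rejected)]
    return weights                                          # 5. reuse for future explanations


instances = [[(4, 3), (8, 1), (9, 4)]] * 3
learned = elicit(instances, hidden_weights=(1, 3))
print(learned)                                              # [0.0, 1.5]
print(min(instances[0], key=lambda s: cost(learned, s)))    # (8, 1)
```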

That loop sounds straightforward. It is not, because explanation steps are awkward objects. They involve multiple sub-objectives, different feature scales, many possible facts to explain, and candidate pairs that can easily become too similar to teach the system anything.

MACHOP is the paper’s answer to that awkwardness.

MACHOP fixes the comparison question before it celebrates the answer

Preference learning depends heavily on the quality of the questions. If the system shows users two almost identical explanations, the answer teaches little. If it shows one explanation that is clearly worse on every relevant feature, the answer is trivial. If it only explores whichever feature looked important early on, the learner can overfit to its own first impressions. A familiar executive failure mode, now with equations.

The baseline method, Choice Perceptron, generates one explanation by optimising the current learned objective, then generates a second explanation by balancing quality with diversification. That second explanation is supposed to be different enough to produce useful feedback.
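Here is a rough sketch of that query pair. It assumes costs (lower is better) and uses plain L1 distance from the first step as the diversification term, traded off by a hypothetical parameter `gamma`. The paper’s actual diversification term may differ. With weak diversification, the second step is simply the runner-up and can be worse than the first on every feature.

```python
def cost(weights, features):
    return sum(w * f for w, f in zip(weights, features))


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def choice_perceptron_query(weights, candidates, gamma):
    """Sketch of a Choice-Perceptron-style query pair (features are costs).

    First step: the best candidate under the current weights.
    Second step: trades current cost against L1 distance from the first step.
    """
    first = min(candidates, key=lambda s: cost(weights, s))
    rest = [s for s in candidates if s != first]
    second = min(rest, key=lambda s: cost(weights, s) - gamma * l1(s, first))
    return first, second


candidates = [(2, 1), (3, 2), (1, 5)]
print(choice_perceptron_query((1, 1), candidates, gamma=0.1))  # ((2, 1), (3, 2))
print(choice_perceptron_query((1, 1), candidates, gamma=1.0))  # ((2, 1), (1, 5))
```

In the first call, `(3, 2)` is worse than `(2, 1)` on both features, so the user’s answer is a foregone conclusion.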

The authors observed a problem: after a few iterations, the second explanation often becomes merely the second-best explanation under the current weights, and may even be dominated by the first. In other words, the pair can become a rigged contest. If one option is no better on any objective, the user’s choice is not very informative.

MACHOP adds a non-domination constraint. The second explanation must improve on at least one objective relative to the first. This does not guarantee that the pair is cognitively perfect, but it prevents the most useless category of comparison: “Would you prefer the better thing, or the worse thing?”
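In a solver, this is a disjunctive constraint on the second step. In a candidate-list sketch, it becomes a filter. The code below makes these assumptions:

- Features are costs (lower is better).
- L1 distance, weighted by a hypothetical `gamma`, stands in for the diversification term.
- Any candidate that fails to beat the first step on at least one feature is dropped.

```python
def cost(weights, features):
    return sum(w * f for w, f in zip(weights, features))


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def improves_somewhere(candidate, first):
    """Non-domination: strictly lower cost than `first` on at least one feature."""
    return any(c < f for c, f in zip(candidate, first))


def non_dominated_query(weights, candidates, gamma):
    first = min(candidates, key=lambda s: cost(weights, s))
    allowed = [s for s in candidates if improves_somewhere(s, first)]  # excludes `first` too
    if not allowed:
        return first, None  # nothing beats `first` anywhere: no informative pair exists
    second = min(allowed, key=lambda s: cost(weights, s) - gamma * l1(s, first))
    return first, second


candidates = [(2, 1), (3, 2), (1, 5)]
# (3, 2) is excluded even with weak diversification, because it improves on nothing.
print(non_dominated_query((1, 1), candidates, gamma=0.1))  # ((2, 1), (1, 5))
```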

Then MACHOP changes the diversification strategy. Instead of treating all feature differences equally, it uses a UCB-inspired weighting scheme. In multi-armed bandits, Upper Confidence Bound methods balance exploitation of high-reward arms with exploration of less-tested arms. MACHOP imports the intuition, not the casino glamour. Each sub-objective becomes something like an arm: worth exploring if users often prefer improvements on it, or if the system has not tested it enough.
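Here is a hedged sketch of the bandit intuition. It uses the textbook UCB1 score rather than the paper’s exact formula, and treats each sub-objective (feature) as an arm. It relies on three hypothetical bookkeeping names:

- `wins[i]` counts how often the user preferred the step that improved feature i.
- `tests[i]` counts how often a query pair differed on that feature.
- `t` is the number of queries so far.

```python
import math


def ucb_scores(wins, tests, t):
    """UCB1-style score per sub-objective: observed preference rate + exploration bonus."""
    scores = []
    for won, tested in zip(wins, tests):
        if tested == 0:
            scores.append(math.inf)  # never tested: explore it before anything else
        else:
            scores.append(won / tested + math.sqrt(2 * math.log(t) / tested))
    return scores


# Feature 0: often preferred and well explored; feature 1: rarely tested; feature 2: untested.
scores = ucb_scores(wins=[8, 1, 0], tests=[10, 2, 0], t=12)
print([round(s, 2) for s in scores])  # [1.5, 2.08, inf]
```

What matters is the ordering. A sub-objective that users rewarded often can still be outranked by one the system has barely probed. How the scores then weight the diversification term is a design choice this sketch leaves open.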

That is the mechanism-first lesson. MACHOP does not merely learn preferences; it learns by asking less lazy questions.

Normalisation is not housekeeping; it is the stability layer

The paper’s second important mechanism is feature normalisation.

Explanation features can live on very different scales. The number of facts, adjacent row constraints, other block constraints, clue constraints in logic-grid puzzles, and distance-based feature groups are not naturally comparable. If one feature has larger raw values, it can dominate the weight update even if the user does not actually care much about it.
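A small illustration with hypothetical numbers: suppose one feature is a distance-style sum that runs into the tens and another is a fact count in single digits. A raw perceptron-style update (features as costs, lower is better) is then driven almost entirely by the large-scale feature, even if the user chose on the basis of fewer facts.

```python
def raw_update(preferred, rejected, lr=1.0):
    """Raw perceptron-style weight change for one answer (features are costs)."""
    return [lr * (r - p) for p, r in zip(preferred, rejected)]


# Hypothetical features: (distance-style sum, number of facts)
preferred = (60.0, 2.0)  # the user picked this step, plausibly for its fewer facts
rejected = (20.0, 5.0)

delta = raw_update(preferred, rejected)
share = abs(delta[0]) / sum(abs(d) for d in delta)
print(delta)            # [-40.0, 3.0]
print(round(share, 2))  # 0.93: the distance feature absorbs 93% of the update
```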

The authors test several approaches:

Normalisation strategy	Likely purpose in the paper	What the result says
Default approximate nadir-point normalisation	Baseline scaling approach	Performs poorly in this setting, especially for logic-grid puzzles with broader feature ranges
No normalisation	Ablation / stress test	Leaves scale effects unmanaged and harms the non-domination variant
Cumulative normalisation	Proposed dynamic method	Performs among the best by scaling against values seen during training
Local normalisation	Proposed dynamic method	Performs best on average and is adopted for later experiments

This is easy to underplay, because normalisation sounds like data plumbing. It is not. In preference learning, the weight update is the memory of the system. If raw feature scales distort that update, the system learns a user who does not exist.

Local normalisation is especially practical because it uses the current comparison pair. It asks, in effect: for this user decision, what feature scale is actually visible? That keeps the learning signal tied to the comparison being made, rather than to a global bound that may be expensive, overestimated, or irrelevant.
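Here is a minimal sketch of the two dynamic schemes, with two assumptions. “Local” divides each feature by the largest magnitude in the current pair. “Cumulative” divides by the largest magnitude seen so far in training. The paper’s exact scaling formulas may differ, and the names `local_normalise` and `CumulativeNormaliser` are illustrative.

```python
def local_normalise(a, b):
    """Scale each feature by the largest magnitude visible in this pair."""
    scales = [max(abs(x), abs(y)) or 1.0 for x, y in zip(a, b)]  # avoid dividing by zero
    return [x / s for x, s in zip(a, scales)], [y / s for y, s in zip(b, scales)]


class CumulativeNormaliser:
    """Scale each feature by the largest magnitude seen so far during training."""

    def __init__(self, n_features):
        self.max_seen = [0.0] * n_features

    def __call__(self, a, b):
        self.max_seen = [max(m, abs(x), abs(y)) for m, x, y in zip(self.max_seen, a, b)]
        scales = [m or 1.0 for m in self.max_seen]
        return [x / s for x, s in zip(a, scales)], [y / s for y, s in zip(b, scales)]


# Hypothetical pair: (distance-style sum, number of facts); the user preferred the first.
preferred, rejected = (60.0, 2.0), (20.0, 5.0)
p, r = local_normalise(preferred, rejected)
print([round(rj - pj, 2) for pj, rj in zip(p, r)])  # [-0.67, 0.6]: comparable magnitudes
```

In this sketch, the cumulative scale depends on the order of the training data and can only grow. The local scale is recomputed for every decision, so it always reflects the comparison the user is actually looking at.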

For business systems, this is the part that should sound familiar. Operational explanations often mix heterogeneous ingredients: time, distance, policy priority, cost, risk, capacity, seniority, and contractual exclusions. If the explanation preference model cannot handle scale, it will quietly confuse “large number” with “important reason.” This is how dashboards become theatre.

The evidence is a sequence of mechanism tests, not one heroic benchmark

The experimental section is structured around five research questions. The paper is not simply saying “MACHOP won.” It tests whether each mechanism contributes to better preference learning.

Test	Likely role	Main finding	What it does not prove
Q1: Non-domination	Ablation / mechanism validation	Adding the disjunctive non-domination constraint significantly lowers regret in 7 out of 8 setups	It does not prove the generated pairs are always cognitively ideal
Q2: Normalisation	Robustness / sensitivity test	Default nadir normalisation performs poorly; cumulative and local normalisation work best, with local best on average	It does not remove the need for well-designed features
Q3: Weighting schemes	Comparison among query-generation variants	UCB weighting reduces regret by about 40% versus alternatives and supports MACHOP’s exploration logic	It does not show that UCB is universally optimal outside these feature spaces
Q4: Offline fact selection	Runtime-quality trade-off test	Preselecting facts can reduce query generation time, especially using SES ordering, with some regret cost	It does not solve all latency issues for larger industrial CSPs
Q5: Real-user study	Human validation	MACHOP aligns better with user preferences than Choice Perceptron after 30–50 queries	It is still Sudoku-only and involved 30 participants

The headline numbers are strong. On simulated users, MACHOP reduces relative regret sharply compared with Choice Perceptron:

Method	Sudoku relative regret	Logic-grid relative regret
Choice Perceptron	$2.0 \pm 2.9$	$3.8 \pm 7.7$
MACHOP	$0.4 \pm 0.4$	$0.9 \pm 1.0$

The authors describe this as roughly an 80% reduction in regret for both Sudoku and logic-grid puzzles. That is the main evidence that the combined mechanism matters: non-domination, local normalisation, and UCB-guided diversification work better together than the baseline elicitation method.
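The “roughly 80%” figure can be checked against the mean relative regrets in the table (ignoring the large standard deviations):

$$
\frac{2.0 - 0.4}{2.0} = 0.80 \;\text{(Sudoku)}, \qquad \frac{3.8 - 0.9}{3.8} \approx 0.76 \;\text{(logic-grid)}
$$

On these means, the reduction is 80% for Sudoku and about 76% for logic-grid puzzles.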

But the variance is also informative. Logic-grid puzzles show wide ranges under some methods, especially in the appendix summary. This suggests the problem setting can produce hard cases, not merely tidy demonstrations. That matters because enterprise constraint systems will almost certainly look more like the hard cases than like a classroom Sudoku grid with excellent manners.

The user study shows alignment, not magic comprehension

The real-user study is the paper’s strongest practical signal and also its clearest boundary.

The authors ran an interactive Sudoku study with 30 participants. Users provided preferences for both Choice Perceptron and MACHOP, with evaluations after 10, 30, and 50 queries. For evaluation, the authors compared explanations generated by the learned objective against SES explanations on a new Sudoku puzzle sequence of 56 steps. Each user labelled around 400 pairs in total, taking 45 to 90 minutes.

The result is not that MACHOP instantly understands the user. After 10 queries, SES still wins more often. That is useful. It tells us preference learning has a warm-up cost. The system needs enough comparisons before the learned weights become meaningful.

After 30 queries, the picture changes:

Query count	MACHOP preferred over SES (%)	Choice Perceptron preferred over SES (%)
10	$25.2 \pm 16.6$	$16.2 \pm 14.8$
30	$70.7 \pm 18.5$	$44.8 \pm 21.2$
50	$72.6 \pm 14.9$	$38.8 \pm 25.7$

The interpretation is precise: MACHOP learns explanation preferences that users choose over the smallest-explanation baseline more often than Choice Perceptron does. At 30 and 50 queries, no user selected Choice Perceptron explanations more frequently than MACHOP explanations. The paper reports one-sided Wilcoxon signed-rank significance at $p < 10^{-3}$ after 10 queries and $p < 10^{-6}$ after 30 and 50 queries, with positive Cliff’s delta values.
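Cliff’s delta, the effect size quoted above, compares every value in one group against every value in the other. Below is a minimal pure-Python sketch of the standard formula. The per-user preference rates are invented for illustration, and the sketch does not assume how the paper grouped its measurements.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys))."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical per-user rates at which learned explanations beat SES
machop = [0.6, 0.8]
choice_perceptron = [0.7, 0.3]
print(cliffs_delta(machop, choice_perceptron))  # 0.5
```

The value ranges from -1 to 1, and positive values mean the first group tends to score higher. For the paired signed-rank test itself, SciPy provides `scipy.stats.wilcoxon`, which accepts `alternative="greater"` for a one-sided test.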

That is meaningful evidence for the mechanism. It is not evidence that MACHOP creates universally good explanations. The users were evaluating Sudoku explanations, not audit trails for credit models or production schedules. The system learned preferences over engineered features, not free-form human explanation taste in all its glorious inconsistency.

The runtime results are also nuanced. In the real-user study, MACHOP query generation took roughly 1.4 seconds after 10 queries, 2.6 seconds after 30, and 3.0 seconds after 50. Choice Perceptron was faster, between 0.9 and 1.4 seconds, but aligned less well with users. For interactive systems, that trade-off is acceptable in many settings. Three seconds is not instant; it is also not “please go make coffee while the optimiser contemplates existence.”

Offline fact selection is the product manager’s section

One of the paper’s more practical moves is to separate explanation quality from waiting time.

In the fully online version, the solver can choose which fact to explain next by searching across all unexplained facts. This is more flexible, but expensive. The authors test offline alternatives where the fact sequence is precomputed: either randomly or by using the SES sequence.
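The difference can be sketched under two assumptions:

- Each unexplained fact maps to a small list of candidate explanation steps with cost features. A real system would call a solver instead.
- “SES ordering” is approximated by sorting facts on the size of their smallest candidate.

The fact labels and helper names are hypothetical.

```python
import random


def cost(weights, features):
    return sum(w * f for w, f in zip(weights, features))


def best_cost(weights, options):
    return min(cost(weights, s) for s in options)


def online_next_fact(weights, unexplained):
    """Online: optimise an explanation for every unexplained fact, then pick the cheapest."""
    return min(unexplained, key=lambda fact: best_cost(weights, unexplained[fact]))


def offline_sequence(facts, mode, seed=0):
    """Offline: fix the order of facts once, before elicitation starts."""
    order = list(facts)
    if mode == "random":
        random.Random(seed).shuffle(order)
    elif mode == "ses":
        # smallest-explanation ordering: facts with the smallest candidate step come first
        order.sort(key=lambda fact: min(sum(s) for s in facts[fact]))
    return order


# Hypothetical facts -> candidate steps as (facts used, constraints used)
facts = {"r1c1=6": [(4, 3), (8, 1)], "r2c5=2": [(2, 1)], "r9c9=7": [(6, 4), (5, 5)]}
print(offline_sequence(facts, "ses"))   # ['r2c5=2', 'r1c1=6', 'r9c9=7']
print(online_next_fact((1, 3), facts))  # r2c5=2
```

The online variant repeats one optimisation per unexplained fact at every query, which is where its cost comes from. The offline variants fix the order once.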

The results show the trade-off clearly:

Fact selection	Sudoku time	Sudoku regret	Logic-grid time	Logic-grid regret
Online	$35.6 \pm 38.3$s	$0.4 \pm 0.4$	$49.2 \pm 15.1$s	$0.9 \pm 1.0$
Offline — Random	$44.2 \pm 44.7$s	$0.7 \pm 0.8$	$11.2 \pm 1.9$s	$2.7 \pm 9.7$
Offline — SES	$12.5 \pm 26.7$s	$0.6 \pm 0.4$	$8.3 \pm 1.0$s	$1.4 \pm 3.2$

For logic-grid puzzles, both offline methods reduce query generation time substantially, though regret worsens. For Sudoku, random offline selection can actually be slower because some random facts require complex explanations. SES ordering is the more sensible compromise: faster than online selection and not too damaging to regret.

This is exactly the kind of trade-off that matters in product design. The mathematically best explanation learner may be unusable if it makes humans wait too long between comparisons. MACHOP with SES-based offline selection is not the top-quality variant, but the paper notes that it ranks second for logic-grid puzzles and third for Sudoku when compared with the broader results. That is a respectable speed-quality compromise.

Business systems will face the same choice. A compliance analyst may tolerate a slower explanation for a high-risk decision. A dispatch operator will not. A training interface may collect 50 comparisons across a session. A live workflow assistant probably gets fewer chances before the user closes the panel and develops an opinion about the vendor.

What the paper directly shows

The direct claims should stay inside the evidence.

First, the paper shows that Constructive Preference Elicitation can be adapted to step-wise explanations in CSP-style domains. The adaptation is not a trivial copy-paste, because explanation steps require choosing both which fact to explain and how to explain it.

Second, it shows that query generation quality matters. Preventing dominated comparisons and using UCB-guided diversification improves learning efficiency in the tested settings.

Third, it shows that feature scaling is a serious issue. Dynamic normalisation, especially local normalisation, stabilises learning better than the default approximate nadir-point approach in these experiments.

Fourth, it shows that MACHOP performs better than Choice Perceptron on simulated users across Sudoku and logic-grid puzzles, reducing relative regret by about 80% in the headline table.

Fifth, it shows that real users in a Sudoku study preferred MACHOP-learned explanations over SES explanations much more often after 30 and 50 queries than explanations learned by Choice Perceptron.

That is already enough. No need to dress it up as general artificial empathy. The paper is not teaching AI to understand humans. It is teaching a solver to stop assuming that one hand-written objective captures everybody’s idea of a good explanation. A smaller claim, and therefore a more useful one.

What Cognaptus infers for business use

The business pathway is not “Sudoku today, autonomous enterprise tomorrow.” That would be the usual staircase made of fog.

The more defensible pathway is this:

Many business systems already use constraint-like reasoning: scheduling, rostering, routing, resource allocation, configuration, policy compliance, and workflow triage.
These systems often have multiple valid explanations for the same decision.
Different users prefer different explanation styles: concise, familiar, constraint-heavy, example-heavy, policy-grounded, exception-focused, or operationally actionable.
Manually defining the perfect explanation objective for every user role is brittle.
Pairwise preference elicitation offers a way to learn explanation preferences from interaction, provided the system asks informative comparison questions and keeps latency tolerable.

In that pathway, MACHOP is not a finished enterprise product. It is a design pattern for adaptive explanation interfaces.

Consider a rostering platform. A planner may prefer explanations based on staffing coverage and leave constraints. A union representative may care more about fairness, consecutive shifts, and contract clauses. A manager may want the shortest operational rationale. A new user may prefer familiar constraints over technically minimal ones. The same assignment can be explained through several valid paths. MACHOP-style elicitation suggests a way to learn which path is useful for each audience.

The same logic applies to configuration systems. If a product bundle is rejected, one explanation may cite a dependency rule; another may cite a safety constraint; another may show the minimal conflicting subset. The best explanation depends on whether the user is a salesperson, engineer, auditor, or customer support agent.

This is where personalised explainability becomes operational rather than ornamental. The value is not simply higher “trust.” Trust is too vague, too easily abused, and too often used as a decorative noun. The value is lower diagnosis cost: fewer escalations, faster acceptance of correct decisions, better training, and more consistent review of automated recommendations.

What remains uncertain before this becomes enterprise machinery

The paper is careful enough that its limitations are not hard to find.

The first boundary is domain. The experiments use Sudoku and logic-grid puzzles. These are standard and useful benchmarks for explainable constraint programming, but they are still controlled environments. Industrial scheduling and compliance systems have messier constraints, noisier users, more political objectives, and far less patience.

The second boundary is representation. The method assumes explanation quality can be represented through engineered features and a linear weighted objective. That is computationally convenient and solver-friendly. It may miss non-linear or contextual preferences: for example, a user may like short explanations except when a high-risk rule is involved, or may prefer familiar constraints only during training.

The third boundary is preference stability. MACHOP learns one weight vector across instances, assuming the user’s preferences are stationary enough to generalise. That may be reasonable for puzzle explanations. In organisations, preferences may shift by task, risk level, role, time pressure, or even by whether the user has already been embarrassed in a meeting that day.

The fourth boundary is interaction cost. The user study required substantial labelling: around 400 pair labels per participant including training and evaluation. The paper’s evaluation design is not the same as a lightweight enterprise onboarding flow. A production system would need careful UX design: fewer comparisons, opportunistic feedback, role-level priors, reuse across similar users, and graceful handling of “no preference” signals.

The fifth boundary is language. MACHOP selects structured explanation steps. It does not solve the full problem of writing natural-language explanations that are clear, compliant, and context-sensitive. A business deployment would likely combine structured explanation selection with natural-language rendering, templates, or language models. That extra layer introduces its own failure modes, because apparently one layer of explainability trouble was not enough.

The practical takeaway: explanation systems need preference memory

The important shift in this paper is from static explanation design to preference memory.

A static explanation system asks designers to decide in advance what “understandable” means. A preference-aware explanation system treats understandability as something learned through interaction. MACHOP makes that shift within the disciplined world of constraint explanations.

The paper’s mechanism-first lesson is therefore broader than the benchmark:

Do not assume minimal explanations are clearest.
Do not ask users trivial comparison questions.
Do not let feature scale masquerade as preference.
Do not ignore latency when the human is inside the learning loop.
Do not call an explanation “personalised” merely because the font changed.

For Cognaptus-style automation, the implication is clean. As AI workflows move deeper into operational decisions, the explanation layer cannot remain a generic afterthought. Users will not only ask what the system decided. They will ask why this explanation was chosen for this person in this context.

MACHOP does not answer that question completely. It does something more useful: it shows how to start making the explanation selector itself adaptive, testable, and measurable.

The puzzles are small. The product lesson is not.

Cognaptus: Automate the Present, Incubate the Future.

¹ Marco Foschini, Marianne Defresne, Emilio Gamba, Bart Bogaerts, and Tias Guns, “Preference Elicitation for Step-Wise Explanations in Logic Puzzles,” arXiv:2511.10436, 2025, https://arxiv.org/abs/2511.10436.
