Watch This Space: How Two Simple Heuristics Outsmarted a Whole SAT Solver

A solver can lose time in a very boring place: deciding which internal bookkeeping trick to use.

That sounds too small to matter. Business people usually expect optimization performance to come from grand architecture, better mathematical modeling, expensive hardware, or some heroic AI layer sprinkled on top. Researchers know better, though not always loudly enough. Sometimes the expensive part is not the model. It is the tiny repeated decision made millions of times while the solver tries to keep the model logically alive.

That is the useful lesson in Mia Müssig and Jan Johannsen’s short paper on new hybrid heuristics for pseudo-Boolean propagation.¹ The paper does not introduce a new solver. It does not redesign pseudo-Boolean reasoning. It does not announce that “AI has solved optimization,” which is refreshing, because apparently some people still need that sentence printed on a mug.

Instead, it changes a routing rule inside RoundingSAT: when a pseudo-Boolean constraint is added, should the solver propagate it using the counting method or the watched-literal scheme?

The answer proposed by the paper is almost suspiciously simple:

$$ a_1 > c $$

$$ a_1 > c + a_2 $$

where $a_1$ and $a_2$ are the largest two coefficients in the constraint, and $c$ is a threshold such as 500 or 1000.

That is the whole trick. Not the whole science, but the whole intervention. And on the paper’s selected large-coefficient benchmark sets, that intervention changes runtime behavior enough to deserve attention.

The business lesson is not “always use this heuristic.” The paper is much narrower than that. The real lesson is: if an optimization engine repeatedly chooses among execution strategies, a cheap feature-aware dispatch rule can matter as much as a fancier algorithm. In operational systems, the routing layer is often where performance quietly goes to die.

Pseudo-Boolean constraints make propagation less SAT-like than people expect

A normal SAT solver works with Boolean clauses. A pseudo-Boolean solver works with linear inequalities over Boolean variables. A constraint can look like this:

$$ \sum_{i=1}^{n} a_i l_i \geq b $$

where each $l_i$ is a Boolean literal and each $a_i$ is a positive integer coefficient. This is already more expressive than ordinary SAT clauses. It can represent weighted requirements, knapsack-like conditions, resource constraints, and many allocation problems more directly.

That expressiveness has a price. In a SAT clause, watched literals are famously effective: the solver watches a small number of literals and avoids revisiting most clauses after most assignments. It is one of those ideas that looks obvious only after someone else has already done the hard thinking.

For pseudo-Boolean constraints, the same intuition does not transfer cleanly. A constraint is not merely satisfied by one true literal. The weights matter. The remaining slack matters. The solver needs to know whether a partial assignment has made the constraint impossible or has forced some literal to become true.

The paper defines slack as the remaining margin under a partial assignment. Informally, slack tells the solver how much room is left before the constraint becomes impossible. If the slack drops below zero, the constraint is violated. If the slack drops below a coefficient $a_i$, then the corresponding literal must be set to true, because setting it false would make the constraint impossible.

This is where propagation cost appears.

The counting method updates the slack of every affected constraint whenever a variable is assigned or unassigned. It is direct, but potentially busy. The watched-literal scheme tries to avoid touching constraints unless the assigned variable is being watched. That sounds like a clean win if one imports SAT intuition too aggressively.

The paper’s first useful correction is that pseudo-Boolean propagation is not SAT propagation with heavier notation. Watched literals help, but they are not automatically dominant. For pseudo-Boolean constraints, the weight structure can make counting competitive, or even preferable.

The old hybrid rule asked how many literals needed watching

RoundingSAT already uses a hybrid propagation strategy. It does not blindly choose counting or watched literals globally. Instead, for each new constraint, it decides which method should handle that constraint.

The existing hybrid rule is based on the size of the watched set. The solver estimates how many literals must be watched to satisfy the watched-literal condition. If the watched set would contain a large share of the literals, the advantage of watching becomes weaker, so the solver uses counting instead.

With the default parameter $p=0.7$, the paper explains that RoundingSAT uses counting when the watched set would need at least roughly 30% of the literals in the constraint.

That rule has a natural appeal. Watched literals are supposed to be efficient because they avoid looking at most things. If “watching” requires watching a large fraction of the constraint, it is not really much of a shortcut. So the old hybrid asks a size question:

How many literals do I need to watch before the watched-literal scheme stops being worthwhile?

The new paper asks a different question:

Is the coefficient structure itself warning me that counting will be safer?

That shift is small but important. The old rule measures how large the watched set must be. The new rules inspect the largest coefficients directly.

The new heuristics treat large coefficients as routing signals

The paper introduces two coefficient-sensitive hybrid heuristics.

The first is the absolute heuristic:

$$ \text{useCounting} \leftarrow (a_1 > c) $$

The second is the additive heuristic:

$$ \text{useCounting} \leftarrow (a_1 > c + a_2) $$

Both are constraint-level dispatch rules. They do not change the pseudo-Boolean model. They do not change the objective. They do not change the logic of what solution is valid. They decide which propagation engine should be used for a specific constraint.

The intuition is easy to state. If the largest coefficient is sufficiently large, or sufficiently large relative to the second-largest coefficient, then the constraint may behave in a way where simple watched-set size is not the right signal. A large coefficient can dominate the slack dynamics. In that case, counting may provide better propagation behavior than watching.

The additive heuristic is slightly more relational. It does not merely ask whether the largest coefficient is big. It asks whether it is big enough to stand out beyond the second-largest coefficient by a threshold. That makes it a crude but useful shape detector: is this constraint coefficient-balanced, or does one coefficient tower over the others?

This is not machine learning. It is not adaptive tuning. It is not a learned classifier pretending to be a theorem. It is a few lines of solver logic based on the coefficient pattern.

And that is precisely why the result is interesting. In production systems, interventions with low engineering complexity and measurable runtime impact are unusually valuable. They are not glamorous. They also ship.

The main benchmark shows faster runtime, not a miracle jump in solved count

The paper’s main experiment uses instances from the OPT-LIN and DEC-LIN tracks of the Pseudo-Boolean Competition 2025. The authors start with 1,157 instances, then filter out instances where all coefficients in all input constraints are below 100. That leaves 206 large-coefficient instances. The timeout is 3,600 seconds, the machine is an AMD Ryzen 7 7735HS with 16 GB of memory, and SoPlex is enabled.

That filtering choice matters. The new heuristics only perform well for $c \geq 100$, according to the paper. On small-coefficient instances, the new rules would often never trigger counting, making behavior effectively identical to RoundingSAT without hybrid mode. So the benchmark is not “all pseudo-Boolean problems.” It is “the part of recent competition data where large coefficients exist and the new signal can actually fire.”

This is the right place to interpret the cactus plot carefully.

In Figure 1, the number of solved instances does not explode. Watched literals solve 91 instances, counting solves 92, the default hybrid solves 96, default hybrid with cutoff 0.8 solves 93, and the new variants solve between 95 and 97 depending on threshold and rule. The best variants reach 97 solved instances.

So the claim is not that the new heuristic suddenly solves twice as many problems. The stronger claim is about runtime shape. The curves for the new heuristics, especially the additive heuristic with $c=500$ and strong absolute variants, sit noticeably below the existing hybrid method over much of the harder solved range. In practical terms, this means the solver reaches comparable or slightly better solved-instance coverage faster.

That distinction matters for business interpretation. A solver improvement that saves time on already-solvable cases is still valuable if the solver is embedded in a repeated operational workflow. Scheduling engines, pricing configurators, resource allocation systems, and planning tools often run many related optimization jobs, not one heroic benchmark instance per quarter. Runtime saved repeatedly is runtime that becomes capacity, responsiveness, or lower infrastructure cost.

But this is not a universal dominance claim. It is an observed performance improvement on a filtered large-coefficient benchmark using one solver implementation.

A compact reading of the evidence looks like this:

Paper element	Likely purpose	What it supports	What it does not prove
Figure 1 on 206 large-coefficient 2025 competition instances	Main evidence	Coefficient-aware hybrid rules can outperform RoundingSAT’s existing hybrid rule on large-coefficient instances	Universal superiority across all pseudo-Boolean workloads
Default hybrid with cutoff 0.8	Comparison / sensitivity check	Simply raising the old watched-set cutoff does not reproduce the improvement	That 0.7 is globally optimal
Figure 2 on 2024 knapsack instances	Robustness and application-style comparison	The additive rule also improves runtime on a widely used knapsack benchmark family	That the heuristic is best for all structured optimization models
Appendix Figure 3 with more $c$ values	Sensitivity test	Threshold choice matters; several values still beat the old hybrid, but 500 and 1000 look stronger	That one threshold should be hard-coded forever
Extra tested forms such as $a_1 > c \cdot a_2$	Exploratory extension	Not every coefficient-based idea works	That coefficient-aware routing has been fully explored

The appendix is especially useful because it prevents the article from becoming a fairy tale. The authors test larger and smaller threshold values. Those variants still outperform the current hybrid mode in the shown large-coefficient setting, but they perform worse than the better $c=500$ and $c=1000$ variants. They also report other heuristic forms that did not outperform the current hybrid method.

That is what a good small systems paper should do: show the win, then show that the win is not magic.

The knapsack result says the old hybrid can be actively unhelpful

The second benchmark uses 783 knapsack instances from the Pseudo-Boolean Competition 2024, a dataset often used in pseudo-Boolean solving research.

This is not just a repeat of Figure 1 with different labels. It changes the interpretation.

On the knapsack dataset, the paper reports that the current hybrid method performs noticeably worse than the watched-literal scheme. That is an important warning. A hybrid strategy is not automatically safer just because it combines two methods. A bad dispatch rule can combine methods in the wrong places and underperform one of its own components.

The additive criterion with $c=500$ again shows a large improvement in average runtime. The solved counts in the plotted tail are close: watched literals solve 737, counting solves 730, default hybrid solves 735, and the new plotted variants solve 736. Again, the value is mainly runtime behavior, not a dramatic solved-count jump.

For business readers, the lesson is broader than pseudo-Boolean propagation. Hybrid systems often look robust on architecture diagrams because they include multiple methods. But including multiple methods is not the same as choosing correctly among them. A hybrid optimizer, hybrid recommender, hybrid fraud detector, or hybrid agent pipeline can underperform a simpler baseline if its routing logic is weak.

“Hybrid” is not a performance guarantee. It is an obligation to make the gatekeeper intelligent enough.

The business value is dispatch discipline, not solver glamour

The practical pathway from this paper to business use is not that every company should immediately patch RoundingSAT. Most companies do not directly manage pseudo-Boolean solver internals. Many do not even know when their vendor stack uses SAT, MILP, CP-SAT, pseudo-Boolean solving, or some ungodly mixture wearing a friendly dashboard.

The useful translation is at the architecture level.

Many business systems contain repeated routing decisions:

Technical setting	Routing question	Business consequence
Constraint solving	Which propagation method should handle this constraint?	Lower runtime for planning, allocation, and scheduling jobs
Optimization pipelines	Which solver or formulation should handle this instance?	Fewer timeouts and more predictable batch processing
AI automation systems	Which tool, model, or workflow should handle this task?	Lower API cost and fewer slow escalations
Data processing	Which computation path should process this record or query?	Better latency without rewriting the whole system

The paper is a small example of a large engineering principle: dispatch rules deserve as much attention as the methods being dispatched.

In AI-heavy business automation, this point is easy to forget. Teams often spend weeks debating whether to use Model A or Model B, then route every task through the same model with a prompt that looks like it survived a committee meeting. The pseudo-Boolean result is a useful antidote. Sometimes the correct question is not “which engine is best?” but “which cheap observable feature tells us when each engine should be used?”

Here, the observable feature is coefficient structure. In another system, it might be document length, data sparsity, request ambiguity, customer tier, tool-call failure rate, or historical latency. The principle is the same: the routing signal must match the failure mode.

What the paper directly shows, what Cognaptus infers, and what remains uncertain

The paper directly shows that two simple coefficient-sensitive hybrid heuristics improve RoundingSAT runtime on selected large-coefficient pseudo-Boolean benchmarks. It also shows that the old hybrid cutoff is not automatically optimal, that the default hybrid can be poor on knapsack instances, and that threshold selection matters.

Cognaptus infers a broader operational lesson: in business optimization systems, lightweight instance-aware dispatch can produce meaningful performance gains without changing the user-facing model. That inference is reasonable because many production solvers and automation systems repeatedly choose among internal execution paths. But it is still an inference. The paper does not test enterprise scheduling data, supply-chain planning jobs, cloud deployment costs, or SLA improvements.

The uncertain part is generality.

First, the results are from RoundingSAT. Another pseudo-Boolean solver may have different propagation implementation details, memory behavior, conflict analysis interactions, or learned-constraint distributions.

Second, the main competition benchmark is filtered for large coefficients. The authors explicitly note that small instances should likely keep the existing hybrid behavior, because the new rules do not help there in the same way. Their conclusion suggests automatically classifying small instances and using the current hybrid heuristic for them while applying the new heuristics to the remaining instances.

Third, the best threshold is empirical. Values such as 500 and 1000 perform well in the paper’s tested setting, but the appendix makes clear that threshold choice affects performance. A production deployment would need either validation on local workloads or a conservative fallback strategy.

Fourth, runtime improvements in cactus plots are operationally meaningful, but they are not the same as business ROI. ROI depends on how often the solver runs, how much runtime costs, whether latency blocks decisions, and whether faster solving changes actual operational quality. A warehouse scheduling team running thousands of daily optimization jobs should care differently from a team running one offline planning job per month.

The boundary is not a weakness. It is what keeps the result usable.

The most interesting part of the paper is not that $a_1 > c + a_2$ works well on the tested benchmarks. It is that such a simple rule could beat a more established hybrid decision method at all.

That suggests the old routing rule was looking at a reasonable but incomplete proxy. Watched-set size captures one dimension of cost. Coefficient structure captures another. For large-coefficient constraints, the second signal can be more informative.

This is a recurring pattern in technical systems. A mature default often encodes a sensible historical compromise. Then a new workload distribution appears, or an old one becomes more important, and the default starts leaking performance. The fix is not always a revolution. Sometimes it is a small diagnostic feature that tells the system, “not this path, not for this instance.”

For pseudo-Boolean solving, that feature is the largest coefficient, alone or relative to the second largest.

For business automation, the equivalent question is uncomfortable but productive: which internal default is still being treated as wisdom when it is merely an old average?

The paper does not answer that for every system. It gives a precise example in one solver. That is enough. Good engineering papers do not need to shout. They just need to reveal where the expensive assumption was hiding.

Cognaptus: Automate the Present, Incubate the Future.

Mia Müssig and Jan Johannsen, “New Hybrid Heuristics for Pseudo-Boolean Propagation,” arXiv:2511.21417, 2025. https://arxiv.org/pdf/2511.21417 ↩︎

Pseudo-Boolean constraints make propagation less SAT-like than people expect#

The old hybrid rule asked how many literals needed watching#

The new heuristics treat large coefficients as routing signals#

The main benchmark shows faster runtime, not a miracle jump in solved count#

The knapsack result says the old hybrid can be actively unhelpful#

The business value is dispatch discipline, not solver glamour#

What the paper directly shows, what Cognaptus infers, and what remains uncertain#

The quiet punchline: small heuristics can expose large blind spots#