Double Lift-Off: Learning to Reason Without Ever Building the Model

Data is usually incomplete. That is not a philosophical statement; it is Tuesday.

A clinical study may record which treatment a patient received but miss one biomarker. A compliance system may know that two entities are connected but not know the contract terms. An environmental monitoring project may have sensor readings for some locations, at some times, under some weather conditions, and then a heroic spreadsheet pretending this is a dataset.

The usual enterprise instinct is to build a model anyway. Estimate the missing values. Fit a probabilistic graph. Train a neural system. Hope the inference engine does not collapse under the weight of all the entities, relations, exceptions, and sampling noise. This is how many “AI reasoning” projects quietly become data-cleaning projects wearing a lab coat.

The paper “Lifted Relational Probabilistic Inference via Implicit Learning” by Luise Ge, Brendan Juba, Kris Nilsson, and Alison Shao takes a different route.¹ It asks whether a system can reason in a first-order probabilistic relational language using incomplete axioms and partial observations without ever constructing an explicit probabilistic model.

That phrase is easy to misunderstand. This is not another neural-symbolic architecture. It is not an LLM prompted to “think step by step,” then rewarded for sounding composed. The paper is much more formal, and therefore less fashionable in the usual demo-driven sense. It combines implicit learning with lifted relational sum-of-squares inference. The output is not a fluent explanation. It is a certificate-style reasoning process: either the constraints remain feasible, or a bounded-degree algebraic refutation shows that a queried hypothesis is inconsistent with what has been learned.

That may sound abstract. It is. But the mechanism is worth understanding because it attacks a real bottleneck in business AI: many operational decisions involve relational structure, partial observations, and background rules, while the full joint model is either unavailable, too expensive to learn, or structurally wrong by the time it is learned.

The paper’s contribution is a double lift. One lift avoids grounding every possible individual. The other avoids separately reasoning over every possible partial world. The result is a theoretical polynomial-time framework under fixed proof degree and quantifier rank. No confetti cannon, no benchmark leaderboard, no “agentic workflow” diagram with sixteen arrows. Just a serious attempt to make learning-to-reason tractable in a domain where tractability normally leaves the room early.

The expensive mistake is trying to learn the whole model first

In probabilistic relational domains, we care about uncertain facts involving entities and relations. Examples include:

Domain	Entities	Relations	Query type
Biomedical trial screening	patients, drugs, tumors	received treatment, biomarker response, adverse event	Should this treatment advance?
Compliance monitoring	firms, contracts, jurisdictions	owns, controls, transfers, violates	Is this entity-risk pattern inconsistent with policy?
Environmental monitoring	sensors, sites, pollutants	located near, exceeds threshold, influenced by source	Is a regional risk condition supported?
Supply-chain risk	suppliers, facilities, shipments	depends on, substitutes for, delays	Does a failure pattern imply exposure?

In each case, the data is relational: one fact refers to another through shared entities. It is also probabilistic: the uncertainty is not noise around a single variable but extends over whole relational structures.

Classical statistical relational frameworks, such as Markov Logic Networks or probabilistic relational models, try to define a distribution over these structures. Lifted inference then exploits symmetry so that repeated relational patterns do not need to be separately enumerated. The problem is that lifted inference usually assumes the model already exists. Learning the model from partial observations is a different headache, and not a small one.

The paper’s core move is to stop treating learning and inference as two separate phases. Instead of first learning a complete probabilistic model and then querying it, the system learns only the constraints needed for inference and embeds them directly into a sum-of-squares reasoning program.

That is the “implicit learning” part. The system does not need to say, “Here is the full generative model of the world.” It only needs enough learned and explicit constraints to refute or fail to refute the query under consideration.

For business readers, this distinction matters. Many decision systems do not need a cinematic world simulator. They need to know whether a proposed conclusion is compatible with known rules, partial observations, and uncertainty bounds. Building the entire model first can be expensive, brittle, and unnecessary. A charming combination.

The paper reasons by refutation, not prediction

The paper’s inference style is closer to formal verification than to ordinary machine learning prediction.

The input is a knowledge base in first-order probabilistic relational logic. It includes two broad kinds of constraints.

First, there are logical constraints. These are restrictions that should hold almost surely. For example, a relation may be required to stay within a valid range, or a rule may state that one relational condition implies another.

Second, there are expectation constraints. These describe bounds on expected values of relational expressions. Instead of saying that a property always holds, they say that a probabilistic quantity must fall within some range.

The system tests a query by adding the query’s negation to the knowledge base and checking whether the resulting constraint system can be refuted. If the extended system is inconsistent, the original query is supported. If no bounded-degree refutation is found, the system has not proved the query.

This distinction is important. The method is not “classify the case as positive.” It is “under the allowed proof system, does the negated claim contradict the explicit and implicitly learned constraints?” That gives the method a different operational meaning from typical AI scoring.
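The query-by-refutation pattern can be sketched in a few lines. This is a toy stand-in, not the paper's method: the semidefinite feasibility check is replaced by a one-dimensional interval intersection, and the names `Interval` and `check_query` are hypothetical. What it preserves is the asymmetry of the answers: a refuted negation supports the query, while a surviving negation proves nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] of values allowed for one expected quantity."""
    lo: float
    hi: float

    def intersect(self, other: "Interval") -> "Interval":
        return Interval(max(self.lo, other.lo), min(self.hi, other.hi))

    def is_empty(self) -> bool:
        return self.lo > self.hi


def check_query(known: Interval, negated_query: Interval) -> str:
    """Add the negated query to the known bounds and test feasibility.

    Toy assumption: feasibility is an interval intersection here, whereas
    the paper searches for a bounded-degree SOS refutation.
    """
    if known.intersect(negated_query).is_empty():
        return "supported"   # the negation is refuted
    return "not proved"      # no refutation found; this is not "false"


# Hypothetical knowledge: the expectation lies in [0.6, 0.9].
known = Interval(0.6, 0.9)
print(check_query(known, Interval(0.0, 0.5)))  # query "E[x] > 0.5"
print(check_query(known, Interval(0.0, 0.7)))  # query "E[x] > 0.7"
```

The first query is supported because its negation cannot coexist with the known bounds. The second is merely not proved, since values between 0.6 and 0.7 remain possible.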

A practical decision system based on this kind of reasoning would not merely output a confidence number. It would be organized around constraints, assumptions, and certificates. That is less glamorous than a chatbot, but much closer to the kind of audit trail that regulated workflows actually ask for once the prototype stops being cute.

Sum-of-squares turns probabilistic reasoning into an algebraic feasibility problem

The paper builds on sum-of-squares logic. The simplified idea is this: if every valid model must make certain polynomial expressions nonnegative, then a contradiction can be certified by showing that a sum of nonnegative terms equals a negative constant. Since squares are nonnegative, a sum-of-squares proof can serve as a formal refutation.
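A one-variable worked example, which is ours and not taken from the paper, shows the shape of such a certificate. Suppose a knowledge base asserts both $x \ge 2$ and $x^2 \le 1$. Written as nonnegativity constraints, these are $x - 2 \ge 0$ and $1 - x^2 \ge 0$. Now consider the identity:

$$ (x - 2)^2 + 4\,(x - 2) + (1 - x^2) = -3 $$

Expanding the left side gives $x^2 - 4x + 4 + 4x - 8 + 1 - x^2$, which is indeed $-3$ for every $x$. Yet each term on the left must be nonnegative in any model of the constraints: the first is a square, the second is a nonnegative multiple of the first constraint, and the third is the second constraint itself. A sum of nonnegative terms cannot equal $-3$, so no such $x$ exists. This is a degree-2 refutation; the paper's setting applies the same idea to moment variables over relational terms rather than a single real number.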

The paper uses a bounded-degree fragment of the sum-of-squares hierarchy. That bound matters. General relational probabilistic inference can be brutally hard. The tractability claim is not “all reasoning is easy now.” It is that, for fixed sum-of-squares degree and fixed quantifier rank, the relevant semidefinite program can be compiled in polynomial time.

The mechanism can be summarized like this:

Component	What it does	Why it matters
First-order relational language	Represents predicates over entities and quantified constraints	Keeps the structure relational instead of flattening everything into independent features
Moment variables	Represent expectations of monomials over relational terms	Allows probabilistic knowledge to enter algebraically
SOS refutation	Searches for a bounded-degree algebraic contradiction	Produces certificate-style reasoning rather than heuristic prediction
Explicit compactness	Requires bounded ranges for variables	Makes confidence intervals and optimization well-behaved
Lifted equality constraints	Identify renaming-equivalent ground monomials	Avoids treating symmetric individuals as separate variables

The paper’s proof machinery is not an implementation benchmark. It is the main evidence. The relevant question is not “Did this beat model X on dataset Y?” The paper does not claim that. The question is whether the authors define a reasoning framework whose soundness, completeness under testability assumptions, and polynomial-time compilation can be established.

That is a different kind of contribution. It is less immediately productizable, but more foundational.

Partial observations are handled as masked worlds, not as missing cells

A key part of the paper is its treatment of partial observations.

The authors introduce partial models as incomplete views of full relational worlds. A masking process hides some values while preserving consistency with the underlying full model. The system receives independently sampled partial models from this induced distribution.

This is better than treating missing values as a mere data-cleaning inconvenience. In relational settings, missingness changes what can be verified. A partial observation may witness some constraints but not others. Some relational facts are visible; some are hidden; some polynomial expressions can still be bounded using what is visible plus explicit compactness constraints.

The paper’s notion of witnessing captures this. A polynomial inequality is witnessed by a partial model when, after substituting observed values and using worst-case bounds for unobserved terms, the polynomial is still guaranteed to be nonnegative. In plain English: even after giving the missing values their most adversarial permitted interpretation, the constraint still holds.

That is a strong idea for decision support. It avoids the casual sin of saying “we filled in the missing values, therefore the rule holds.” Instead, it asks whether the rule survives missingness.
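A minimal sketch of the witnessing check, under a simplifying assumption: the constraint is linear, so the adversarial choice for each hidden variable is just one end of its compactness range. The paper bounds hidden terms through SOS reasoning over general polynomials. The function names and the `response` and `toxicity` variables are hypothetical.

```python
def worst_case_value(constant, coeffs, observed, bounds):
    """Smallest possible value of constant + sum(coeffs[v] * v).

    Observed variables take their recorded values. Each hidden variable is
    set to whichever end of its [lo, hi] range makes the total smallest.
    """
    total = constant
    for var, coeff in coeffs.items():
        if var in observed:
            total += coeff * observed[var]
        else:
            lo, hi = bounds[var]
            total += coeff * (lo if coeff >= 0 else hi)
    return total


def is_witnessed(constant, coeffs, observed, bounds):
    """True if the inequality 'expression >= 0' survives the worst case."""
    return worst_case_value(constant, coeffs, observed, bounds) >= 0


# Hypothetical constraint: response - 0.2 * toxicity >= 0, both in [0, 1].
coeffs = {"response": 1.0, "toxicity": -0.2}
bounds = {"response": (0.0, 1.0), "toxicity": (0.0, 1.0)}

# Toxicity is masked, but the observed response is high enough anyway.
print(is_witnessed(0.0, coeffs, {"response": 0.8}, bounds))
# Response is masked, so the adversary may set it to 0 and break the rule.
print(is_witnessed(0.0, coeffs, {"toxicity": 0.5}, bounds))
```

The second call does not show that the rule is violated. It only shows that this partial model cannot vouch for it.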

Expectation constraints are handled differently. Since expectations must be learned from samples, the method uses empirical averages and confidence intervals, relying on bounded variables and Hoeffding-style concentration. The paper defines a naive norm to bound polynomial expressions and uses those bounds to control statistical error.

A useful mental model is:

$$ \text{partial observations} + \text{bounded variables} \rightarrow \text{empirical moment bounds with confidence intervals} $$

Then those bounds are inserted into the SOS program.
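The width of such a confidence interval can be made concrete with the standard two-sided Hoeffding inequality. The symbols here are our own notation, and the paper's exact constants may differ: $Z_1, \dots, Z_m$ are i.i.d. values of one bounded expression with range $[a, b]$, $\delta$ is the allowed failure probability, and $\varepsilon$ is the interval half-width.

$$ \Pr\left[\,\left|\frac{1}{m}\sum_{i=1}^{m} Z_i - \mathbb{E}[Z]\right| \ge \varepsilon\,\right] \le 2\exp\left(-\frac{2m\varepsilon^2}{(b-a)^2}\right) $$

Setting the right side equal to $\delta$ and solving for $\varepsilon$ gives:

$$ \varepsilon = (b - a)\sqrt{\frac{\ln(2/\delta)}{2m}} $$

For example, with $m = 200$ samples, a range of $b - a = 1$, and $\delta = 0.05$, the half-width is $\sqrt{\ln(40)/400} \approx 0.096$. Boundedness is what makes this work: without a finite range, no number of samples yields a guaranteed interval.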

The important detail is that the paper distinguishes between two kinds of learned knowledge:

Learned object	How it becomes usable	Why the distinction matters
Logical/support constraints	Must be witnessed under partial observations	A single visible violation can matter, and missing values require worst-case checking
Expectation constraints	Estimated through empirical bounds and confidence intervals	Statistical validity requires consistent sampling of the same grounded expression

This is one of the places where the paper is more subtle than the phrase “learning to reason” suggests. It is not simply throwing partial data into a solver. It carefully separates what can be witnessed logically from what must be estimated probabilistically.

The lottery example explains why grounding must be fixed

The paper’s most intuitive technical point appears early: when estimating individual marginal probabilities, the grounding must be fixed across examples.

The authors use a lottery-style intuition. Suppose we want to estimate the probability that a particular ticket wins. If we are allowed to choose a different ticket in each draw after seeing the outcome, we can always pick the winning ticket and produce a ridiculous estimate. This is not learning; it is laundering hindsight through notation.

In first-order relational settings, something similar can happen if groundings are allowed to vary across partial examples. A query involving generic individuals may be symmetric under renaming, but empirical estimates still require a consistent grounding set. Otherwise, the system may accidentally estimate “whoever makes the constraint look good in this sample” rather than “this relational expression under a fixed interpretation.”
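The lottery intuition is easy to simulate. The sketch below is our illustration, not code from the paper, and the function name and parameters are hypothetical. It compares an estimator that commits to one ticket in advance with one that picks its ticket after seeing each draw.

```python
import random


def estimate_win_rates(num_tickets=10, num_draws=10_000, seed=0):
    """Return (fixed-ticket estimate, hindsight estimate) of a win probability."""
    rng = random.Random(seed)
    fixed_ticket = 0          # grounding chosen once, before any draw
    fixed_hits = 0
    hindsight_hits = 0
    for _ in range(num_draws):
        winner = rng.randrange(num_tickets)
        fixed_hits += (winner == fixed_ticket)
        # Drifting grounding: choose the ticket after seeing the outcome.
        chosen_after_the_fact = winner
        hindsight_hits += (winner == chosen_after_the_fact)
    return fixed_hits / num_draws, hindsight_hits / num_draws


fixed, hindsight = estimate_win_rates()
print(f"fixed ticket estimate: {fixed:.3f}")
print(f"hindsight estimate:    {hindsight:.3f}")
```

The fixed estimate lands near the true value of one in ten. The hindsight estimate is exactly 1.0, which is a statement about the estimator and not about the lottery.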

This is the first half of the double lift: grounding-lift.

The method identifies renaming-equivalent ground monomials and collapses them into one lifted moment variable. This preserves the relevant symmetry while avoiding a full propositional expansion over every named individual. But it does not let the grounding drift across examples. The same grounding is used across empirical expressions so that the estimates remain statistically meaningful.

This is not a minor bookkeeping choice. It is the difference between valid empirical learning and a solver with excellent imagination.
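The collapsing step itself can be sketched as a canonicalization: two ground monomials share a lifted variable when one can be turned into the other by renaming individuals. The code below is a rough illustration under two assumptions that are ours, not the paper's: every constant is treated as interchangeable, and the canonical form is found by brute force over permutations, which is only sensible because bounded degree keeps monomials small. The `canonical` function and the `Treats` and `Responds` predicates are hypothetical.

```python
from itertools import permutations


def canonical(monomial):
    """Canonical form of a ground monomial under renaming of constants.

    A monomial is an iterable of atoms, each a (predicate, args) pair.
    Every bijective renaming onto placeholders x0, x1, ... is tried, and
    the lexicographically smallest sorted result is returned.
    """
    atoms = list(monomial)
    consts = sorted({c for _, args in atoms for c in args})
    placeholders = [f"x{i}" for i in range(len(consts))]
    best = None
    for perm in permutations(placeholders):
        rename = dict(zip(consts, perm))
        candidate = tuple(sorted(
            (pred, tuple(rename[c] for c in args)) for pred, args in atoms
        ))
        if best is None or candidate < best:
            best = candidate
    return best


a = canonical([("Treats", ("alice", "drugA")), ("Responds", ("alice",))])
b = canonical([("Responds", ("bob",)), ("Treats", ("bob", "drugB"))])
c = canonical([("Treats", ("alice", "drugA")), ("Responds", ("carol",))])

print(a == b)  # True: same pattern, different names
print(a == c)  # False: the responder is not the treated patient
```

Monomials with equal canonical forms would map to one moment variable. Which concrete grounding supplies the empirical estimate for that variable is a separate decision, and it is the one that must stay fixed across samples.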

The second lift avoids enumerating every partial world

The second half is world-lift.

Partial observations introduce many possible completions. If every partial example can correspond to many full worlds, a naive approach could require separate reasoning over an enormous number of pseudo-models or completion patterns. That would destroy tractability.

The paper’s algorithm avoids this by enforcing constraints in a lifted SDP structure where partial examples contribute linear-size moment constraints rather than separate full semidefinite programs for every world. In the authors’ formulation, the reasoning process operates over lifted monomials and empirical bounds across samples, not over an explicit enumeration of all possible completions.

The two lifts solve different explosions:

Explosion risk	Naive failure mode	Paper’s response
Many individuals	Ground every relational term for every named entity	Collapse renaming-equivalent monomials through grounding-lift
Many partial worlds	Enumerate every completion or pseudo-model	Enforce lifted constraints across samples through world-lift
Learning plus inference	Learn a complete model, then run inference	Insert learned empirical bounds directly into SOS inference

This is why a mechanism-first framing is the right way to read the paper. A normal summary would say the authors combine learning and reasoning. That is true, but it misses the operational trick. The real contribution is that the paper identifies where the combinatorial blow-ups arise and shows how to avoid them under fixed proof-degree and quantifier-rank assumptions.

The algorithm learns bounds, then asks the solver whether the system survives

The paper’s Algorithm 1 can be read in four steps.

First, choose the bounded-degree monomials formed over the relevant grounded terms. This defines the algebraic vocabulary of the reasoning problem.

Second, for each partial model and each relevant monomial, compute lower and upper bounds consistent with the observed partial values and the witnessed constraints. Directly observed variables get their observed values. Hidden variables are bounded through SOS reasoning and explicit compactness constraints.

Third, average those bounds across samples and widen them using confidence parameters. This produces empirical upper and lower moment bounds.

Fourth, run the SOS solver on the knowledge base with those learned moment bounds. If the resulting semidefinite system is infeasible in the appropriate refutation setup, the query’s negation is refuted.

The main conceptual flow looks like this:

Partial relational samples
        ↓
Worst-case bounds for hidden terms
        ↓
Empirical moment bounds with confidence intervals
        ↓
Lifted SOS program
        ↓
Feasibility or refutation certificate
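Steps two and three can be sketched for the simplest case, a degree-1 monomial consisting of a single variable. This is a simplification under stated assumptions, not the paper's Algorithm 1: hidden values fall back to their compactness range where the paper tightens them with SOS reasoning, the widening uses a plain Hoeffding radius, and the final SDP step is omitted. All function names and the `shrink` variable are hypothetical.

```python
import math


def hoeffding_radius(num_samples, value_range, delta):
    """Half-width of a two-sided Hoeffding interval at failure rate delta."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * num_samples))


def per_sample_bounds(sample, var, compact_bounds):
    """Step two: an observed value is a point interval; a hidden value
    (missing key or None) falls back to the variable's compactness range."""
    value = sample.get(var)
    if value is not None:
        return value, value
    return compact_bounds[var]


def empirical_moment_bounds(samples, var, compact_bounds, delta=0.05):
    """Step three: average per-sample bounds, then widen by the radius."""
    pairs = [per_sample_bounds(s, var, compact_bounds) for s in samples]
    lows, highs = zip(*pairs)
    range_lo, range_hi = compact_bounds[var]
    radius = hoeffding_radius(len(samples), range_hi - range_lo, delta)
    lower = max(range_lo, sum(lows) / len(samples) - radius)
    upper = min(range_hi, sum(highs) / len(samples) + radius)
    return lower, upper


# Hypothetical partial samples: tumor shrinkage in [0, 1], sometimes masked.
bounds = {"shrink": (0.0, 1.0)}
samples = [{"shrink": 1.0}, {"shrink": None}, {"shrink": 1.0}, {"shrink": 0.0}] * 50
print(empirical_moment_bounds(samples, "shrink", bounds))
```

Masking shows up as a gap between the averaged lower and upper bounds, and sampling noise shows up as the radius added on each side. The resulting pair is what would be handed to the solver as a constraint on the corresponding moment variable.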

The paper’s examples are illustrative, not empirical demonstrations. The biomedical trial example with candidate drugs and hidden tumor shrinkage labels helps explain how partial observations and query refutation work. It should not be read as a benchmark or case study. Its purpose is expository: to show how a partially observed relational world can still support a refutation of a hypothesis such as “every drug advances to Phase II.”

This distinction matters because the paper does not present experimental scalability results. Its evidence is theorem-based.

The theorem stack is the evidence

The paper’s main results are theoretical. They are not ablations, robustness tests, or comparisons with prior systems in the experimental sense. The structure is closer to a proof pipeline.

Paper component	Likely purpose	What it supports	What it does not prove
Transferability from open to closed universe	Main theoretical foundation from prior lifted SOS work	Infinite-name reasoning can be represented through finite grounding under fixed rank	Practical speed on large industrial data
Soundness and completeness of lifted SOS	Main inference guarantee from prior framework	Degree-bounded refutations are valid and complete within the system	That low-degree proofs always exist
Partial model and masking definitions	Problem formulation	Missing observations can be represented formally	That real-world missingness is always i.i.d. or benign
Witnessing and testability definitions	Learning formulation	Partial observations can contribute constraints only when verifiable	That most useful constraints will be testable in practice
Algorithm 1	Main constructive contribution	Learned bounds and explicit rules can be compiled into SOS inference	Production-level engineering readiness
Soundness theorem	Main correctness guarantee	If a satisfying model exists, the algorithm will not falsely refute it with high probability	That the method is powerful enough to prove every desired query
Completeness theorem	Main proof-power guarantee under assumptions	If testable implicit constraints yield a bounded-degree refutation, the algorithm can recover it with high probability	That the assumptions hold in arbitrary enterprise datasets
Polynomial-time theorem	Main tractability claim	Fixed degree and quantifier rank avoid exponential grounding and world enumeration	That SDP solving is cheap for rich relational domains

The soundness theorem is especially important for business use. It says, roughly, that if the input system is satisfiable, the algorithm returns feasibility with high probability. This is the safety side: the system should not invent contradictions merely because it learned from partial data.

The completeness theorem is the power side. If there exist testable logical and expectation constraints that, together with the explicit knowledge base, admit a bounded-degree SOS refutation, then the algorithm can also return a refutation with high probability.

The polynomial-time theorem then says that, for fixed SOS degree and quantifier-rank bound, the algorithm can be compiled into a single SDP of polynomial size in the bit-complexity of the input and proof. The paper explicitly emphasizes that the infinite domain and the number of partial examples do not enter as a catastrophic exponent in the way a direct propositional encoding would.

That is the “double lift-off.” One lift avoids exploding over individuals. The other avoids exploding over worlds.

This is model-free, not assumption-free

The paper’s language may tempt a reader into thinking: no explicit model, therefore no assumptions. No. We are not in a magical forest.

The framework avoids learning a complete probabilistic model, but it still relies on structural assumptions:

Variables must be bounded. Explicit compactness requires finite upper and lower bounds. This is necessary for the SOS machinery and statistical confidence intervals.
Samples are partial models drawn i.i.d. from a masking-induced distribution. This is a clean theoretical setup. Real enterprise data pipelines often produce correlated missingness, administrative artifacts, duplicated entities, and policy-driven observation bias. Delightful little monsters.
Useful constraints must be testable. A constraint that cannot be witnessed often enough from partial observations cannot safely enter the learned implicit knowledge base.
The proof degree and quantifier rank must be fixed. Polynomial time is obtained under these fixed bounds. If the required proof degree is high, tractability can become more theoretical than operational.
SOS solvers still matter. The paper notes that SOS solvers remain an active research area and does not claim empirical scalability.

These limitations do not weaken the paper’s contribution. They define it. The contribution is not “we solved all relational probabilistic reasoning.” It is “under explicit boundedness, testability, and fixed-degree proof assumptions, implicit learning and lifted SOS inference can be unified in a polynomial-time framework.”

That is a narrower claim. It is also a more useful one.

The business value is certified narrowing, not omniscient prediction

For Cognaptus readers, the business relevance is not that this paper can be deployed tomorrow as a generic reasoning engine. It cannot, at least not from the evidence presented.

The relevance is in the architecture of decision support.

Many organizations face questions like:

Does this partial evidence contradict our compliance policy?
Can we rule out a risk hypothesis without learning a full network model?
Do observed relational patterns make a proposed operational decision inconsistent?
Can we reason over missing-but-bounded data without pretending we observed everything?

The paper suggests a pathway for systems that answer such questions by combining:

Background rules
+ partial relational observations
+ bounded uncertainty
+ certificate-style inference

That is different from a predictive model that says, “Risk score: 0.82.” A certificate-style system can say, in effect, “Given these rules, these observations, and these confidence bounds, the negation of this claim is infeasible under the proof system.”

This matters most where the decision is constrained by policy, science, or auditability rather than pure accuracy optimization.

Business setting	What the paper directly supports	Cognaptus inference	Remaining uncertainty
Early-stage biomedical screening	Reasoning from partial relational observations and background constraints	Useful for ruling out hypotheses under incomplete measurements	Real trial data may violate clean i.i.d. masking assumptions
Environmental monitoring	Bounded variables and partial observations can feed probabilistic relational constraints	Useful for risk-aware inference when sensors are incomplete	Spatial and temporal correlations require additional modeling care
Compliance and entity-risk networks	First-order relational rules can encode structural constraints	Useful for certificate-style contradiction detection	Rich entity graphs may require higher-degree proofs or engineering approximations
Scientific databases	Missing values can be handled through witnessed constraints and empirical bounds	Useful for formal query support without full generative modeling	Data curation and bound selection become critical

The operational value is not “replace analysts.” That phrase should be retired and perhaps given a quiet burial. The value is certified narrowing: reducing the set of plausible conclusions while preserving a traceable link between rules, observations, and uncertainty.

The misconception: this is not an LLM reasoning paper

The paper includes a related-work discussion of large language models and reasoning models, but that is not its technical center. The authors position their method against systems that are large, costly, opaque, unreliable, and not naturally designed to output probability or expectation bounds.

This comparison is fair as a conceptual contrast, but readers should not force the paper into the LLM category. The proposed method does not train a language model. It does not use chain-of-thought prompting. It does not rely on embedding similarity. It does not ask a model to produce a plausible answer and then decorate it with “reasoning.”

Instead, it uses formal constraints, empirical confidence bounds, and semidefinite programming.

A practical product could eventually combine this kind of formal backend with an LLM interface. The LLM could help users formulate queries, translate policies into candidate constraints, or explain certificates. But the reasoning guarantee would come from the formal system, not from the language model’s verbal confidence. That separation is important. Otherwise we get the worst of both worlds: opaque neural output with symbolic-looking stationery.

Where this could fit in an AI system architecture

A realistic future architecture inspired by the paper would probably not look like a general-purpose chatbot. It would look more like a reasoning service behind a constrained workflow.

One possible architecture:

Domain schema
  ↓
Relational predicates and bounded variables
  ↓
Background knowledge base
  ↓
Partial observations from operational systems
  ↓
Witnessing and empirical moment-bound layer
  ↓
Lifted SOS inference engine
  ↓
Certificate, infeasibility result, or non-proof
  ↓
Human-facing explanation layer

In this architecture, the LLM, if present, belongs mainly in the explanation and interface layers. It can help translate the certificate into business language. It should not be trusted as the certificate.

The hard engineering work would sit elsewhere:

defining predicates and variable bounds;
deciding which constraints are meaningful and testable;
tracking masking assumptions;
compiling efficient SDP instances;
managing solver precision;
presenting non-refutation honestly.

That last point matters. A system may fail to refute a query not because the query is false, but because the needed proof is too high-degree, the constraints are not testable, the data is too incomplete, or the solver formulation is too weak. A non-proof is not a counter-proof. This is the kind of distinction that keeps formal systems honest and product teams mildly annoyed.

The boundary section: promising theory, no empirical scalability claim

The paper is explicit that it does not claim empirical scalability. This should be repeated once, clearly, and not scattered like seasoning across every paragraph.

The work is theoretical. Its main achievements are formulation, algorithm design, and proof guarantees. The biomedical example is illustrative. There are no benchmark tables showing runtime across graph sizes, no ablation isolating grounding-lift versus world-lift, and no industrial case study showing deployment cost.

That means business interpretation should stay disciplined.

What the paper directly shows:

implicit learning and lifted relational SOS inference can be combined;
partial observations can be incorporated through witnessed constraints and empirical moment bounds;
soundness and completeness can be proved under testability assumptions;
fixed SOS degree and quantifier rank allow polynomial-time compilation;
double lifting avoids obvious exponential blow-ups from individuals and partial worlds.

What Cognaptus infers:

the framework points toward auditable decision-support systems for relational domains with missing data;
the most plausible early applications are constrained reasoning tasks, not open-ended prediction;
the method may be valuable where ruling out inconsistent hypotheses matters more than forecasting every outcome.

What remains uncertain:

whether useful real-world constraints are often testable enough;
whether low-degree SOS proofs suffice in practical relational domains;
whether solver performance remains acceptable for rich enterprise schemas;
how robust the approach is under non-i.i.d., policy-driven, or adversarial missingness;
how domain experts would author and validate the required knowledge bases.

This is not a weakness to hide. It is the map from paper to product.

The real contribution is disciplined incompleteness

Most AI systems handle incompleteness by trying to complete the world. They impute missing values, infer latent states, train a model, and then pretend the final answer has absorbed the uncertainty. Sometimes that is appropriate. Sometimes it is just confidence laundering with better branding.

This paper offers a more disciplined alternative. It accepts that the world is partially observed, that rules may be incomplete, and that learning the full distribution may be infeasible. Then it asks what can still be proved.

The double lift is the technical heart:

grounding-lift keeps relational symmetry from exploding over individuals;
world-lift keeps partial observations from exploding over possible completions.

Together, they allow the system to learn only the bounds it needs and reason over them formally.

For business AI, this points to a sober but valuable future: not autonomous agents that “understand the business,” but reasoning components that can certify when a conclusion follows from bounded evidence and explicit constraints. Less theatre. More accounting. In AI, this is often an improvement.

The paper does not give us a deployable product. It gives us a mechanism worth watching: model-free relational probabilistic inference that treats missing data as a formal object rather than an embarrassment to be patched before the meeting.

That is the useful lesson. Sometimes the smartest model is the one you never build.

Cognaptus: Automate the Present, Incubate the Future.

¹ Luise Ge, Brendan Juba, Kris Nilsson, and Alison Shao, “Lifted Relational Probabilistic Inference via Implicit Learning,” arXiv:2602.14890, 2026. https://arxiv.org/abs/2602.14890
