The Chain-of-Thought (CoT) paradigm has become a cornerstone in improving the reasoning capabilities of large language models (LLMs). But as CoT matures, one question looms larger: Does every problem really need an elaborate chain? In this article, we dive into a new method called AdaR1, which rethinks the CoT strategy by asking not only how to reason—but how much.

Short vs. Long CoT: What’s the Difference?

In practice, a Short CoT delivers a concise, direct path to an answer—often skipping intermediate logical scaffolding. It mimics how an expert might solve a familiar or simple problem with minimal steps. By contrast, a Long CoT unfolds with detailed, step-by-step logic that walks through each assumption and derivation, resembling a student showing all their work on a complex math problem.

Why the Distinction Matters

The distinction isn’t merely academic. Applying Long CoT uniformly inflates inference time and computational cost, and can even raise error rates on trivial tasks. Recent studies—including the AdaR1 paper—show that in many cases, Long CoT yields no accuracy gain and can actually degrade performance on simpler problems.¹

Figure 1: The proportion of gain in the data (left) and the relationship between CoT length and accuracy improvement (right). Long-CoT reasoning improves accuracy on difficult problems but has little effect on, or even harms, performance on easy ones.

More importantly, one can see parallels in human cognition: we don’t rehearse every multiplication table when adding 2 + 2, nor do we skim when proving a new theorem. Intelligence lies in knowing how much to think.

AdaR1: A Two-Stage Solution for Smart Reasoning

The AdaR1 framework proposes a hybrid reasoning model that integrates both short and long CoT behaviors and trains the model to select the optimal one dynamically. This is achieved through a two-stage pipeline:

  1. Model Merging: Parameters from short- and long-CoT models are linearly merged: $$\theta_H = \alpha \theta_L + (1 - \alpha) \theta_S$$

  2. Bi-Level Adaptive Optimization:

    • Group-Level: Learn whether short or long CoT is better for a given input.
    • Instance-Level: Within the chosen style, further optimize for conciseness and correctness.

To formalize reasoning generation, the model’s conditional output distribution factorizes autoregressively over the response tokens:

$$ \pi_\theta(y|x) = \prod_{j=1}^m \pi_\theta(y_j|x, y_{<j}) $$
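
In practice, this product is evaluated as a sum of per-token log-probabilities. Here is a minimal sketch, assuming a Hugging Face causal LM; `gpt2` is only a stand-in for illustration, not the paper’s hybrid checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model for illustration; AdaR1's actual hybrid model is not assumed here.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sequence_log_prob(prompt: str, completion: str) -> float:
    """Computes log pi_theta(y|x) = sum_j log pi_theta(y_j | x, y_<j)."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    ids = tok(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab_size)
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # The token at position t is predicted by the logits at position t - 1.
    for t in range(prompt_len, ids.shape[1]):
        total += log_probs[0, t - 1, ids[0, t]].item()
    return total

print(sequence_log_prob("Q: 2 + 2 = ", "4"))
```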

Let $\theta_L$ and $\theta_S$ be parameters of the Long-CoT and Short-CoT models, respectively. We define the merged model’s parameter as:

$$ \theta_H = \alpha \theta_L + (1 - \alpha) \theta_S $$
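
In implementation terms, this is a per-tensor linear interpolation of two checkpoints’ state dicts. A minimal sketch follows; the function name and the example α value are illustrative, and both models are assumed to share one architecture:

```python
def merge_models(theta_L: dict, theta_S: dict, alpha: float) -> dict:
    """theta_H = alpha * theta_L + (1 - alpha) * theta_S, applied tensor-wise.
    Assumes both state dicts (of torch tensors) come from the same architecture."""
    assert theta_L.keys() == theta_S.keys(), "parameter names must match"
    return {name: alpha * theta_L[name] + (1.0 - alpha) * theta_S[name]
            for name in theta_L}

# Usage with hypothetical checkpoints:
# theta_H = merge_models(long_model.state_dict(), short_model.state_dict(), alpha=0.5)
# hybrid_model.load_state_dict(theta_H)
```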

Group-level preference compares empirical accuracy estimates computed over $K$ responses sampled from the Long-CoT and Short-CoT models, respectively:

$$ \hat{\mathbb{E}}[C_L(x)] = \frac{1}{K} \sum_{i=1}^K \mathbb{1}[\text{Correct}(y^L_i)] $$

$$ \hat{\mathbb{E}}[C_S(x)] = \frac{1}{K} \sum_{j=1}^K \mathbb{1}[\text{Correct}(y^S_j)] $$

The group-level decision rule is:

$$ \text{Choose Long-CoT if } \hat{\mathbb{E}}[C_L(x)] - \hat{\mathbb{E}}[C_S(x)] > \epsilon $$

Otherwise, choose Short-CoT. Together with instance-level comparisons within the chosen style, which favor the more concise of two correct responses, this yields structured pairwise preference learning at both levels.
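
As a rough sketch of how these preferences could be assembled (the helper names, the default ε, and the length-based instance rule are illustrative readings of the paper’s description, not its released code):

```python
from typing import Callable, List, Tuple

def empirical_accuracy(samples: List[str], is_correct: Callable[[str], bool]) -> float:
    """Estimates E[C(x)] as the fraction of correct responses among K samples."""
    return sum(is_correct(y) for y in samples) / len(samples)

def choose_style(long_samples: List[str], short_samples: List[str],
                 is_correct: Callable[[str], bool], epsilon: float = 0.05) -> str:
    """Group-level rule: prefer Long-CoT only if its estimated accuracy
    exceeds Short-CoT's by more than epsilon."""
    gain = (empirical_accuracy(long_samples, is_correct)
            - empirical_accuracy(short_samples, is_correct))
    return "long" if gain > epsilon else "short"

def instance_pairs(samples: List[str],
                   is_correct: Callable[[str], bool]) -> List[Tuple[str, str]]:
    """Instance-level rule within the chosen style: among correct responses,
    prefer the most concise one, yielding (chosen, rejected) pairs."""
    correct = sorted((y for y in samples if is_correct(y)), key=len)
    if not correct:
        return []
    return [(correct[0], y) for y in correct[1:]]
```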

Figure 2: Pipeline of AdaR1. In Stage I, we fuse the models to obtain $\pi_\theta^H$. In Stage II, we sample from both the long and short models and elicit group-level and instance-level preferences. We then optimize $\pi_\theta^H$ at both levels to obtain a hybrid adaptive reasoning model.

Theoretical and Empirical Justification

  • Inductive Rationale: From a Bayesian perspective, simpler hypotheses (shorter CoT) should be preferred unless evidence strongly supports the need for elaboration.
  • Scenario Testing: On a mixed-difficulty math dataset, AdaR1 outperformed baselines by reducing CoT length by over 50% while preserving accuracy—saving compute without sacrificing intelligence.²
  • Numerical Evidence: For example, in the GSM8K dataset, AdaR1 cut reasoning steps by 74% and even improved accuracy—a compelling indicator that less can be more when the context allows.

Closing Thoughts

AdaR1 doesn’t just trim fat—it embodies a shift in how we design reasoning agents. By gauging the complexity of each problem and adjusting the depth of reasoning accordingly, it paves the way for scalable, efficient, and more “human” AI.

As LLMs move toward real-time, embedded applications—from financial automation to robotic control—strategies like AdaR1 will prove not just efficient, but essential.



  1. Luo, Haotian, et al. “AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization.” arXiv:2504.21659 [cs.AI], 2025.

  2. Ibid., Table 1 and Figure 1.