LLM | Cognaptus

From Ballots to Budgets: Can LLMs Be Trusted as Social Planners?

When you think of AI in public decision-making, you might picture chatbots handling service requests or predictive models flagging infrastructure risks. But what if we let large language models (LLMs) actually allocate resources, acting as digital social planners? That’s exactly what this new study tested, using Participatory Budgeting (PB) both as a practical decision-making task and a dynamic benchmark for LLM reasoning.

Why Participatory Budgeting Is the Perfect Testbed

PB is more than a budgeting exercise. Citizens propose and vote on projects (parks, public toilets, community centers), and decision-makers choose a subset to fund within a fixed budget. It’s a constrained optimization problem with a human twist: budgets, diverse preferences, and sometimes mutually exclusive projects. ...
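To make the "constrained optimization" framing concrete, here is a minimal sketch that treats PB as a 0/1 knapsack: pick the subset of projects with the most total votes whose combined cost fits the budget. This is an illustration only, not the study's method. The `Project` type, the use of raw vote counts as the objective, and the example projects and figures are all assumptions made for the sketch; mutually exclusive projects are not modelled.

```python
from typing import NamedTuple


class Project(NamedTuple):
    name: str
    cost: int   # whole currency units (assumed non-negative integers)
    votes: int  # vote count, used here as the utility measure (an assumption)


def select_projects(projects, budget):
    """Return (total_votes, chosen_names) for the best affordable subset."""
    # best[b] holds the best (total_votes, chosen_names) using at most b budget.
    best = [(0, ())] * (budget + 1)
    for p in projects:
        # Walk budgets downward so each project is funded at most once.
        for b in range(budget, p.cost - 1, -1):
            candidate = best[b - p.cost][0] + p.votes
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - p.cost][1] + (p.name,))
    return best[budget]


# Hypothetical ballot: the most popular project is not part of the best bundle.
ballot = [
    Project("park", cost=60, votes=100),
    Project("toilets", cost=50, votes=70),
    Project("center", cost=50, votes=60),
]
print(select_projects(ballot, budget=100))  # (130, ('toilets', 'center'))
```

A planner that simply funds the most-voted project first would pick the park and then run out of room, ending with 100 votes instead of 130. That gap is the kind of trade-off a social planner, human or LLM, has to reason about.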

From Byline to Botline: How LLMs Are Quietly Rewriting the News

The AI Pressroom Arrives, Mostly Unannounced

When ChatGPT-3.5 launched in late 2022, it didn’t just disrupt classrooms and coding forums; it quietly walked into newsrooms. A recent large-scale study of 40,000+ news articles shows that local and college media outlets, often operating with lean budgets and smaller editorial teams, have embraced generative AI far more than their major-network counterparts. And in many cases, readers have no idea.

The research, spanning opinion sections from CNN to The Harvard Crimson, and across formats from print to radio, found a tenfold jump in AI-written local news opinion pieces post-GPT. College newspapers followed closely with an 8.6× increase, while major outlets showed only modest uptake, a testament to stricter editorial controls or more cautious adoption policies. ...

From Black Box to Glass Box: DeepVIS Makes Data Visualization Explain Itself

When business leaders ask for a “quick chart,” they rarely expect to become detectives in the aftermath—trying to work out why the AI picked that chart type, grouped the data that way, or left out important categories. Yet that’s exactly the frustration with most Natural Language to Visualization (NL2VIS) tools today: they generate results like a magician pulling a rabbit from a hat, with no insight into how the trick was done. ...

From Stage to Script: How AMADEUS Keeps AI Characters in Character

When you chat with a VTuber’s AI twin or a game NPC that remembers your past adventures, breaking character can ruin the magic. Large language models (LLMs) have the raw conversational talent, but keeping them in character, especially when faced with questions outside their scripted knowledge, is notoriously difficult. AMADEUS, a new RAG-based framework, aims to fix that.

The Problem with Persona Drift

Most role-playing agents (RPAs) rely on a static “persona paragraph” to define who they are. Retrieval-Augmented Generation (RAG) can pull relevant persona chunks into context, but three problems persist: ...

From Zero to Reasoning Hero: How R-Zero Teaches Itself Without Human Data

In AI development, removing humans from the training loop has long been a holy grail — not because people aren’t valuable, but because human labeling is expensive, slow, and fundamentally limited. R-Zero, a new framework from Tencent AI Seattle Lab, takes a decisive step in that direction: no seed dataset, no human annotations, and no external verifier. Just two AI roles — Challenger and Solver — locked in an evolutionary arms race. ...

Mind the Gap: How Tool Graph Retriever Fixes LLMs’ Missing Links

In enterprise AI automation, the devil isn’t in the details—it’s in the dependencies. As LLM-powered agents gain access to hundreds or thousands of external tools, they face a simple but costly problem: finding all the right tools for the job. Most retrieval systems focus on semantic similarity—matching user queries to tool descriptions—but ignore a crucial fact: some tools can’t work without others. The result? A task that seems perfectly matched to a retrieved tool still fails, because a prerequisite tool never made it into the context window. Tool Graph Retriever (TGR) aims to solve this by making dependencies first-class citizens in retrieval. ...
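The failure mode described above (a retrieved tool whose prerequisite never reaches the context window) can be illustrated with a small dependency-closure sketch. This is not TGR's actual algorithm, which the excerpt does not describe. It only shows the general idea of expanding a similarity-based result set with everything it transitively requires. The function name, the `depends_on` mapping, and the tool names are hypothetical.

```python
from collections import deque


def expand_with_dependencies(retrieved, depends_on):
    """Return the retrieved tools plus every tool they transitively require.

    retrieved:  tool names returned by a similarity-based retriever.
    depends_on: mapping from a tool name to the tools it cannot work without.
    """
    selected = list(dict.fromkeys(retrieved))  # de-duplicate, keep order
    seen = set(selected)
    queue = deque(selected)
    while queue:
        tool = queue.popleft()
        for dep in depends_on.get(tool, ()):
            if dep not in seen:  # the seen-set also guards against cycles
                seen.add(dep)
                selected.append(dep)
                queue.append(dep)
    return selected


# Hypothetical tool graph: sending an invoice needs one to exist,
# and creating one needs an authenticated session.
tool_graph = {
    "send_invoice": ["create_invoice"],
    "create_invoice": ["authenticate"],
}
print(expand_with_dependencies(["send_invoice"], tool_graph))
# ['send_invoice', 'create_invoice', 'authenticate']
```

A retriever that matches only on the query "send the invoice" would return `send_invoice` alone; the expansion step pulls in the two prerequisites so the agent can actually complete the task.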

The Diligent but Brittle Student Inside Every LLM

If you put a large language model in a classroom for a year, what kind of student would it become? According to “Simulating Human-Like Learning Dynamics with LLM-Empowered Agents”, the answer isn’t flattering: most base LLMs act like “diligent but brittle surface learners”: hardworking, seemingly capable, but unable to generalize deeply.

From Psych Lab to AI Lab

Educational psychology has spent decades classifying learners into profiles like deep learners (intrinsically motivated, reflective, conceptual) and surface learners (extrinsically motivated, test-oriented, shortcut-prone). The authors built LearnerAgent, a multi-agent framework grounded in these theories, and dropped four AI ‘students’ into a simulated high school English class: ...

From Tadpole to Titan: How DEVFT Grows LLMs Like a Brain

If federated fine-tuning feels like trying to teach calculus to a toddler on a flip phone, you’re not alone. While the privacy-preserving benefits of federated learning are clear, its Achilles’ heel has always been the immense cost of training large models like LLaMA2-13B across resource-starved edge devices. Now, a new method—DEVFT (Developmental Federated Tuning)—offers a compelling paradigm shift, not by upgrading the devices, but by downgrading the expectations. At least, at first. ...

Thinking Without Talking: How SynAdapt Lets LLMs Reason in Silence

When large language models (LLMs) reason step-by-step using Chain-of-Thought (CoT) prompting, they think out loud. That verbosity improves accuracy, but it’s also a luxury many applications can’t afford. From real-time voice assistants to robotics, excessive token generation slows everything down. The result is a fundamental trade-off: performance versus efficiency. The paper “SynAdapt: Learning Adaptive Reasoning in Large Language Models via Synthetic Continuous Chain-of-Thought” offers a clever solution. Rather than generating verbose natural language steps, SynAdapt trains LLMs to reason silently, using internal vectors called synthetic continuous CoT (CCoT). And for harder problems, where silence isn’t enough, it reroutes the model back into verbal reasoning mode. This hybrid, adaptive strategy aims for the best of both worlds. ...

From Scroll to Structure: Rethinking Academic Reading with TreeReader

For centuries, reading has meant scrolling—page by page, line by line. But what if reading could mean navigating a tree? TreeReader, a new system from researchers at the University of Toronto and the Vector Institute, challenges the linearity of academic literature. It proposes a reimagined interface: one where large language models (LLMs) summarize each section and paragraph into collapsible nodes in a hierarchical tree, letting readers skim, zoom, and verify with surgical precision. The result is more than a UX tweak—it’s a new cognitive model for how scholars might interact with complex documents in the era of AI. ...