Cover image

Threading the Needle: How GRAFT Reinvents Document Translation with DAGs and LLM Agents

Document-level machine translation (DocMT) has long been riddled with a paradox: while LLMs can translate fluent paragraphs and even simulate discourse, they often falter at stitching meaning across paragraphs. Pronouns go adrift, tenses waver, and terminology mutates like a broken telephone game. The new paper GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation proposes an ambitious fix: treat a document not as a sequence, but as a graph — and deploy a team of LLM agents to navigate it. ...

July 12, 2025 · 4 min · Zelina
Cover image

Copilot at Work: How Generative AI is Quietly Rewriting Job Descriptions

When the AI revolution hits your job, will it help or replace you? Microsoft’s new study, analyzing 200,000 real-world conversations between users and Bing Copilot, offers the most grounded answer to date. Rather than speculating what LLMs could do, this research asks what users are actually doing with them — and how often those interactions overlap with real occupational tasks. The key innovation? The authors distinguish between user goals (what users ask AI to help with) and AI actions (what the AI does in response). This split allows them to track when Copilot acts as a coach, co-pilot, or full-on doer of tasks — a nuance missing from many economic forecasts. ...

July 11, 2025 · 5 min · Zelina
Cover image

Echo Chamber in a Prompt: How Survey Bias Creeps into LLMs

Large Language Models (LLMs) are increasingly deployed as synthetic survey respondents in social science and policy research. But a new paper by Rupprecht, Ahnert, and Strohmaier raises a sobering question: are these AI “participants” reliable, or are we just recreating human bias in silicon form? By subjecting nine LLMs—including Gemini, Llama-3 variants, Phi-3.5, and Qwen—to over 167,000 simulated interviews from the World Values Survey, the authors expose a striking vulnerability: even state-of-the-art LLMs consistently fall for classic survey biases—especially recency bias. ...

July 11, 2025 · 3 min · Zelina
Cover image

The Bullshit Dilemma: Why Smarter AI Isn't Always More Truthful

“Bullshit is speech intended to persuade without regard for truth.” – Harry Frankfurt When Alignment Goes Sideways Large Language Models (LLMs) are getting better at being helpful, harmless, and honest — or so we thought. But a recent study provocatively titled Machine Bullshit [Liang et al., 2025] suggests a disturbing paradox: the more we fine-tune these models with Reinforcement Learning from Human Feedback (RLHF), the more likely they are to generate responses that are persuasive but indifferent to truth. ...

July 11, 2025 · 4 min · Zelina
Cover image

Humans in the Loop, Not Just the Dataset

When Meta and other tech giants scale back content moderation, the gap isn’t just technical—it’s societal. Civil society organizations (CSOs), not corporations, are increasingly on the frontlines of monitoring online extremism. But they’re often armed with clunky tools, academic prototypes, or opaque black-box models. A new initiative—highlighted in Civil Society in the Loop—challenges this status quo by co-designing a Telegram monitoring tool that embeds human feedback directly into its LLM-assisted classification system. The twist? It invites civil society into the machine learning loop, not just the results screen. ...

July 10, 2025 · 3 min · Zelina
Cover image

Jolting Ahead: Why AI’s Acceleration Is Accelerating

When Ray Kurzweil first proposed the “Law of Accelerating Returns,” he suggested that technological progress builds on itself, speeding up over time. But what if even that framing is too slow? David Orban’s recent paper, Jolting Technologies: Superexponential Acceleration in AI Capabilities and Implications for AGI, pushes the discussion into new mathematical territory. Instead of modeling AI progress as exponential (where capability growth accelerates at a constant rate), he proposes something more radical: positive third-order derivatives — or in physics terms, jolts. ...

July 10, 2025 · 4 min · Zelina
Cover image

The Invisible Hand in the Machine: Rethinking AI Through a Collectivist Lens

The most radical idea in Michael I. Jordan’s latest manifesto isn’t a new model, a benchmark, or even a novel training scheme. It’s a reorientation. He argues that we’ve misdiagnosed the nature of intelligence—and in doing so, we’ve built AI systems that are cognitively brilliant yet socially blind. The cure? Embrace a collectivist, economic lens. This is not techno-utopianism. Jordan—a towering figure in machine learning—offers a pointed critique of both the AGI hype and the narrow symbolic legacy of classical AI. The goal shouldn’t be to build machines that imitate lone geniuses. It should be to construct intelligent collectives—systems that are social, uncertain, decentralized, and deeply intertwined with human incentives. In short: AI needs an economic imagination. ...

July 10, 2025 · 4 min · Zelina
Cover image

Delta Force: How Weak Models are Secretly the Best Teachers

In the world of LLM fine-tuning, stronger usually means better. But what if we’ve been looking at supervision all wrong? A provocative new paper introduces the Delta Learning Hypothesis, arguing that LLMs can learn just as well—sometimes even better—from weak data, as long as it’s paired. The trick isn’t in the absolute quality of the training signals, but in the difference—the delta—between them. Like a coach pointing out small improvements, even bad examples can teach if they highlight how one is slightly better than another. ...

July 9, 2025 · 3 min · Zelina
Cover image

From Prompting to Porting: Surviving the LLM Upgrade Cycle

If you’re running a GenAI-powered application today, you’re likely sitting on a ticking time bomb. It isn’t your codebase or infrastructure — it’s your prompts. As Large Language Models (LLMs) evolve at breakneck speed, your carefully tuned prompts degrade silently, causing once-reliable applications to behave erratically. The case of Tursio, an enterprise search tool, makes one thing painfully clear: prompt migration is no longer optional — it’s survival. The Hidden Cost of Progress In 2023, Tursio ran reliably on GPT-4-32k. By mid-2025, it had to migrate twice — first to GPT-4.5-preview, then to GPT-4.1. Each model came with its own quirks: ...

July 9, 2025 · 3 min · Zelina
Cover image

School of Thought: How Fine-Tuned Open LLMs Are Challenging the Giants in Education

Why rent a Ferrari when a fine-tuned e-bike can get you to class faster, cheaper, and on your own terms? That’s the question quietly reshaping AI in education, as shown by Solano et al. (2025) in their paper Narrowing the Gap. The authors demonstrate that with supervised fine-tuning (SFT), smaller open-source models like Llama-3.1-8B and Qwen3-4B can rival proprietary giants like GPT-4.1 when explaining C programming error messages to students. More strikingly, they achieve this with better privacy, lower cost, and pedagogical nuance that large models often overshoot. ...

July 9, 2025 · 3 min · Zelina