Plug Me In: Why LLMs with Tools Beat LLMs with Size

The latest research out of Heriot-Watt University doesn’t just challenge the notion that bigger is better; it quietly dismantles it. In their newly released Athena framework, Nripesh Niketan and Hadj Batatia demonstrate how integrating external APIs into LLM pipelines can outperform even the likes of GPT-4o and LLaMA-Large on real tasks like math and science. And they didn’t just beat them; they lapped them.

Why GPT-4 Still Fumbles Math

Ask GPT-4o to solve a college-level math problem, and it might hallucinate steps or miss basic arithmetic. The reason? LLMs, even at trillion-parameter scale, are not calculators. They’re probabilistic machines trained on patterns, not deterministic reasoners. ...

July 14, 2025 · 3 min · Zelina

Divide and Conquer: How LLMs Learn to Teach

Designing effective lessons for training online tutors is no small feat. It demands pedagogical nuance, clarity, scenario realism, and learner empathy. A recent paper by Lin et al., presented at ECTEL 2025, offers a compelling answer to this challenge: use LLMs, but don’t ask too much at once. Their research reveals that breaking the task of lesson generation into smaller, well-defined parts significantly improves quality, suggesting a new collaborative model for scalable education design. ...

June 24, 2025 · 3 min · Zelina

The AI Buffet: Why One Supermodel Might Rule the Menu, But Specialty Dishes Still Sell

Two weeks ago, OpenAI made another bold move: it replaced DALL·E 3 with a native 4o Image Generation model, built directly into ChatGPT (OpenAI, 2025). This shift wasn’t just a backend tweak; it marked the arrival of a more capable, photorealistic, and context-aware image generator that functions seamlessly inside a chat conversation. To rewind briefly: OpenAI had launched GPT-4o on May 13, 2024, integrating text, image, and code generation into a single chatbox (OpenAI, 2024). While this multimodal model supported image generation, it was powered by DALL·E 3. ...

April 8, 2025 · 5 min