
From Graph to Grit: Diagnosing Warehouse Bottlenecks with LLMs and Knowledge Graphs

In the age of Digital Twins and hyper-automated warehouses, simulations are everywhere—but insights are not. Discrete Event Simulations (DES) generate rich, micro-level data on logistics flows, delays, and resource utilization, yet interpreting these data remains painfully manual, fragile, and siloed. This paper from Quantiphi introduces a compelling solution: transforming raw simulation outputs into a Knowledge Graph (KG) and querying it via an LLM agent that mimics human investigative reasoning. It’s a shift from spreadsheet-style summaries to an interactive AI assistant that explains why something is slow, where the bottleneck is, and what needs attention. ...
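The core move, turning raw simulation records into a graph that an agent can query for bottlenecks, can be illustrated with a toy sketch. The event-log fields and station names below are hypothetical, not the paper's actual schema or pipeline:

```python
from collections import defaultdict

# Hypothetical DES event log: (order_id, station, wait_minutes).
events = [
    ("O1", "picking", 4.0), ("O1", "packing", 12.5),
    ("O2", "picking", 3.5), ("O2", "packing", 14.0),
    ("O3", "picking", 5.0), ("O3", "shipping", 2.0),
]

# Build a tiny knowledge-graph-like structure: station nodes carrying
# aggregated wait statistics, linked to the orders that visited them.
graph = defaultdict(lambda: {"orders": [], "waits": []})
for order, station, wait in events:
    graph[station]["orders"].append(order)
    graph[station]["waits"].append(wait)

# "Where is the bottleneck?" -> the node with the highest mean wait,
# the kind of query an LLM agent would translate and run over the KG.
bottleneck = max(
    graph, key=lambda s: sum(graph[s]["waits"]) / len(graph[s]["waits"])
)
print(bottleneck)  # packing
```

In the paper's setting the graph would live in a proper store and the agent would chain such queries into an investigative trace; the sketch only shows why a graph, rather than flat summary tables, makes "why is this slow?" a one-hop question.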

July 26, 2025 · 3 min · Zelina

Planners, Meet Your Smart Sidekick

Imagine asking, “Why wasn’t Order A scheduled for production yesterday?” and getting not just an answer, but a causal breakdown, an alternative plan, and a visual comparison — all without involving your operations research (OR) consultant. That’s exactly what SMARTAPS delivers. Built by Huawei researchers, SMARTAPS is a tool-augmented LLM interface for interacting with Advanced Planning Systems (APS) using natural language. It doesn’t try to replace optimization solvers — it simply makes them accessible. In doing so, it redefines how planners interact with complex decision-making models. ...

July 26, 2025 · 3 min · Zelina

Steering by the Token: How GRAINS Turns Attribution into Alignment

Fine-tuning is the hammer; steering is the scalpel. In an era where models are increasingly opaque and high-stakes, we need tools that guide behavior without overhauling the entire architecture. That’s precisely what GRAINS (Gradient-based Attribution for Inference-Time Steering) delivers: a powerful, interpretable, and modular way to shift the behavior of LLMs and VLMs by leveraging the most fundamental unit of influence—the token.

The Problem with Global Steering

Traditional inference-time steering approaches often rely on global intervention vectors: a blunt, one-size-fits-all shift in hidden activations derived from paired desirable and undesirable examples. But these methods are insensitive to which specific tokens caused bad behavior. It’s like adjusting a recipe because the dish tastes bad—without checking if the salt or the sugar was at fault. ...
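The global-steering baseline the excerpt criticizes is easy to sketch. The dimensions, data, and function names below are illustrative, not the GRAINS implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 8

# Toy hidden activations for paired examples: rows are examples whose
# behavior was judged desirable vs. undesirable.
desirable = rng.normal(0.5, 1.0, size=(16, hidden_dim))
undesirable = rng.normal(-0.5, 1.0, size=(16, hidden_dim))

# Global steering vector: the mean activation difference across pairs.
steer = desirable.mean(axis=0) - undesirable.mean(axis=0)

def apply_global_steering(hidden_states, alpha=1.0):
    # At inference, shift every token's hidden state by the SAME vector,
    # the "one-size-fits-all" intervention, blind to which token was at fault.
    return hidden_states + alpha * steer

h = rng.normal(size=(4, hidden_dim))  # hidden states for 4 tokens
h_steered = apply_global_steering(h)
print(h_steered.shape)  # (4, 8)
```

GRAINS' contribution, per the excerpt, is replacing that uniform shift with token-level, attribution-weighted intervention; the sketch only shows the blunt baseline it improves on.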

July 26, 2025 · 3 min · Zelina

Structure Matters: Externalities and the Hidden Logic of GNN Decisions

When explaining predictions made by Graph Neural Networks (GNNs), most methods ask: Which nodes or features mattered most? But what if this question misses the real driver of decisions — not the nodes themselves, but how they interact? That’s the bet behind GraphEXT, a novel explainability framework that reframes GNN attribution through the lens of externalities — a concept borrowed from economics. Developed by Wu, Hao, and Fan (2025), GraphEXT goes beyond traditional feature- or edge-based attributions. Instead, it models how structural interactions among nodes — the very thing GNNs are designed to exploit — influence predictions. ...
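Why interaction-aware attribution differs from per-node scoring can be shown with a tiny game-theoretic sketch. The characteristic function and node names are invented for illustration, not GraphEXT's actual formulation:

```python
from itertools import permutations

players = ["A", "B", "C"]

def value(coalition):
    # Toy worth of a node coalition: each node contributes 1 on its own,
    # but B and C together create extra value -- a positive externality
    # from their structural interaction.
    v = len(coalition)
    if {"B", "C"} <= set(coalition):
        v += 3
    return v

def shapley(player):
    # Exact Shapley value: a node's average marginal contribution over
    # all orderings, which credits interaction effects fairly.
    total, n = 0.0, 0
    for order in permutations(players):
        i = order.index(player)
        before = order[:i]
        total += value(before + (player,)) - value(before)
        n += 1
    return total / n

print({p: round(shapley(p), 2) for p in players})
# {'A': 1.0, 'B': 2.5, 'C': 2.5}
```

A per-node importance score would rate A, B, and C equally (each contributes 1 alone); the interaction-aware view correctly assigns B and C the surplus their pairing creates, which is the intuition behind attributing GNN decisions to structure rather than isolated nodes.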

July 26, 2025 · 3 min · Zelina

The LoRA Mirage: Why Lightweight Finetuning Isn’t Lightweight on Privacy

When we talk about parameter-efficient fine-tuning, LoRA (Low-Rank Adaptation) is often celebrated as a silver bullet: cost-effective, memory-efficient, and—many assume—safe. After all, it modifies only a small fraction of model parameters, sideloaded as low-rank matrices, while leaving the massive pretrained model backbone untouched. The prevailing belief has been that such minimal intervention can’t possibly memorize or leak sensitive data. This belief is now decisively debunked by LoRA-Leak, a landmark framework introduced in a new paper by researchers from Tsinghua and HKUST. Their findings are a wake-up call for AI developers and policymakers alike: even LoRA-finetuned models are highly vulnerable to membership inference attacks (MIAs)—and ironically, the very presence of the frozen pretrained model amplifies this leakage risk. ...
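The "small fraction of parameters" claim is easy to quantify. A minimal numeric sketch of the low-rank update, with illustrative dimensions rather than any real model's:

```python
import numpy as np

d_out, d_in, r = 768, 768, 8          # illustrative layer size and LoRA rank

W = np.zeros((d_out, d_in))           # stand-in for the frozen pretrained weight
A = np.random.randn(r, d_in) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))              # B starts at zero, so W' == W initially

W_adapted = W + B @ A                 # effective weight after finetuning

frozen = W.size
trainable = A.size + B.size
print(f"trainable fraction: {trainable / (frozen + trainable):.4%}")
# trainable fraction: 2.0408%
```

Roughly 2% of the layer's parameters are touched, which is exactly why LoRA was presumed privacy-safe; LoRA-Leak's point is that the frozen 98% acts as a reference model that makes membership inference easier, not harder.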

July 25, 2025 · 4 min · Zelina

The Most Dangerous Query Is the One You Don't Question

In the age of natural language interfaces to databases (NLIDBs), asking the right question has never been easier—or more perilous. While systems like ChatGPT or SQL-Palm can convert everyday English into valid SQL, they often do so without interrogating the quality of the question itself. And as Peter Drucker warned, “The most dangerous thing is asking the wrong question.” Enter VeriMinder, a system built not to improve SQL syntax or execution accuracy—but to diagnose and refine the analytical intent behind the user’s query. It tackles a deceptively simple yet far-reaching problem: a well-formed SQL query that answers a poorly formed question can yield confident but misleading insights. This is particularly problematic in enterprise settings where non-technical users rely on LLM-based BI assistants. ...

July 25, 2025 · 4 min · Zelina

The Two Minds of Finance: Testing LLMs for Divergence and Discipline

How do we judge whether an AI is thinking like a human—or at least like a financial analyst? A new benchmark, ConDiFi, offers a compelling answer: test not just whether an LLM gets the right answer, but whether it can explore possible ones. That’s because true financial intelligence lies not only in converging on precise conclusions but in diverging into speculative futures. Most benchmarks test convergent thinking: answer selection, chain-of-thought, or multi-hop reasoning. But strategic fields like finance also demand divergent thinking—creative, open-ended scenario modeling that considers fat-tail risks and policy surprises. ConDiFi (short for Convergent-Divergent for Finance) is the first serious attempt to capture both dimensions in one domain-specific benchmark. ...

July 25, 2025 · 4 min · Zelina

Trained on Tickers, Tuned for Trust: The New Frontier of FinTech AI

From Spreadsheets to FinGPT: Why Finance Needs Its Own Foundation Models

General-purpose LLMs like GPT-4 and Gemini have shown surprising skill in handling financial tasks — summarizing earnings reports, analyzing sentiment, even giving portfolio advice. But beneath this performance lies a troubling mismatch: these models aren’t trained for the language, structure, or regulation of finance. In high-stakes domains where every decimal and disclosure matters, hallucination isn’t just a bug — it’s a liability. ...

July 25, 2025 · 4 min · Zelina

Forecasting a Smarter Planet: How EarthLink Reimagines Climate Science with Self-Evolving AI Agents

Climate science, once defined by hand-tuned code and static diagnostics, is entering a new phase of automation and adaptability. At the forefront is EarthLink, a self-evolving multi-agent AI platform built specifically to support Earth system science. But this isn’t another LLM wrapper for answering climate questions. EarthLink is something deeper: a scientific collaborator that plans experiments, writes code, debugs itself, interprets results, and learns with each use.

From Toolkits to Thinking Partners

Traditional tools like ESMValTool or ILAMB have standardized climate model evaluation, but they remain brittle and rigid. They require domain-specific programming expertise and offer little flexibility beyond predefined tasks. In contrast, EarthLink introduces a new paradigm: ...

July 24, 2025 · 4 min · Zelina

From Cora to Cosmos: How PyG 2.0 Scales GNNs for the Real World

Graph Neural Networks (GNNs) have come a long way since they solved Cora and PubMed node classification. But what happens when you want to model an entire traffic network, a biomedical knowledge graph, or a social graph with billions of nodes? That’s where PyG 2.0 steps in.

The Industrialization of GNNs

PyTorch Geometric (PyG) has been a dominant tool in the academic development of GNNs. With PyG 2.0, it graduates into the world of industrial-strength machine learning. This isn’t just a library update—it’s a fundamental re-architecture with three goals: ...
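The trick that makes billion-node graphs trainable at all is neighbor sampling: train on small sampled subgraphs instead of the full graph. A stdlib-only sketch of one sampling hop, with a toy adjacency list standing in for the scalable graph stores and loaders PyG 2.0 provides:

```python
import random

random.seed(0)

# Toy adjacency list; in PyG 2.0 this role is played by remote graph
# stores backing loaders such as NeighborLoader.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2], 4: [0]}

def sample_neighbors(seeds, fanout):
    """One hop of neighbor sampling: cap each seed's neighborhood at `fanout`,
    so the mini-batch subgraph stays small no matter how dense the graph is."""
    sampled = set(seeds)
    for node in seeds:
        neighbors = adj[node]
        k = min(fanout, len(neighbors))
        sampled.update(random.sample(neighbors, k))
    return sampled

# Node 0 has 4 neighbors, but the sampled subgraph sees at most 2 of them.
subgraph_nodes = sample_neighbors(seeds=[0], fanout=2)
print(len(subgraph_nodes) <= 3)  # True
```

Stacking such hops gives the familiar layered fanout (e.g. [10, 5]) used in production GNN training; the sketch only shows why memory stays bounded as the graph grows.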

July 24, 2025 · 3 min · Zelina