Structure Matters: Externalities and the Hidden Logic of GNN Decisions

When explaining predictions made by Graph Neural Networks (GNNs), most methods ask: Which nodes or features mattered most? But what if this question misses the real driver of decisions — not the nodes themselves, but how they interact? That’s the bet behind GraphEXT, a novel explainability framework that reframes GNN attribution through the lens of externalities — a concept borrowed from economics. Developed by Wu, Hao, and Fan (2025), GraphEXT goes beyond traditional feature- or edge-based attributions. Instead, it models how structural interactions among nodes — the very thing GNNs are designed to exploit — influence predictions. ...

July 26, 2025 · 3 min · Zelina

The LoRA Mirage: Why Lightweight Finetuning Isn't Lightweight on Privacy

When we talk about parameter-efficient fine-tuning, LoRA (Low-Rank Adaptation) is often celebrated as a silver bullet: cost-effective, memory-efficient, and—many assume—safe. After all, it modifies only a small fraction of model parameters, sideloaded as low-rank matrices, while leaving the massive pretrained model backbone untouched. The prevailing belief has been that such minimal intervention can’t possibly memorize or leak sensitive data. This belief is now decisively debunked by LoRA-Leak, a landmark framework introduced in a new paper by researchers from Tsinghua and HKUST. Their findings are a wake-up call for AI developers and policymakers alike: even LoRA-finetuned models are highly vulnerable to membership inference attacks (MIAs)—and ironically, the very presence of the frozen pretrained model amplifies this leakage risk. ...

July 25, 2025 · 4 min · Zelina

The Most Dangerous Query Is the One You Don't Question

In the age of natural language interfaces to databases (NLIDBs), asking the right question has never been easier, or more perilous. While systems like ChatGPT or SQL-PaLM can convert everyday English into valid SQL, they often do so without interrogating the quality of the question itself. And as Peter Drucker warned, “The most dangerous thing is asking the wrong question.” Enter VeriMinder, a system built not to improve SQL syntax or execution accuracy but to diagnose and refine the analytical intent behind the user’s query. It tackles a deceptively simple yet far-reaching problem: a well-formed SQL query that answers a poorly formed question can yield confident but misleading insights. This is particularly problematic in enterprise settings where non-technical users rely on LLM-based BI assistants. ...

July 25, 2025 · 4 min · Zelina

The Two Minds of Finance: Testing LLMs for Divergence and Discipline

How do we judge whether an AI is thinking like a human—or at least like a financial analyst? A new benchmark, ConDiFi, offers a compelling answer: test not just whether an LLM gets the right answer, but whether it can explore possible ones. That’s because true financial intelligence lies not only in converging on precise conclusions but in diverging into speculative futures. Most benchmarks test convergent thinking: answer selection, chain-of-thought, or multi-hop reasoning. But strategic fields like finance also demand divergent thinking—creative, open-ended scenario modeling that considers fat-tail risks and policy surprises. ConDiFi (short for Convergent-Divergent for Finance) is the first serious attempt to capture both dimensions in one domain-specific benchmark. ...

July 25, 2025 · 4 min · Zelina

Trained on Tickers, Tuned for Trust: The New Frontier of FinTech AI

From Spreadsheets to FinGPT: Why Finance Needs Its Own Foundation Models

General-purpose LLMs like GPT-4 and Gemini have shown surprising skill in handling financial tasks: summarizing earnings reports, analyzing sentiment, even giving portfolio advice. But beneath this performance lies a troubling mismatch: these models aren’t trained for the language, structure, or regulation of finance. In high-stakes domains where every decimal and disclosure matters, hallucination isn’t just a bug; it’s a liability. ...

July 25, 2025 · 4 min · Zelina

Forecasting a Smarter Planet: How EarthLink Reimagines Climate Science with Self-Evolving AI Agents

Climate science, once defined by hand-tuned code and static diagnostics, is entering a new phase of automation and adaptability. At the forefront is EarthLink, a self-evolving multi-agent AI platform built specifically to support Earth system science. But this isn’t another LLM wrapper for answering climate questions. EarthLink is something deeper: a scientific collaborator that plans experiments, writes code, debugs itself, interprets results, and learns with each use.

From Toolkits to Thinking Partners

Traditional tools like ESMValTool or ILAMB have standardized climate model evaluation, but they remain brittle and rigid. They require domain-specific programming expertise and offer little flexibility beyond predefined tasks. In contrast, EarthLink introduces a new paradigm: ...

July 24, 2025 · 4 min · Zelina

From Cora to Cosmos: How PyG 2.0 Scales GNNs for the Real World

Graph Neural Networks (GNNs) have come a long way since they solved Cora and PubMed node classification. But what happens when you want to model an entire traffic network, a biomedical knowledge graph, or a social graph with billions of nodes? That’s where PyG 2.0 steps in.

The Industrialization of GNNs

PyTorch Geometric (PyG) has been a dominant tool in the academic development of GNNs. With PyG 2.0, it graduates into the world of industrial-strength machine learning. This isn’t just a library update; it’s a fundamental re-architecture with three goals: ...

July 24, 2025 · 3 min · Zelina

GraphRAG Without the Drag: Scaling Knowledge-Augmented LLMs to Web-Scale

When it comes to retrieval-augmented generation (RAG), size matters—but not in the way you might think. Most high-performing GraphRAG systems extract structured triples (subject, predicate, object) from texts using large language models (LLMs), then link them to form reasoning chains. But this method doesn’t scale: if your corpus contains millions of documents, pre-processing every one with an LLM becomes prohibitively expensive. That’s the bottleneck the authors of “Millions of GeAR-s” set out to solve. And their solution is elegant: skip the LLM-heavy preprocessing entirely, and use existing knowledge graphs (like Wikidata) as a reasoning scaffold. ...

July 24, 2025 · 3 min · Zelina

Tools of Thought: Why Reasoning Isn’t an Illusion After All

In early 2025, Apple’s now-infamous “thinking-illusion” benchmark delivered a sobering verdict: large reasoning models (LRMs)—those step-by-step thinkers like DeepSeek-R1 and Qwen 3 Thinking—failed to show meaningful advantages over simpler LLMs. Their verbose, reflective outputs didn’t help on easy problems, nor did they scale on hard ones. In some cases, they even underperformed. But what if we were judging thinking models under unfair conditions? A new study titled “Thinking Isn’t an Illusion” argues that the problem isn’t with reasoning itself—it’s with reasoning in a vacuum. When these models are augmented with tools like Python interpreters and structured scratchpads, their performance transforms dramatically. In fact, they begin to consistently outperform their non-reasoning counterparts across a diverse set of logic puzzles. ...

July 24, 2025 · 4 min · Zelina

From Snippets to Synthesis: INRAExplorer and the Rise of Agentic RAG

Most Retrieval-Augmented Generation (RAG) systems promise to make language models smarter by grounding them in facts. But ask them to do anything complex—like trace research funding chains or identify thematic overlaps across domains—and they break down into isolated snippets. INRAExplorer, a project out of Ekimetrics for INRAE, dares to change that. By merging agentic RAG with knowledge graph reasoning, it offers a glimpse into the next generation of AI: systems that don’t just retrieve answers—they reason. ...

July 23, 2025 · 3 min · Zelina