
Cite Before You Write: Agentic RAG That Picks Graph vs. Vector on the Fly

Paper: Open-Source Agentic Hybrid RAG Framework for Scientific Literature Review (Nagori et al., 2025)

One‑line: The authors wrap a hybrid RAG pipeline (Neo4j GraphRAG + FAISS VectorRAG) inside an agent (Llama‑3.3‑70B) that decides per query which retriever to use, then instruction‑tunes generation (Mistral‑7B) and quantifies uncertainty via bootstrapped evaluation. It’s open‑source and genuinely useful.

Why this paper matters (beyond research circles)

Business pain: Knowledge workers drown in PDFs. Static “semantic search + summarize” tools miss citation structure and provenance; worse, they hallucinate under pressure.

What’s new: Dynamic query routing between graph queries (Cypher over Neo4j) and semantic + keyword retrieval (FAISS + BM25 + rerank). Then DPO nudges the generator to prefer grounded answers.

So what: For regulated sectors (healthcare, finance, legal), this is a pattern you can implement today for auditable reviews with traceable sources and tunable confidence bands.

The blueprint (concrete, reproducible)

Ingestion: Pull bibliometrics (DOI, title, abstract, year, authors, PDF URL, source) from PubMed, arXiv, Google Scholar. Deduplicate and filter by cosine similarity of TF‑IDF keywords (keep top‑quartile relevance). ...
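To make the routing idea concrete, here is a minimal sketch of per-query retriever selection. The keyword heuristic stands in for the paper’s Llama‑3.3‑70B routing agent, and `graph_retrieve` / `vector_retrieve` are hypothetical stubs for illustration, not the authors’ code.

```python
# A minimal sketch of per-query routing between GraphRAG and VectorRAG.
# The keyword heuristic stands in for the paper's LLM routing agent;
# the retriever functions are hypothetical stubs, not the authors' code.
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str  # which retriever produced this snippet
    text: str    # retrieved content (with provenance in a real system)


# Relational/citation-style questions tend to suit the graph retriever.
GRAPH_CUES = ("cite", "cited by", "co-author", "authored", "published in")


def route_query(query: str) -> str:
    """Pick a retriever per query; the paper delegates this to an LLM agent."""
    q = query.lower()
    return "graph" if any(cue in q for cue in GRAPH_CUES) else "vector"


def graph_retrieve(query: str) -> list[Evidence]:
    # Stand-in for a Cypher query over a Neo4j citation graph.
    cypher = (
        "MATCH (p:Paper)-[:CITES]->(q:Paper) "
        "WHERE toLower(p.title) CONTAINS $term "
        "RETURN q.title, q.doi LIMIT 10"
    )
    return [Evidence("neo4j", f"[would execute] {cypher}")]


def vector_retrieve(query: str) -> list[Evidence]:
    # Stand-in for FAISS + BM25 retrieval followed by a cross-encoder rerank.
    return [Evidence("faiss+bm25", f"[would embed, search, rerank] {query}")]


def retrieve(query: str) -> list[Evidence]:
    router = {"graph": graph_retrieve, "vector": vector_retrieve}
    return router[route_query(query)](query)


if __name__ == "__main__":
    for q in ("Which papers cite Nagori et al. 2025?",
              "Summarize evidence that DPO improves grounding."):
        print(q, "->", route_query(q))
```

Swapping the heuristic for an LLM call keeps the same interface while letting the agent log a rationale alongside each routing decision, which matters for the auditability the post emphasizes.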

August 11, 2025 · 6 min · Zelina

From Cora to Cosmos: How PyG 2.0 Scales GNNs for the Real World

Graph Neural Networks (GNNs) have come a long way since they solved Cora and PubMed node classification. But what happens when you want to model an entire traffic network, a biomedical knowledge graph, or a social graph with billions of nodes? That’s where PyG 2.0 steps in. The Industrialization of GNNs PyTorch Geometric (PyG) has been a dominant tool in the academic development of GNNs. With PyG 2.0, it graduates into the world of industrial-strength machine learning. This isn’t just a library update—it’s a fundamental re-architecture with three goals: ...
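Much of that scaling story comes down to sampling: instead of full-batch message passing over the whole graph, PyG trains on small sampled subgraphs. A minimal sketch with `NeighborLoader` follows; the synthetic graph, fan-out, and batch size are illustrative, not taken from the post.

```python
# A minimal sketch of the neighbor-sampling pattern that lets PyG train GNNs
# on graphs far larger than GPU memory. Synthetic data; real deployments would
# point the loader at a disk-backed graph/feature store.
import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

num_nodes = 10_000
data = Data(
    x=torch.randn(num_nodes, 64),                         # node features
    edge_index=torch.randint(0, num_nodes, (2, 50_000)),  # random edges
)

loader = NeighborLoader(
    data,
    num_neighbors=[10, 5],               # fan-out per hop: 2-layer GNN sampling
    batch_size=256,                      # seed nodes per mini-batch
    input_nodes=torch.arange(num_nodes),
)

# Each batch is a small subgraph: seed nodes plus their sampled 2-hop
# neighborhood, so a 2-layer GNN sees exactly the messages it needs.
batch = next(iter(loader))
print(batch.num_nodes, batch.edge_index.shape)
```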

July 24, 2025 · 3 min · Zelina

GraphRAG Without the Drag: Scaling Knowledge-Augmented LLMs to Web-Scale

When it comes to retrieval-augmented generation (RAG), size matters—but not in the way you might think. Most high-performing GraphRAG systems extract structured triples (subject, predicate, object) from texts using large language models (LLMs), then link them to form reasoning chains. But this method doesn’t scale: if your corpus contains millions of documents, pre-processing every one with an LLM becomes prohibitively expensive. That’s the bottleneck the authors of “Millions of GeAR-s” set out to solve. And their solution is elegant: skip the LLM-heavy preprocessing entirely, and use existing knowledge graphs (like Wikidata) as a reasoning scaffold. ...
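A minimal sketch of that scaffold idea, assuming the public Wikidata SPARQL endpoint: pull one-hop triples for an entity and hand them to the generator as structured context. The query, QID, and user agent string are illustrative, not the paper’s code.

```python
# A minimal sketch of the "reuse an existing KG" idea: rather than extracting
# triples with an LLM, fetch one-hop triples for an entity from Wikidata's
# public SPARQL endpoint and use them as a reasoning scaffold for generation.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"


def one_hop_triples(qid: str, limit: int = 20) -> list[tuple[str, str, str]]:
    """Fetch (entity, predicate, object) labels for an entity's direct claims."""
    query = f"""
    SELECT ?propLabel ?objLabel WHERE {{
      wd:{qid} ?p ?obj .
      ?prop wikibase:directClaim ?p .
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }} LIMIT {limit}
    """
    resp = requests.get(
        ENDPOINT,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "graphrag-scaffold-sketch/0.1"},  # Wikidata asks for a UA
        timeout=30,
    )
    resp.raise_for_status()
    rows = resp.json()["results"]["bindings"]
    # Literal objects may lack a label binding; fall back gracefully.
    return [
        (qid, r["propLabel"]["value"], r.get("objLabel", {}).get("value", "?"))
        for r in rows
    ]


if __name__ == "__main__":
    # Q11660 = "artificial intelligence"; these triples become grounded context.
    for triple in one_hop_triples("Q11660"):
        print(triple)
```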

July 24, 2025 · 3 min · Zelina