Semantic Search

Search Me: Why PIPER Makes Tables Findable When Metadata Goes Missing

A data catalog is supposed to answer a simple question: where is the dataset I need? In practice, it often answers a different question: which dataset owner bothered to write a decent title, description, and tag list? That distinction matters. A table may contain exactly the columns, ranges, patient attributes, locations, dates, or transaction variables a team needs, while its metadata says something thrilling like “export_final_v3.csv.” The dataset is technically present. It is not findable. This is a familiar enterprise tragedy: the data lake is full, the catalog exists, and still everyone asks the same analyst where the useful files are. Excellent digital transformation, naturally. ...

When Structure Isn’t Enough: Teaching Knowledge Graphs to Negotiate with Themselves

A knowledge graph is supposed to make AI systems less vague. That is the pitch, at least. Instead of letting a model float around in text, we give it entities, relations, and structure. A person works at a company. A product belongs to a category. A supplier is connected to a shipment, an invoice, a warehouse, and eventually a mildly panicked operations manager. ...

When Words Start Walking: Rethinking Semantic Search Beyond Averages

Search fails in a very ordinary way. A lawyer looks for a clause without remembering the exact wording. A finance analyst searches a prospectus for an operating-profit statement, but types only the economic idea. A compliance officer remembers a person’s role, not the sentence where the role was declared. The system returns either too much, too little, or the wrong thing wearing the right keywords. Everyone then calls it “semantic search,” because apparently disappointment sounds better in Greek. ...

Beyond Cosine: When Order Beats Angle in Embedding Similarity

Search has a small ritual. Take two embeddings, compute cosine similarity, rank the results, and move on. The ritual is fast, familiar, and usually good enough. It is also so deeply embedded in AI infrastructure that many teams treat it less like a modeling choice and more like plumbing. That is convenient. It is not always innocent. ...
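For readers who have not looked inside the plumbing lately, the ritual described above can be sketched in a few lines. This is a minimal illustration, not the method of any particular system: the names `cosine_similarity`, `rank_by_cosine`, `toy_documents`, and `toy_query` are hypothetical, and the three-dimensional vectors are hand-written stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query, documents):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query, vector))
        for doc_id, vector in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data standing in for real embedding vectors.
toy_documents = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.6, 0.8, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.0, 0.0]

ranking = rank_by_cosine(toy_query, toy_documents)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Note that the score depends only on the angle between vectors, so scaling a vector by any positive constant leaves the ranking unchanged.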

Ask Once, Query Right: Why Enterprise AI Still Gets Databases Wrong

Database. That is where many enterprise AI demos quietly go to die. The user asks one clean natural-language question: “How many customers are in California?” The AI assistant smiles politely, searches something, finds a table that looks relevant, and returns a confident answer. The problem is not that the model cannot understand English. The problem is that five internal databases may all contain customers, states, locations, stores, loans, accounts, or sales regions. Some can answer the question. Some can almost answer it. Some merely smell like they can answer it. ...

all-MiniLM-L6-v2

A compact sentence embedding model from Sentence Transformers that maps text to 384-dimensional vectors, suited to semantic search, clustering, and sentence similarity tasks.

BGE Large EN v1.5

An English embedding model from BAAI that produces 1024-dimensional vectors, optimized for semantic search, retrieval-augmented generation (RAG), and ranking tasks.