Enterprise Search

K-Means, K-Gone: Sparse Coding and the Retrieval Bottleneck

Indexing is where many retrieval systems quietly become expensive. The demo looks harmless: upload documents, create embeddings, ask questions, receive answers with citations. Then the corpus starts behaving like a real business corpus. Policies change. Product pages are rewritten. Compliance documents are replaced. Support tickets arrive every hour. The retrieval layer must keep up, and suddenly the glamorous RAG stack is waiting for the plumbing to rebuild itself. As usual, the least photogenic component is the one holding the invoice. ...

The Search That Remembers: Training AI Without Answers

Search looks cheap until you try to train it. A business can usually collect plenty of questions. Employees ask support bots why a policy changed. Analysts ask internal search systems for comparable transactions. Legal teams ask where a contract clause first appears. Researchers ask agents to chase a multi-step trail across documents, web pages, and databases. ...

When Words Start Walking: Rethinking Semantic Search Beyond Averages

Search fails in a very ordinary way. A lawyer looks for a clause without remembering the exact wording. A finance analyst searches a prospectus for an operating-profit statement, but types only the economic idea. A compliance officer remembers a person’s role, not the sentence where the role was declared. The system returns too much, too little, or the wrong thing wearing the right keywords. Everyone then calls it “semantic search,” because apparently disappointment sounds better in Greek. ...

Choosing Topics Without Counting: When LDA Meets Black-Box Intelligence

Topic modeling has a small, annoying question hiding inside a very large workflow: How many topics should the model use? Not what the topics mean. Not whether the dashboard looks elegant. Not whether management will discover a “strategic insight” after renaming a cluster from miscellaneous complaints to emerging customer sentiment. Just the integer: 10 topics, 30 topics, 80 topics, 200 topics? ...

Prints Charming: How Reward Models Finally Got Serious About Long-Horizon Reasoning

Search looks simple until it becomes a workflow. A human analyst can open ten tabs, notice which source contradicts which, remember that one earlier search result changed the meaning of the question, and decide whether the next move should be another search, a calculation, or a final answer. An LLM agent can also open tabs, call tools, browse pages, run code, and produce a final answer. The difference is that the agent often does all of this with the discipline of a caffeinated intern who has been told that “more context” is the same thing as “better memory.” ...