LLMs | Cognaptus

Fraud, Trimmed and Tagged: How Dual-Granularity Prompts Sharpen LLMs for Graph Detection

TL;DR for operators: Fraud teams already know the problem. The suspicious review, shop, seller, or account is rarely suspicious in isolation. The useful evidence is scattered across neighbours (same user, same product, same rating pattern, same time window, same commercial ecosystem). The less useful evidence is also scattered there. At scale, that second pile is larger. How inconvenient. ...

When Your AI Disagrees with Your Portfolio

TL;DR for operators: An AI investment assistant does not enter every portfolio discussion as a blank analyst. The paper behind this article shows that large language models can carry latent investment preferences: for certain sectors, for larger companies, and for contrarian rather than momentum arguments.[1] The important mechanism is simple and uncomfortable. When buy and sell evidence are balanced, the model’s internal prior can break the tie. When counter-evidence later becomes stronger, that prior does not necessarily disappear. In mixed-evidence settings, the model may latch onto the fragment of evidence that supports its original inclination and discount the stronger opposing side. Splendid. Your “neutral” analyst has discovered confirmation bias and brought it to the investment committee. ...

The Sims Get Smart? Why LLM-Driven Social Simulations Need a Reality Check

TL;DR for operators: LLM-driven social simulations are seductive because they make artificial agents speak, remember, plan, argue, apologise, panic, and occasionally organise a party. This is useful. It is not the same thing as modelling society. The paper’s central warning is simple: an agent that sounds believable at the individual level does not automatically produce valid collective dynamics.[1] A simulation can pass the “that feels human” test while failing the “this corresponds to the real world” test. That gap matters if the output is used for market forecasting, policy rehearsal, public-risk modelling, workforce planning, or customer-behaviour analysis. ...

Steering by the Token: How GRAINS Turns Attribution into Alignment

TL;DR for operators: GrAInS is not “fine-tuning, but cheaper.” That framing misses the point and commits the usual business sin of turning a mechanism into a procurement slogan. The paper’s useful claim is more specific: token-level attribution can be converted into an inference-time steering signal. Instead of retraining model weights, GrAInS identifies which text or image tokens most strongly push the model toward preferred or dispreferred outputs, builds layer-wise steering vectors from those activation shifts, and applies normalized edits during inference.[1] ...
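To make the last two steps concrete, here is a minimal pure-Python sketch under loud assumptions: for one layer, we already have activation vectors collected when the top-attributed tokens push toward preferred and toward dispreferred outputs (the attribution step itself is omitted). The steering direction is taken as the normalized difference of the two means, and the inference-time edit adds a scaled copy of it to a hidden state, then rescales the result to the original norm. The function names, the `alpha` strength parameter, and the toy numbers are hypothetical; this is a simplified illustration, not the paper's exact procedure.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def norm(v: Sequence[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def steering_vector(preferred: Sequence[Sequence[float]],
                    dispreferred: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical: unit direction from the mean dispreferred to the mean preferred activation."""
    diff = [p - d for p, d in zip(mean_vector(preferred), mean_vector(dispreferred))]
    length = norm(diff)
    return [x / length for x in diff] if length > 0 else diff


def steer(hidden: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Push a hidden state along `direction`, then rescale it to its original norm."""
    shifted = [h + alpha * d for h, d in zip(hidden, direction)]
    original, new = norm(hidden), norm(shifted)
    return [x * original / new for x in shifted] if new > 0 else shifted


# Toy 3-dimensional activations for a single layer (illustrative values only).
preferred = [[1.0, 0.2, 0.0], [0.8, 0.0, 0.1]]
dispreferred = [[0.0, 0.1, 0.9], [0.1, 0.0, 1.1]]
v = steering_vector(preferred, dispreferred)
h = [0.3, 0.3, 0.9]
print([round(x, 3) for x in steer(h, v, alpha=1.0)])
```

In a real model you would build one such direction per layer and apply the edit to that layer's hidden states during the forward pass; rescaling to the original norm is one simple way to keep the edit from inflating activation magnitudes.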

Think Twice, Then Speak: Deliberative Searcher and the Future of Reliable LLMs

TL;DR for operators: Search-augmented LLMs are not safe merely because they can look things up. They can still retrieve relevant documents, stitch together a plausible answer, and then express high confidence in something wrong. That is the failure mode this paper targets: not hallucination in the abstract, but the operationally poisonous state of being both false and certain. ...

From Text to Motion: How Manimator Turns Dense Papers into Dynamic Learning

TL;DR for operators: Manimator is best understood as a content-production pipeline, not as a magical professor trapped inside a video renderer. The system takes a prompt, PDF, or arXiv ID, asks an LLM to turn it into a structured scene plan, asks a code-focused LLM to generate Manim Python, and then renders the result into an explanatory animation.[1] ...
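The three-stage flow above can be sketched as a chain of swappable functions. This is a hypothetical skeleton, not Manimator's code: `ScenePlan`, `run_pipeline`, and the `stub_*` stages are illustrative names, and the stubs merely stand in for the text LLM, the code-focused LLM, and the Manim renderer, so no model or renderer is actually called.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ScenePlan:
    """Hypothetical structured plan produced by the first LLM call."""
    title: str
    scenes: List[str] = field(default_factory=list)


# Each stage is a plain callable so a real LLM client or renderer can be plugged in.
Planner = Callable[[str], ScenePlan]   # text LLM: prompt / PDF text / arXiv ID -> scene plan
Coder = Callable[[ScenePlan], str]     # code-focused LLM: scene plan -> Manim Python source
Renderer = Callable[[str], str]        # renderer: Manim source -> path of the rendered video


def run_pipeline(source: str, plan: Planner, code: Coder, render: Renderer) -> str:
    """Chain the stages in order: plan the scenes, generate Manim code, render it."""
    scene_plan = plan(source)
    manim_source = code(scene_plan)
    return render(manim_source)


# Stub stages for a dry run (hypothetical placeholders).
def stub_plan(source: str) -> ScenePlan:
    return ScenePlan(title=source, scenes=["Intro", "Key idea", "Summary"])


def stub_code(scene_plan: ScenePlan) -> str:
    return "\n".join(f"# scene: {name}" for name in scene_plan.scenes)


def stub_render(manim_source: str) -> str:
    return f"{len(manim_source.splitlines())}_scenes.mp4"


print(run_pipeline("Explain the Fourier transform", stub_plan, stub_code, stub_render))
```

Keeping each stage behind a plain callable makes it straightforward to swap models or insert a check between code generation and rendering without touching the rest of the chain.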

The Clock Inside the Machine: How LLMs Construct Their Own Time

TL;DR for operators: Dates look harmless. They sit in spreadsheets, contracts, forecasts, audit trails, delivery plans, and board decks pretending to be objective little integers. The problem is that a language model may not treat them as just integers. A new paper, The Other Mind: How Language Models Exhibit Human Temporal Cognition, studies how 12 large language models judge similarity between years from 1525 to 2524.[1] The authors find that larger models often organise years around a subjective reference point near the recent present, rather than simply comparing numerical distance. The models also show logarithmic compression: years farther from that reference point become less finely distinguished, in a pattern reminiscent of the Weber-Fechner law in human perception. ...
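A toy model makes the logarithmic-compression idea concrete. Assume each year y is placed on an internal line by its signed log distance from a reference year r, f(y) = sign(y − r) · ln(1 + |y − r|), and that perceived dissimilarity between two years is |f(a) − f(b)|. The reference year 2025 is a hypothetical stand-in for "near the recent present", and the formula is an illustration of Weber-Fechner-style compression, not the paper's fitted model.

```python
import math


def compressed_position(year: int, reference: int) -> float:
    """Signed log distance from the reference year (Weber-Fechner-style compression)."""
    offset = year - reference
    return math.copysign(math.log1p(abs(offset)), offset)


def perceived_distance(a: int, b: int, reference: int = 2025) -> float:
    """Toy dissimilarity: gap between compressed positions, not the raw numeric gap."""
    return abs(compressed_position(a, reference) - compressed_position(b, reference))


# Same ten-year gap, very different perceived distance depending on proximity to the reference.
for a, b in [(2030, 2040), (2430, 2440)]:
    print(a, b, round(perceived_distance(a, b), 3))
```

Under this toy mapping, 2030 and 2040 sit about 0.98 units apart, while 2430 and 2440, the same ten-year gap, sit about 0.02 units apart: distant years blur together.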

Bridges and Biases: How LLMs Are Learning to Inspect Infrastructure

TL;DR for operators: Bridge teams do not usually lack data. They lack enough expert time to turn dense inspection data into clear, defensible decisions. That is the operational gap this paper tries to narrow: not by replacing bridge engineers with a chatbot in a hard hat, thankfully, but by using multimodal LLMs to translate non-destructive evaluation contour maps into structured condition assessments and maintenance recommendations.[1] ...

Latent Brilliance: Turning LLMs into Creativity Engines

TL;DR for operators: Creative AI systems usually fail in a painfully familiar way: ask for ten ideas, and by idea four the model is politely repainting the same wall. Change the temperature, give it a persona, ask a panel of agents to “debate,” and the system may sound busier, but the semantic spread often remains narrow. The paper behind this article argues that this is not merely a prompt-design inconvenience. It is a structural limitation of how LLMs are conditioned. ...

Signals & Sentiments: How GPT-2 and FinBERT Beat Buy-and-Hold on the S&P 500

TL;DR for operators: A recent arXiv paper tests whether financial-news sentiment from GPT-2 and FinBERT can improve S&P 500 trading when combined with technical indicators and time-series models.[1] The strongest reported strategy, GPT-2 sentiment on Dow Jones news combined with VW MACD (volume-weighted MACD), returns 5.77% over the May 10 to August 7, 2024 test period. The buy-and-hold benchmark returns -0.696% over the same window. ...
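For readers who have not met the indicator, the sketch below computes one common formulation of VW MACD, assumed rather than taken from the paper: a fast minus a slow rolling volume-weighted moving average, with an exponential moving average of that difference as the signal line, using the conventional 12/26/9 periods. The `long_signal` rule, which only allows an entry when news sentiment is positive, is hypothetical and included only to show how a sentiment score might gate a technical signal.

```python
from typing import List, Optional, Sequence, Tuple


def vwma(prices: Sequence[float], volumes: Sequence[float], window: int) -> List[Optional[float]]:
    """Rolling volume-weighted moving average; None until `window` bars exist."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
            continue
        p = prices[i + 1 - window : i + 1]
        v = volumes[i + 1 - window : i + 1]
        out.append(sum(x * w for x, w in zip(p, v)) / sum(v))
    return out


def ema(values: Sequence[float], span: int) -> List[float]:
    """Exponential moving average seeded with the first value."""
    if not values:
        return []
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for x in values[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


def vw_macd(prices: Sequence[float], volumes: Sequence[float],
            fast: int = 12, slow: int = 26, signal: int = 9) -> Tuple[List[float], List[float]]:
    """Assumed VW MACD: fast VWMA minus slow VWMA, plus an EMA signal line."""
    fast_ma = vwma(prices, volumes, fast)
    slow_ma = vwma(prices, volumes, slow)
    # Keep only bars where the slow average exists (the fast one exists there too).
    macd = [f - s for f, s in zip(fast_ma, slow_ma) if s is not None]
    return macd, ema(macd, signal)


def long_signal(macd_value: float, signal_value: float, sentiment: float) -> bool:
    """Hypothetical entry rule: MACD above its signal line AND positive news sentiment."""
    return macd_value > signal_value and sentiment > 0


# Synthetic, accelerating price series with flat volume (illustrative data only).
prices = [float(i * i) for i in range(1, 41)]
volumes = [1.0] * 40
macd, sig = vw_macd(prices, volumes)
print(len(macd), long_signal(macd[-1], sig[-1], sentiment=0.3))
```

For scale, the reported gap between that strategy and buy-and-hold is 5.77 − (−0.696) = 6.466 percentage points, earned over a test window of about three months.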