
FAQ It Till You Make It: Fixing LLM Quantization by Teaching Models Their Own Family History

Opening — Why this matters now

Large language models are getting cheaper to run, not because GPUs suddenly became charitable, but because we keep finding new ways to make models forget precision without forgetting intelligence. Post-training quantization (PTQ) is one of the most effective tricks in that playbook. And yet, despite years of algorithmic polish, PTQ still trips over something embarrassingly mundane: the calibration data. ...

January 20, 2026 · 4 min · Zelina

Enhancing Privately Deployed AI Models: A Sampling-Based Search Approach

Introduction

Privately deployed AI models—used in secure enterprise environments or on edge devices—face unique limitations. Unlike their cloud-based counterparts, which benefit from extensive computational resources, these models often operate under tight constraints. As a result, they struggle with inference-time optimization, accurate self-verification, and scalable reasoning. These issues can diminish trust and reliability in critical domains like finance, law, and healthcare. How can we boost the accuracy and robustness of such models without fundamentally redesigning them or relying on cloud support? ...

March 19, 2025 · 4 min · Cognaptus Insights