LoRA | Cognaptus

Stop Scaling the Wrong Thing

TL;DR for operators: Most AI performance failures are not solved by scaling the most visible knob. Three recent papers make the same uncomfortable point from different angles. A controlled image-classification study finds that more data gives more stable generalization gains than simply increasing model complexity, while added visual priors help only when the architecture can use them.1 A document-parsing benchmark shows that frontier VLMs and specialized parsers still fail on expert documents with dense layouts, formulas, tables, music notation, rotation, and long-document reading order.2 A LoRA optimization paper argues that adapter performance is often limited not by rank alone, but by a mis-scaled LoRA scaling factor, usually treated as a small implementation detail, because apparently we needed another reminder that details run the building.3 ...
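For readers who have not looked inside an adapter, the scaling factor in question multiplies the low-rank update before it is added to the frozen weight. The sketch below assumes the common convention from the original LoRA formulation, where the update `B A` is scaled by `alpha / r`, and also shows the rank-stabilized `alpha / sqrt(r)` variant from later work for comparison. Neither is presented as the cited paper's proposed fix; the function names are hypothetical, and the matrices are plain Python lists to keep the example dependency-free.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_scale(alpha: float, rank: int, rank_stabilized: bool = False) -> float:
    """Scaling applied to the low-rank update (hypothetical helper).

    Standard LoRA convention: alpha / rank.
    Rank-stabilized variant, shown only for comparison: alpha / sqrt(rank).
    """
    return alpha / math.sqrt(rank) if rank_stabilized else alpha / rank


def lora_forward(W0: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """Compute y = W0 x + (alpha / r) * B (A x); W0 is frozen, A and B are trainable.

    A has shape (r, d_in); B has shape (d_out, r).
    """
    rank = len(A)
    scale = lora_scale(alpha, rank)
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


# With alpha held fixed, raising the rank shrinks the standard scale.
for r in (8, 64):
    print(r, lora_scale(16, r), lora_scale(16, r, rank_stabilized=True))
```

Because `B` is commonly initialized to zero, `lora_forward` starts out returning exactly `W0 x`; the scale only matters once training moves `B`. With `alpha` held at 16, the standard scale drops from 2.0 at rank 8 to 0.25 at rank 64, which is one concrete way a "small implementation detail" can change the effective update size as rank changes.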

Bigger Ears Still Need a Budget

TL;DR for operators: The paper is not really saying “use a smaller speech model.” That would be too convenient, and reality hates convenience. It is saying something more useful: audio-model efficiency is a budget allocation problem. Model size, audio duration, encoder token resolution, and adaptation depth are different ways to spend compute, and they do not buy the same thing. Agarwal, Gangrade, Pal, and Wu study this across automatic speech recognition using Whisper on LibriSpeech and speech emotion recognition using wav2vec2 on CREMA-D.1 ...

LoRA Was Supposed to Fit on the Edge. The Activations Disagreed.

TL;DR for operators: LoRA does not magically make LLM fine-tuning fit on phones, laptops, or small edge boxes. It reduces the number of trainable parameters. The paper’s useful contribution is showing that this is only the opening move. The real memory bill arrives from activations, checkpoint boundaries, vocabulary-sized output computations, and tokens that are processed even though they do not contribute to the loss. Apparently the memory allocator did not attend the product strategy meeting. ...
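The gap between "fewer trainable parameters" and "fits in memory" is easy to see with back-of-the-envelope arithmetic. The sketch below compares the trainable parameters a LoRA adapter adds to one square projection with the size of a single vocabulary-sized logits tensor. All sizes (hidden width 4096, rank 16, vocabulary 128,000, 2,048 tokens, fp32 logits) are hypothetical round numbers, not figures from the paper, and the "loss tokens only" case assumes an implementation that slices hidden states down to the supervised positions before the output projection.

```python
MIB = 2 ** 20


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def logits_bytes(batch: int, positions: int, vocab: int, bytes_per_elem: int = 4) -> int:
    """Memory for one logits tensor of shape (batch, positions, vocab)."""
    return batch * positions * vocab * bytes_per_elem


# Hypothetical configuration, chosen only for round numbers.
d_model, rank, vocab, seq_len = 4096, 16, 128_000, 2048
loss_positions = 512  # e.g. only the response tokens are supervised

adapter = lora_trainable_params(d_model, d_model, rank)
print(f"adapter params per projection: {adapter:,} ({adapter * 4 / MIB:.2f} MiB in fp32)")
print(f"logits, every position:   {logits_bytes(1, seq_len, vocab) / MIB:,.0f} MiB")
print(f"logits, loss tokens only: {logits_bytes(1, loss_positions, vocab) / MIB:,.0f} MiB")
```

Under these assumptions the adapter for one projection is 131,072 parameters (0.50 MiB in fp32), while a single fp32 logits tensor over all 2,048 positions is 1,000 MiB, or 250 MiB if only the 512 supervised positions reach the output layer. Activations saved for the backward pass come on top of this and are not modeled here.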

LoRA’s Rank Excuse Has a Gradient Problem

TL;DR for operators: LoRA is usually sold as a rank-and-cost compromise: train a small low-rank adapter instead of updating the whole model, accept some performance gap, and enjoy the budget meeting. The paper behind SDS-LoRA argues that this explanation is incomplete. The gap is not only because the adapter is low-rank. It is also because standard LoRA can distort the training signal that flows into that adapter.1 ...
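One standard way to see how the LoRA parameterization reshapes the training signal is to follow a single gradient step. The derivation below assumes plain SGD with learning rate $\eta$, the usual parameterization $W = W_0 + sBA$ with $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$ and scale $s$, and writes $G = \partial L / \partial W$ for the full-weight gradient. It illustrates the general point and is not necessarily the formulation SDS-LoRA uses.

```latex
\frac{\partial L}{\partial B} = s\, G A^{\top}, \qquad
\frac{\partial L}{\partial A} = s\, B^{\top} G

B' = B - \eta s\, G A^{\top}, \qquad A' = A - \eta s\, B^{\top} G

\Delta W = s\,(B'A' - BA)
         = -\eta s^{2}\left(G A^{\top} A + B B^{\top} G\right) + O(\eta^{2})
```

Full fine-tuning would move the weight by $-\eta G$. Under LoRA the same gradient is filtered through $A^{\top}A$ and $BB^{\top}$ and rescaled by $s^{2}$, so the update depends on the current adapter state and the scaling factor, not only on the rank. With the common initialization $B = 0$, the second term vanishes and $A$ receives no gradient on the first step.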

LoRA, Less Luggage: Choosing the Right Shortcut for Instance Segmentation

A camera sees a plastic bottle, a dolphin, a car, or a suspicious object inside an X-ray scan. The business question is usually not philosophical. It is: can we adapt an existing vision model to this specific mess without retraining half the machine? That is where parameter-efficient fine-tuning sounds irresistible. Freeze most of the pretrained model. Add a small trainable module. Spend less money. Store fewer weights. Avoid turning every client dataset into a private bonfire of GPU time. Lovely. Procurement smiles. Engineers almost smile. ...

Rank and File: MatryoshkaLoRA Turns One Adapter into Many

The adapter budget problem is not just training cost. Budget is usually where fine-tuning conversations become less glamorous. A team wants a customized model. The engineer suggests LoRA because full fine-tuning is expensive. Everyone nods. Then the uncomfortable question arrives: which rank? A low rank is cheap but may underfit. A high rank may work better but costs more memory and inference compute. So the team trains several adapters, compares them, chooses one, and pretends the search process was a minor detail. It was not. It was the hidden invoice. ...
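The "which rank?" question has a simple cost side and a less obvious structural side. The sketch below computes adapter size at several candidate ranks and shows the nesting idea the title alludes to: a single rank-r adapter whose leading k components can be sliced out as a smaller rank-k adapter. Treating the nested adapters as prefix slices of `A` and `B` is an assumption for illustration; the paper's training procedure for making every prefix useful is not reproduced here, and the helper names and sizes are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def adapter_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def truncate_adapter(A: Matrix, B: Matrix, k: int) -> Tuple[Matrix, Matrix]:
    """Keep the first k rank components: the first k rows of A, the first k columns of B."""
    if not 1 <= k <= len(A):
        raise ValueError("k must be between 1 and the adapter's full rank")
    return A[:k], [row[:k] for row in B]


def delta_w(A: Matrix, B: Matrix) -> Matrix:
    """Unscaled low-rank update B @ A, returned as a list of rows."""
    rank = len(A)
    return [
        [sum(B[i][j] * A[j][c] for j in range(rank)) for c in range(len(A[0]))]
        for i in range(len(B))
    ]


# Cost side: one 4096 x 4096 projection at several candidate ranks.
for r in (4, 8, 16, 32, 64):
    print(r, adapter_params(4096, 4096, r))

# Structural side: a toy rank-3 adapter sliced down to rank 1.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # shape (3, 2)
B = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]    # shape (2, 3)
A1, B1 = truncate_adapter(A, B, 1)
print(delta_w(A1, B1))
```

Parameters grow linearly with rank (65,536 at rank 8 and 524,288 at rank 64 for a 4096 x 4096 projection). Training one adapter and serving several prefix ranks from it could replace the repeated rank search the teaser describes, provided the training makes each prefix useful on its own.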

No More Low-Rank Detours: GPart and the Geometry of Fine-Tuning

Adapters are supposed to make fine-tuning simple. A team takes a large pretrained model, freezes most of it, trains a small adapter for customer support, another for invoice extraction, another for compliance review, and so on. The pitch is attractive: less storage, less training cost, faster iteration, fewer excuses from the infrastructure team. Naturally, the adapter becomes the small and tidy object everyone wants to manage. ...

LoRA and Order: The Strange Case for One Well-Placed Adapter

Opening: Why this matters now. Enterprise AI is entering its less glamorous, more useful phase: not “Can we connect an LLM to everything?” but “Can we adapt it without making the GPU bill look like a small infrastructure project?” Fine-tuning still matters. Retrieval helps with knowledge access, prompt engineering helps with behavior shaping, and agent frameworks help with workflow orchestration. But many businesses eventually hit the same wall: the base model is close, yet not close enough. It needs domain style, task format, compliance habits, tool-use discipline, or workflow-specific judgment. That usually means some form of supervised fine-tuning. ...

No Free Tokens: The New Economics of LLM Inference

Opening: Why this matters now. For the last few years, AI strategy has been narrated as a model-quality story: bigger models, better benchmarks, longer context windows, more agents, more demos, more adjectives. That story was useful. It was also incomplete. The less glamorous reality is now arriving with the invoice attached. LLM systems are not merely models. They are production services that consume GPU memory, scheduling capacity, engineering attention, and operational patience. Once a business moves from a prototype to repeated daily use, the question changes from “Can the model answer?” to “Can the system answer reliably, cheaply, and repeatedly when real users arrive at inconvenient times?” ...

Rank and File: BoostLoRA’s Case for Smarter Fine-Tuning

Opening: Why this matters now. Enterprise AI is entering its less glamorous phase: not the demo, not the keynote, not the charming chatbot that answers three curated questions correctly, but the operational grind of making models behave reliably inside messy workflows. That grind usually runs into a familiar triangle. Full fine-tuning is powerful but expensive, operationally heavy, and often risky when the training set is narrow. Parameter-efficient fine-tuning, especially LoRA-style adaptation, is cheaper and easier to deploy, but the smallest adapters can hit a ceiling. Meanwhile, the business user does not care whether the adapter was elegant. They care whether the model stops making the same costly mistakes in invoicing, compliance review, customer support, code generation, or scientific triage. ...