Delta Force: How Weak Models are Secretly the Best Teachers
TL;DR for operators Training budget is usually where elegant AI strategy goes to die. The paper behind this article argues that preference tuning does not always need a superior teacher response. It may only need a useful contrast. A model can improve by learning that one weak answer is better than an even weaker one, even when neither answer is as good as what the model can already produce.1 ...