CFA | Cognaptus

Exams are useful because they are rude. They do not care that a model sounds polished, cites the right buzzwords, or can produce a gorgeous paragraph about duration risk. They ask for A, B, or C. Then they mark the answer wrong.

That is why a new CFA-based benchmark is more useful than another misty-eyed essay about AI “transforming finance.” The paper evaluates GPT-4o, GPT-o1, and o3-mini on 1,560 official CFA mock multiple-choice questions across Levels I, II, and III, both zero-shot and with a domain-reasoning RAG pipeline built from official CFA curriculum materials.[1] The result is not a single leaderboard. It is closer to a routing manual. ...