The AI boom has been driven by the assumption that bigger models are more capable, but mounting costs are pushing users to consider smaller, cheaper models. According to Coinbase co-founder Brian Armstrong, 80% of workloads will be running on models that cost 99% less within 12 to 18 months. If that prediction holds, the shift could reshape the industry, with the biggest effect on large labs such as OpenAI and Anthropic.
Early tests suggest that cheaper models can match the quality of larger ones. For example, the legal AI tool Harvey cut its inference costs by a factor of three, with no loss in quality, by combining Claude Opus with GLM 5.1 served by Fireworks.
Behind the trend is a price war between the big labs' in-house inference and independently served open-weight models, as companies look for ways to cut costs without compromising quality.
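
To put the figures above in perspective, here is a minimal back-of-the-envelope sketch of what they would mean for a total inference bill. It assumes, purely for illustration, that Armstrong's "80% of workloads" can be read as 80% of current spend and that all remaining work stays on the original model at the original price. The function names are hypothetical, and the article does not say how Harvey actually split its traffic, so the last calculation is only what a threefold saving would imply under the same 99% discount.

```python
def blended_cost(cheap_share: float, cheap_discount: float) -> float:
    """Spend after the shift, as a fraction of the original bill.

    cheap_share: fraction of baseline spend moved to the cheaper model (0..1).
    cheap_discount: how much cheaper that model is per unit of work (0..1).
    Assumption: the rest of the work keeps its original price.
    """
    if not (0.0 <= cheap_share <= 1.0 and 0.0 <= cheap_discount <= 1.0):
        raise ValueError("cheap_share and cheap_discount must be in [0, 1]")
    # Every unit moved saves `cheap_discount` of its original cost.
    return 1.0 - cheap_share * cheap_discount


def required_share(target_reduction: float, cheap_discount: float) -> float:
    """Share of spend that must move to reach a given cost-reduction factor."""
    if target_reduction < 1.0 or not (0.0 < cheap_discount <= 1.0):
        raise ValueError("need target_reduction >= 1 and 0 < cheap_discount <= 1")
    # Solve 1 / target_reduction = 1 - share * cheap_discount for share.
    share = (1.0 - 1.0 / target_reduction) / cheap_discount
    if share > 1.0:
        raise ValueError("target is unreachable at this discount")
    return share


if __name__ == "__main__":
    remaining = blended_cost(0.80, 0.99)
    print(f"remaining bill: {remaining:.3f} of original")   # 0.208
    print(f"overall reduction: {1 / remaining:.1f}x")       # 4.8x
    # Hypothetical: share implied by a 3x saving at a 99% discount.
    print(f"share for 3x: {required_share(3.0, 0.99):.2f}")  # 0.67
```

Under these assumptions, the 20% of work left on the expensive model accounts for almost all of the remaining bill, so further savings would depend far more on how much work can be moved than on how cheap the small model gets.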



