Solomon's approach to the USG / Lab Question
It’s a question of which frontier
USG wants early access to the model frontier to harden its defenses and get a leg up in the zero-sum arenas of cyber, conventional, and other varieties of warfare. Broad deployment of frontier systems makes that difficult: most safeguards are not designed to withstand jailbreaks by state-level actors, and broad deployment will inevitably put those safeguarded models in the hands of companies with questionable relationships to adversaries.
So, we have a model freeze. Labs are upset because they can’t earn enough revenue from their expensive models to recoup their ruinous training costs, and users are upset that progress has stalled and that USG gets to decide which corporations get access to the frontier.
Seems like a problem: USG wants the models to itself, while labs and the market want the frontier to be widely available.
But it’s less of a problem than it appears. USG and the labs plus the market don’t actually want opposite states of the world. Crucially, no one really cares about models that are slightly better but ruinously expensive. I’m not aware of anyone who heavily uses gpt-5.5-pro in production, and in any case its user base is so small that its market impact is orders of magnitude below that of core workhorses like Fable.
There’s an even more extreme example: gpt-4.5. Before RL and test-time compute (TTC) emerged as scaling axes, scaling pretraining was just about the only lever, so OpenAI invested substantial resources into training a model much larger than gpt-4 using more or less the same approach. Training such a model (rumored to be ~10T parameters) and deploying it at scale would be entirely realistic today on Blackwells/Rubins, but it wasn’t practical at the time, and the model was more or less declared a commercial failure despite sitting right on the scaling trend for important benchmarks. It’s no longer with us, but I miss gpt-4.5: it was clearly the most “aware” model released, with a roomy residual stream that let it give high-quality judgments and seed realistic scenarios for data generation and training. It was meaningfully more intelligent than comparable non-reasoning models in every application. However, partly because of its price and partly because it lacked SOTA post-training, it had little market impact.
The post-training point is a wrinkle (gpt-4.5 with o1-level post-training would have been interesting), but in my opinion it doesn’t change all that much. The model was simply too expensive to serve on the hardware of the day to drive anyone’s coding agents, RL or not. This gives us a precedent: at any given time, labs can train models that are exquisitely powerful but not very interesting to the market, simply by going one size class above what is optimal to serve on the latest NVIDIA chips.
Labs should simply train models that are uneconomically big and unprecedentedly intelligent, and give them solely to USG and other organizations it designates. This gives USG an enormous lead over its enemies: not just two weeks, but potentially one or more quarters of intelligence lead time. Yes, it will be expensive, but winning wars tends to be expensive. So USG is massively better off under this arrangement. It also gives both USG and the labs a large buffer in which to observe what a given level of intelligence does in practice, in relatively controlled environments, well ahead of broader deployment. This is great for everyone, not least the safety crowd.
At the same time, consumers worldwide get the best models on the cost/capability Pareto frontier as soon as they’re ready. The pressure to gatekeep drops considerably, and labs like OpenAI get to keep pursuing their mission of providing humanity with abundant and open intelligence.