Orion One

reading time: 5.56 mins
published: 2024-09-14
updated: 2024-11-24

... Doesn't Get It

Do You Get It?

In startups, one of the most important traits to identify in a candidate is the extent to which they “get it.” Not every employee necessarily needs (for example) capacity for abstract reasoning, charisma, etc. But I believe they have to get it.

It’s hard to pin down what getting it entails, even though most people I’ve spoken to in startups immediately register its meaning when the subject is brought up. A concise formulation would be “knowing how to quickly calibrate an intuitive world-model for use in real-world, consequential domains.”

World Models not Maps

Here, “world model” doesn’t mean map/territory, nor haggling over the meaning of words and inventing fun conlangs to signal group identity. It’s more: can you make useful forecasts of the future based on vibes? A good athlete can assess in a split second which teammate he should pass to, based on vibes. A good founder can assess which features to prioritize in light of a development roadblock, based on vibes. A good Magic: The Gathering player knows when to flip between aggressive and control strategies, based on vibes. These vibes can be filtered and, of course, communicated via language. But I’ve never seen language succeed in substituting for vibes. If it could, there would be few founders in the intersection of (effectively memorized PG’s essays) and (would run a startup into the ground instantly), but of course this is a very busy intersection.

A simple way to obtain an intuitive world-model in a domain is to have reality punch you in the face 10k times. Athletes, gamers, and founders constantly get feedback from dynamic environments, so it’s hard for them not to develop meaningful world models, at least in part. But getting it is not really about having a world model, because even though anyone can get punched 10k times, eating all of those punches takes a lot of time and energy. “Getting it” is when you know the algorithm for speedrunning the process, for sequencing the 100 punches that get you to the same destination much faster and less painfully. It’s the person’s derivative on world-modeling, and in any fast-moving field, it’s all that ends up mattering.

Words are Sacred

While language is too bandwidth-constrained to communicate world models, it is a powerful tool for the world model speedrunner, helping him to reason efficiently with less information, plan efficient face-punching experiments, and convince others to join in on advancing experiments, when appropriate. A speedrunner, even when playing casually with friends, often gives clear ‘tells’ in his gameplay. He’ll either instinctively avoid suboptimal moves, or make a show of employing them. Whatever he does, his fluid sense of command over the situation is impossible to miss, and often is so mesmerizing that thousands will pay to watch on a Friday night. One of the most consistent tells that a candidate “gets it” is when he makes very careful use of language. Every word is measured, he often pauses and qualifies statements, and simplicity/precision is absolutely paramount. There is a felt sense of command.

The classic hallmark of someone who doesn’t get it is yapping. The yapper vomits out every token that pops into his head - not that he could do any differently, as he has no clear basis for discriminating between better ideas and worse ones, aside from perhaps a few socially-conditioned heuristics. Yappers can be very intelligent, eloquent, and charismatic. Often, listening to one is entertaining, or engaging, or something else. Whatever it is, it is not grounded - the yapper’s speech implicitly refers to a la-la-land fuzzy approximation of reality, one superficially like our own but always lacking important details. Sometimes, multiple mutually-exclusive fantasies are implied by the same line of “reasoning.” Needless to say, anyone engaging in this sort of speech will not be able to consistently make good decisions for your startup, and certainly won’t be able to acquire that ability in a new domain as rapidly as possible.

There’s a kind of middle ground between getting it and yapping, where the person clearly has a strongly calibrated world model, and speaks very precisely with regard to their domain. But, because he has not mastered rapidly acquiring world models, he doesn’t demonstrate expert command of language-as-tool.

O1 Won’t Yap, But Also Won’t Get It

Generative pretrained transformers are quintessential, almost exemplary yappers. I won’t rehearse the previous paragraph; just read it again with an LLM in mind.

Orion 1 excites me because it introduces reinforcement learning - essentially, allowing the model to get punched in the face 10k times, helping it learn which lines of reasoning work in different contexts, and which don’t. Orion 1 has an intuitive world model, and, like an elite gamer or athlete, splits up computation between voiceless internal circuits (low-level reward modeling and action dynamics), a chain-of-thought scratch pad (reasoning and planning), and grounded interaction (the highest level, final answer tokens). As it eats its punches across more subdomains and niches, the foundation model will get better and better at consistently taking effective actions in the world - or, at least, the world it was trained in.
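
To make that three-level split concrete, here is a minimal sketch in Python of the shape of the loop as I imagine it - not OpenAI’s actual implementation, and every name in it (ToyPolicy, reward_fn, prefer_plan) is a hypothetical stand-in. It just shows a policy with hidden internal state, an optional scratch pad of reasoning tokens, final answer tokens, and an outcome reward doing the face-punching.

    import random

    class ToyPolicy:
        """Stand-in for the model's voiceless internal circuits."""
        def __init__(self):
            self.prefer_plan = 0.5  # hidden, learned tendency to reason before answering

        def act(self, prompt: str) -> tuple[list[str], str]:
            scratchpad = []  # chain-of-thought scratch pad: reasoning/planning tokens
            if random.random() < self.prefer_plan:
                scratchpad.append(f"plan: decompose '{prompt}' into sub-steps")
            answer = "final answer tokens"  # grounded interaction: what the user actually sees
            return scratchpad, answer

        def update(self, used_scratchpad: bool, reward: float, lr: float = 0.1):
            # Reinforce or suppress planning depending on whether it earned reward.
            direction = 1.0 if used_scratchpad else -1.0
            self.prefer_plan = min(1.0, max(0.0, self.prefer_plan + lr * direction * reward))

    def reward_fn(scratchpad: list[str], answer: str) -> float:
        # Toy environment: the "punch in the face"; here it simply rewards planning.
        return 1.0 if scratchpad else -1.0

    policy = ToyPolicy()
    for step in range(1000):  # a thousand of the metaphorical 10k punches
        scratchpad, answer = policy.act("some task")
        policy.update(bool(scratchpad), reward_fn(scratchpad, answer))

    print(f"learned preference for reasoning first: {policy.prefer_plan:.2f}")

The point of the toy is only the separation of concerns: the reward never touches the answer text directly, it shapes the hidden tendency that decides how much reasoning happens before the answer.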

Yet, even though it will have a great world-model installed, there’s nothing in the offline training OAI has thrown at Orion that would teach it to use language to actively learn better. Orion will be in the middle ground - it won’t yap - but it won’t yet get it. It will be excellent at taking actions in its own domain (like an expert trader, or gamer), but unable to rapidly acquire that competence in foreign ones. It still won’t be a great startup hire.

The Road to Getting It

OpenAI has chosen to break up its roadmap into stages, with the stages roughly being

  1. Yappers
  2. World Model / Middle Ground
  3. Gets It
  4. Gets It ++

Where, in my opinion, 3 and 4 are basically the same, and only vary in their economic consequences (which, to be fair, OpenAI values).

If 2 is exciting, 3 is thrilling. But it will take time. Before we can teach models to learn world models, we need to learn to teach models world models.

Just like in real life, I think “getting it” will be (somewhat) orthogonal to IQ and capacity for abstraction. We will be able to train 100M-parameter models that excel at efficiently learning Atari while lacking the internal capacity to prove Millennium Prize problems or whatever, just like you can find killer startup operators who would probably fail freshman Calc 3.

In this sense, it’s less about “scaling intelligence” (a nonsense yapper phrase if there ever was one) and more about developing entirely different and new capabilities for AI systems. O1 is not yapper-5, and the first model to “get it” (G1?) won’t be O5 either.

Happy stage 2 to all who celebrate.