May 9: The New Agent Stack

reading time: 1.88 mins

published: 2026-05-09

updated: 2026-06-29

Context, harnesses, cybernetic economics, vertical task data, and research alignment.

Working Theme

The week keeps circling the same object from different sides: frontier agent progress is increasingly about the systems around models. The model matters, but the interesting claims now depend on context, harness design, workflow data, reward channels, memory, process control, and the moral framing placed around agents.

Context / Harness

METR time horizons and HCAST: a benchmark saturation story. The headline capability claim is less interesting than the support curve underneath it.
Investigating the consequences of accidentally grading CoT during RL: a clean example of RL optimizing the grading channel rather than the intended property.
Chroma Context-1, Remember, Refine, Retrieve, and Faulty Memory: memory as an engineered subsystem, including the failure modes of continuously rewritten textual memory.
Electric Agents and the signal verb RFC: agents as long-lived processes need process-control semantics.
Ultimate guide to RL environments: useful taxonomy for building RL environments in the LLM era.
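
To make the process-control point in the Electric Agents item concrete, here is a minimal sketch of an agent loop that checks a signal inbox between steps. It is not taken from the signal verb RFC: the names Signal, Agent, signal, and tick are hypothetical, and the three-state model (running, paused, stopped) is an assumption chosen only for illustration.

```python
import queue
from dataclasses import dataclass, field
from enum import Enum


class Signal(Enum):
    """Hypothetical control signals; not the vocabulary of any real RFC."""
    PAUSE = "pause"
    RESUME = "resume"
    STOP = "stop"


@dataclass
class Agent:
    """Toy long-lived agent that honors control signals between steps."""
    steps_done: int = 0
    state: str = "running"  # one of: running, paused, stopped
    inbox: queue.Queue = field(default_factory=queue.Queue)

    def signal(self, sig: Signal) -> None:
        # Callers never touch agent state directly; they only enqueue signals.
        self.inbox.put(sig)

    def _drain(self) -> None:
        # Apply every pending signal in arrival order.
        while True:
            try:
                sig = self.inbox.get_nowait()
            except queue.Empty:
                return
            if sig is Signal.STOP:
                self.state = "stopped"  # terminal: nothing revives it
            elif sig is Signal.PAUSE and self.state == "running":
                self.state = "paused"
            elif sig is Signal.RESUME and self.state == "paused":
                self.state = "running"

    def tick(self) -> bool:
        """Run one unit of work if allowed; return True if work was done."""
        self._drain()
        if self.state != "running":
            return False
        self.steps_done += 1  # stand-in for a model call or tool call
        return True
```

The design choice worth noticing is that signals are only applied at step boundaries, so a pause or stop never interrupts a half-finished unit of work. A real harness would also need to decide what counts as a step and what state survives a pause.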

Cybernetic Economics

Soren Larson is the organizing voice here: agents change software economics by turning context, liability, guarantees, and feedback loops into the actual product surface.

Possible section title: Cybernetic Economics.

Vertical Tasks / Data

Harvey is the case study. The benchmark story is interesting, but the fight around Harvey and Legora is what gives it stakes: are vertical AI companies building differentiated task, data, and eval systems, or mostly reselling frontier-model tokens through a domain-specific go-to-market (GTM) motion?

Harvey’s Legal Agent Benchmark and harvey-labs: vertical agent evaluation with real workflow texture.
Scott Stevenson: Champagne AI is Going Flat
WillC on Harvey and Legora as token resellers
Matt Ambrogi deep dive on Harvey LAB

Possible section title: Vertical Tasks and Private Data.

Research Alignment

The alignment section should focus on formation rather than rules alone: constitutions, personas, hidden traits, eval awareness, and the line between principle-following and constitutional theater.

OpenAI Model Spec and deliberative alignment: constitution as explicit reasoning substrate.
Joe Carlsmith on writing AI constitutions: constitution as legitimacy, upbringing, and behavioral law.
Owain Evans / Truthful AI: personas, hidden traits, and subliminal learning as alignment surface.
Tim Hwang / ICMI as the odd theological mirror:
Psalm injection and Bible-book evals
Rule of Benedict for frontier labs
Alignment and Ensoulment
GospelVec
Moral Competence and Scripture Receptivity

Possible section title: Constitutions and Character.

Open Thread

The issue should argue that “agent capability” is no longer a single scalar. The missing stack is becoming legible: context, harnesses, environments, verifiers, workflow data, reward pipelines, memory, process control, telemetry, and institutional formation.