Long-Horizon RL / Goal-Conditioned Control

published: 2026-05-09

updated: 2026-05-09

Sequence planning shell for long-horizon RL, goal-conditioned control, horizon reduction, and planning.

Sequence: Long-Horizon RL / Goal-Conditioned Control

Main Ideas And Sequence Order

Left blank for collaborative planning.

References

Horizon Generalization in Reinforcement Learning — Project page for work on RL policies generalizing across task horizons.
Horizon Generalization in Reinforcement Learning paper — Paper formalizing horizon generalization: whether a goal-conditioned policy trained to reach nearby goals also succeeds at reaching distant ones.
Offline Goal-conditioned Reinforcement Learning with Quasimetric Representations — Paper using quasimetric representations for offline goal-conditioned RL.
Horizon Reduction Makes RL Scalable — Paper arguing that reducing effective horizon length improves RL scalability.
A Single Goal is All You Need — Paper showing that skills and exploration can emerge from contrastive goal-conditioned RL given only a single goal, without rewards, demonstrations, or subgoals.
RL for Planning and Planning for RL — CMU blog post connecting reinforcement learning and planning.
Ultimate guide to RL environments in the LLM era — Guide to constructing RL environments for modern LLM systems.
Adithya S K RL environments thread — Thread accompanying the RL environments guide.
Baseten Loops thread — Thread on production loops for RL and long-sequence training.
Charlie O’Neill on open-source RL libraries breaking — Thread on practical failures in open-source RL stacks.
TRL v1.4 thread — Release thread for Hugging Face TRL v1.4.
State of RL for reasoning LLMs — Overview essay on RL methods for reasoning LLMs.
DreamCoder — Program-synthesis paper, adjacent to RL, that uses wake-sleep training to grow a library of reusable abstractions.
Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs — Paper (MIPRO, built on DSPy) on jointly optimizing the instructions and few-shot demonstrations for each stage of a multi-stage LM program.
On Training Large Language Models for Long-Horizon Tasks — Paper on training LLMs for longer-horizon tasks.
Position: agentic AI orchestration should be Bayes-consistent — Position paper on principled orchestration for agentic AI.
Tilde Aurora optimizer thread — Thread about an optimizer relevant to agent/RL training loops.