Long-Horizon RL / Goal-Conditioned Control
published: 2026-05-09
updated: 2026-05-09
Sequence planning shell for long-horizon RL, goal-conditioned control, horizon reduction, and planning.
Main Ideas And Sequence Order
Blank for collaborative planning.
References
- Horizon Generalization in Reinforcement Learning — Project page for work on RL policies generalizing across task horizons.
- Horizon Generalization in Reinforcement Learning paper — Paper formalizing horizon generalization as a long-horizon RL problem.
- Offline Goal-conditioned Reinforcement Learning with Quasimetric Representations — Paper using quasimetric representations for offline goal-conditioned RL.
- Horizon Reduction Makes RL Scalable — Paper arguing that reducing effective horizon length improves RL scalability.
- A Single Goal is All You Need — Paper on goal-conditioned approaches as a unifying control primitive.
- RL for Planning and Planning for RL — CMU blog post connecting reinforcement learning and planning.
- Ultimate guide to RL environments in the LLM era — Guide to constructing RL environments for modern LLM systems.
- Adithya S K RL environments thread — Thread introducing or explaining the RL environments guide.
- Baseten Loops thread — Thread on production loops for RL and long-sequence training.
- Charlie O’Neill on open-source RL libraries breaking — Thread on practical failures in open-source RL stacks.
- TRL v1.4 thread — Update thread for Hugging Face TRL.
- State of RL for reasoning LLMs — Overview essay on RL methods for reasoning LLMs.
- DreamCoder — Classic program-synthesis/RL-adjacent paper on learning reusable abstractions.
- Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs — Paper on optimizing multi-stage LM programs through instructions and demonstrations.
- On Training Large Language Models for Long-Horizon Tasks — Paper on training LLMs for longer-horizon tasks.
- Position: agentic AI orchestration should be Bayes-consistent — Position paper on principled orchestration for agentic AI.
- Tilde Aurora optimizer thread — Thread about an optimizer relevant to agent/RL training loops.