Reflection-Loop Reliability

An agent's first answer is usually its worst. Reflection loops let it critique and revise its own output; evals let you measure the reliability gain instead of hoping for one.

Expected to ship in late July 2026

Course outline

Why the first answer is the worst
Where single-pass generation fails, and why reliability, not capability, is usually the gap.
The reflection loop
Generate, critique, revise: the core loop, built in LangGraph as an explicit cycle you can inspect.
Stopping criteria
When to stop reflecting: convergence, budgets, and guards against loops that revise forever.
Measuring reliability
Turn 'it feels better' into a number with LangSmith evals, datasets, scorers, and before/after deltas.
Cost vs. reliability
Every reflection pass costs tokens and latency. Decide where the reliability curve flattens and extra passes stop paying off.
Deploying a reflection agent
Ship it with LangSmith Deployment, in both Python and TypeScript, with the loop observable in production.
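The generate, critique, revise loop and the stopping criteria above can be sketched in plain Python before any framework is involved. This is a minimal, framework-free sketch, not the LangGraph implementation the course builds. `generate`, `critique`, and `revise` are hypothetical stand-ins for model calls, and the three stopping rules (the critic reports no issues, the reviser makes no change, or a pass budget runs out) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Critique:
    """Hypothetical critic output: the issues found in a draft."""
    issues: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # The draft passes when the critic found nothing to fix.
        return not self.issues


@dataclass
class LoopResult:
    answer: str
    passes: int          # number of revise calls made
    stop_reason: str     # which stopping rule fired
    history: List[str]   # every distinct draft, first to last


def reflect(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Critique],
    revise: Callable[[str, str, Critique], str],
    max_passes: int = 3,
) -> LoopResult:
    """Generate once, then critique and revise until a stopping rule fires."""
    draft = generate(task)
    history = [draft]
    for n in range(1, max_passes + 1):
        review = critique(task, draft)
        if review.passed:
            # Convergence: the critic has nothing left to flag.
            return LoopResult(draft, n - 1, "critic_passed", history)
        new_draft = revise(task, draft, review)
        if new_draft == draft:
            # No-progress guard: another pass would repeat the same draft.
            return LoopResult(draft, n, "no_change", history)
        history.append(new_draft)
        draft = new_draft
    # Budget: stop after max_passes revisions; this last draft was not critiqued.
    return LoopResult(draft, max_passes, "budget_exhausted", history)


# Toy stand-ins (hypothetical): each revision adds one missing keyword.
REQUIRED = ["generate", "critique", "revise"]


def toy_generate(task: str) -> str:
    return "generate"


def toy_critique(task: str, draft: str) -> Critique:
    return Critique([w for w in REQUIRED if w not in draft.split()])


def toy_revise(task: str, draft: str, review: Critique) -> str:
    return draft + " " + review.issues[0]


result = reflect("explain the loop", toy_generate, toy_critique, toy_revise)
print(result.stop_reason, result.passes, result.answer)
# prints: critic_passed 2 generate critique revise
```

Each stop reason corresponds to one stopping criterion, which makes it easy to log and count per run. In LangGraph, a decision like this is usually expressed as a conditional edge that routes either back to the revise step or to the end. Whether the loop actually improves answers is still a question for evals on a dataset, not for the loop itself.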

The literature this course rests on

Six papers, the minimum reading behind the reflection-loop patterns taught here.

Shinn, N., et al. (2023). Reflexion: Language agents with verbal reinforcement learning. https://arxiv.org/abs/2303.11366
Madaan, A., et al. (2023). Self-Refine: Iterative refinement with self-feedback. https://arxiv.org/abs/2303.17651
Yao, S., et al. (2023). Tree of Thoughts: Deliberate problem solving with large language models. https://arxiv.org/abs/2305.10601
Wei, J., et al. (2022). Chain-of-thought prompting elicits reasoning in large language models. https://arxiv.org/abs/2201.11903
Bai, Y., et al. (2022). Constitutional AI: Harmlessness from AI feedback. https://arxiv.org/abs/2212.08073
Yao, S., et al. (2022). ReAct: Synergizing reasoning and acting in language models. https://arxiv.org/abs/2210.03629