Foundation Course
Reflection-Loop Reliability
An agent's first answer is usually its worst. Reflection loops let it critique and revise its own output, and let you measure the reliability gain instead of hoping for one.
Ships ~late July 2026
Course outline
Why the first answer is the worst
Where single-pass generation fails, and why reliability, not capability, is usually the gap.
The reflection loop
Generate, critique, revise: the core loop, built in LangGraph as an explicit cycle you can inspect.
Stopping criteria
When to stop reflecting: convergence, budgets, and guards against loops that revise forever.
Measuring reliability
Turn 'it feels better' into a number with LangSmith evals: datasets, scorers, and before/after deltas.
Cost vs. reliability
Every reflection pass costs tokens and latency. Decide where the curve stops paying off.
Deploying a reflection agent
Ship it with LangSmith Deployment, in both Python and TypeScript, with the loop observable in production.
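To make the outline concrete, here is a minimal sketch of a generate, critique, revise loop with the three stopping conditions named above. It is plain Python, not the course's LangGraph code. `reflect`, `LoopResult`, and the toy `generate`/`critique`/`revise` callables are hypothetical names invented for this illustration; in a real agent each callable would wrap a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LoopResult:
    answer: str
    passes: int       # number of revise steps actually run
    stop_reason: str  # "accepted", "converged", or "budget"
    history: List[str] = field(default_factory=list)


def reflect(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Tuple[bool, str]],
    revise: Callable[[str, str, str], str],
    max_passes: int = 3,
) -> LoopResult:
    """Hypothetical reflection loop: generate once, then critique and revise."""
    draft = generate(task)
    history = [draft]
    for passes in range(max_passes):
        ok, feedback = critique(task, draft)
        if ok:
            # The critic accepted the draft: stop early.
            return LoopResult(draft, passes, "accepted", history)
        new_draft = revise(task, draft, feedback)
        if new_draft == draft:
            # Revision changed nothing: further passes would only burn tokens.
            return LoopResult(draft, passes + 1, "converged", history)
        draft = new_draft
        history.append(draft)
    # Budget guard: the loop cannot revise forever. The final draft is
    # returned without another critique.
    return LoopResult(draft, max_passes, "budget", history)


# Toy stand-ins for model calls (assumptions, for illustration only).
def toy_generate(task: str) -> str:
    return "draft"


def toy_critique(task: str, draft: str) -> Tuple[bool, str]:
    if draft.endswith("!!"):
        return True, ""
    return False, "add emphasis"


def toy_revise(task: str, draft: str, feedback: str) -> str:
    return draft + "!"


if __name__ == "__main__":
    result = reflect("demo task", toy_generate, toy_critique, toy_revise, max_passes=5)
    print(result.answer, result.passes, result.stop_reason)
```

Recording `passes` and `stop_reason` on every run is what makes the cost and reliability questions answerable later: they say how often the loop hit its budget and how many passes an accepted answer needed.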
The literature this course rests on
Six papers: the minimum reading behind the reflection-loop patterns taught here.
- Shinn, N., et al. (2023). Reflexion: Language agents with verbal reinforcement learning. https://arxiv.org/abs/2303.11366
- Madaan, A., et al. (2023). Self-Refine: Iterative refinement with self-feedback. https://arxiv.org/abs/2303.17651
- Yao, S., et al. (2023). Tree of Thoughts: Deliberate problem solving with large language models. https://arxiv.org/abs/2305.10601
- Wei, J., et al. (2022). Chain-of-thought prompting elicits reasoning in large language models. https://arxiv.org/abs/2201.11903
- Bai, Y., et al. (2022). Constitutional AI: Harmlessness from AI feedback. https://arxiv.org/abs/2212.08073
- Yao, S., et al. (2022). ReAct: Synergizing reasoning and acting in language models. https://arxiv.org/abs/2210.03629
Not shipped yet. Want a nudge when it is?
The Foundation course ships ~late July 2026. Send a quick note and I'll tell you the day it's live.
Notify me when it ships →