Overview A hands-on evening building an agent that provably gets better: simulate the edge cases, score against evals, and gate every release so the next version ships only if it beats the last. You'll leave with the loop, built on open-source tools, running on a laptop. Event details "Self-improving agents" is doing a lot of work in pitch decks right now. Some of it is real, a lot of it isn't, and his evening separates the two by building the real version in front of you. We'll together build the real version live to show the difference. Before anything reaches production, we simulate synthetic users and adversarial scenarios to surface where the agent breaks, score those runs against evals, and feed the failures back in. The eval becomes the gate: nothing ships unless the numbers move the right way. Then we keep it running, routing production traffic back through the same evals so regressions show up as dropping scores instead of support tickets. Full recursive self-improvement is still an open research problem and we won't pretend otherwise, but the bounded version is buildable…

Can Agents Actually Self-Improve?

About This Event

Share Event

Date & Time

Location