Join the Snorkel AI Reading Group, a recurring forum to explore the latest frontier developments in AI while building meaningful connections within the community. In this session, lead researcher Henry Ehrenberg will present Senior SWE-Bench, Snorkel's new open-source benchmark for evaluating coding agents on the work we actually give them.

Agenda:
4 pm - doors open
4:30 pm - talk begins

🧋🧋🧋 Boba tea and other refreshments will be provided! 🧋🧋🧋

Among other things, you'll learn:

Why most coding benchmarks treat agents like junior engineers (over-specified requirements, graded mainly on whether the code runs) when most of us already treat agents like senior engineers who work from a Slack message, not a spec.

How Senior SWE-Bench's validation agent breaks the trade-off between realistic instructions and reliable grading: it writes behavioral tests adapted to each agent's actual solution, using expert-designed recipes the solving agent never sees.

Why the benchmark's bug and performance tasks are sourced from real PRs with evidence of significant runtime investigation (logs,…

Reading Group (+🧋): Senior SWE-Bench

About This Event

Date & Time

Location