
PaperBench: Evaluating AI’s Ability to Replicate AI Research
About This Event
Can an AI rebuild a paper from scratch (code, experiments, results) with nothing but the PDF? PaperBench is the test. Let's read it closely and argue about what it really measures.

PaperBench: Evaluating AI's Ability to Replicate AI Research (OpenAI)
MLE-bench: Evaluating ML Agents on Machine Learning Engineering (OpenAI)
RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents (METR)
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (Sakana AI)
Can LLMs Generate Novel Research Ideas? (Si, Yang & Hashimoto)

Bring the paper that changed your mind about what these systems can do. RSVP to unlock the address.
Date & Time
Wednesday, July 15, 2026
6:00 PM - 10:00 PM
Location
AGI House SF: 170 St. Germain Ave., San Francisco, CA 94114