We will test LLMs’ ability to determine whether claims about social and behavioural science phenomena are true or false. We operationalize this question by investigating whether LLMs can assess a scientific paper and determine whether its primary findings reproduce successfully. The Institute for Replication (I4R) has existing research, infrastructure, workflows, and financial support that this effort can leverage directly to accelerate progress on evaluating LLM capabilities on this real-world task. As a pilot, the Replication Games will randomly assign teams to three treatment arms: human-only, human-machine, and machine with restricted human input. With Open Philanthropy support, we will organize 7 additional Replication Games at various universities testing the performance of human-machine and machine-only teams. We will recruit about 450 participants and write a paper summarizing the results, to be submitted to a leading outlet such as Nature, Science, or PNAS. Duration: 15 months (April 2024 – June 2025).