A six-person team unveiled a recursive agent that surpasses human performance on ARC-AGI, one of the toughest benchmarks for abstract reasoning and core intelligence. By looping through planning, coding, testing, and refining with frontier models such as GPT-5.1, the lean system cracked problems that have long tested the limits of AI cognition.
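The loop described above can be sketched in miniature. Everything below is an illustrative assumption, not the team's actual system: the toy ARC-style task, the candidate transforms, and names like `solve_task` are all invented for the sketch. The shape is the point: propose a candidate program, test it against the training examples, and refine (here, simply move to the next candidate) on failure.

```python
# A minimal plan-code-test-refine loop on a toy ARC-style task.
# All names and the task itself are illustrative, not the team's code.

def flip_h(grid):  # mirror each row left-to-right
    return [row[::-1] for row in grid]

def transpose(grid):  # swap rows and columns
    return [list(col) for col in zip(*grid)]

def inc_colors(grid):  # shift every cell's color up by one
    return [[c + 1 for c in row] for row in grid]

CANDIDATES = [flip_h, transpose, inc_colors]  # the "proposer's" search space

def solve_task(train_pairs, max_iters=10):
    """Propose a candidate program, test it on all training pairs,
    and refine (advance to the next candidate) until one passes."""
    for program in CANDIDATES[:max_iters]:
        if all(program(x) == y for x, y in train_pairs):
            return program  # passes every training example
    return None  # search budget exhausted

# Toy task: the hidden rule is "mirror each row".
train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
solver = solve_task(train)
print(solver.__name__)  # flip_h
```

In the real system the proposer is a frontier model emitting code and the tester's feedback shapes the next proposal; the fixed candidate list here just stands in for that generate-and-check cycle.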
ARC-AGI demands novel problem-solving without prior training data, mimicking the kind of fluid-intelligence puzzles humans ace intuitively. Past top AI systems hovered below 50% while humans hit 80-90%; this agent's recursive self-improvement loop pushes AI into human-exceeding territory, signaling a leap in autonomous reasoning.
Software developers gain a tireless collaborator that debugs complex codebases overnight, cutting resolution time from weeks to hours compared to manual reviews. Researchers in novel domains like materials science iterate thousands of hypotheses daily, accelerating discoveries that once spanned years. Robotics engineers deploy adaptive planners for real-world navigation that outperform rigid scripts by adjusting on the fly.
Combined with agent triumphs in SWE-bench and IMO math, this sets the stage for self-improving AI ecosystems. Watch for enterprise rollouts of recursive agents in Q1 2026.
The recursive agent surpassing human performance on ARC-AGI represents a major breakthrough in abstract reasoning and autonomous self-improvement, justifying an upward adjustment to the outlook. Strong agentic advances, such as Alibaba's ROME on SWE-bench and HAGeo's IMO-gold geometry results, further bolster momentum in reasoning and software tasks. Rumors of massive compute scaling add upside but remain unverified.
This open-source 30B MoE agent excels at real software engineering tasks using just 3B active parameters, trained on 1M+ trajectories. It shatters prior agent benchmarks, enabling production-grade coding autonomy. Developers access frontier agent capabilities without proprietary black boxes.
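The efficiency claim rests on sparse routing: in a mixture-of-experts model, each token activates only a few experts, so far fewer than the full 30B parameters run per forward pass. A back-of-envelope sketch, with expert counts and the shared/expert parameter split chosen as assumptions for illustration rather than taken from ROME's published config:

```python
# Back-of-envelope: why a 30B-parameter MoE can run with ~3B active
# parameters per token. The numbers below are illustrative assumptions,
# not ROME's actual configuration.

def active_params(total_expert_params, n_experts, top_k, shared_params):
    # Each token is routed to top_k of n_experts, so only that fraction
    # of the expert weights participates in a forward pass; shared
    # layers (attention, embeddings) always run.
    return shared_params + total_expert_params * top_k / n_experts

# Assume 28B of the 30B sit in experts and 2B are shared, with
# 64 experts and top-2 routing:
print(active_params(28e9, 64, 2, 2e9) / 1e9)  # 2.875 (billions)
```

The inference cost therefore scales with the active ~3B, not the total 30B, which is what makes "frontier agent capabilities without proprietary black boxes" cheap enough to self-host.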
Recursive agent beats humans on ARC-AGI via self-improving loops; HAGeo solves 28/30 IMO geometry problems at gold level, showing leaps in novel problem-solving.
Human exceedance on ARC-AGI and ARC-AGI-2, ROME's 57.4% on SWE-bench Verified, and IMO-gold geometry performance highlight benchmark progress beyond prior SOTA.
TSMC's 2nm node delivers 15% more performance or 30% power savings for AMD's MI450; an all-optical vision chip runs 100x faster and more efficiently; incremental hardware gains overall.
All-optical chip enables efficient semantic vision generation; limited other multimodal demos this week.
Recursive agent with GPT-5.1 excels autonomously; ROME hits 57.4% on SWE-bench, MAI-UI sets SOTA for GUI agents, and the ROME team self-built its model via agent collaboration.
xAI rumor of 2GW compute expansion; no verified massive model releases, but hardware enablers like TSMC N2 support future scaling.
A startup developer fixes 20 GitHub issues per day with ROME, versus manually resolving 2-3 before, slashing dev cycles from days to minutes.
A six-person team built an agent that runs recursive loops across frontier models for planning, coding, testing, and auditing. It beats human performance on ARC-AGI, a core AGI benchmark, proving small teams can achieve outsized reasoning gains. This advances agentic architectures toward general intelligence.
A PhD researcher tests 500 abstract puzzles overnight with the agent, identifying solutions that took human teams months manually.
In the Agentic Learning Ecosystem, autonomous agents handled full workflows to construct the ROME model, from data curation through training. This demonstrates collaborative AI self-improvement without human oversight and paves the way for scalable, agent-driven model factories.
An indie lab automates model development end-to-end, producing competitive 30B models in weeks instead of years with full human teams.
Using efficient classical methods, HAGeo reaches International Mathematical Olympiad gold standard on geometry problems. This pure-reasoning feat outpaces neural methods on elite math and unlocks AI for formal verification across the sciences.
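To give a flavor of what classical (non-neural) geometry machinery looks like, here is an illustrative check, done with exact rational arithmetic, that the three medians of a triangle concur at the centroid. This is a generic analytic-geometry sketch, not HAGeo's algorithm:

```python
# Illustrative classical-geometry check: the three medians of a triangle
# meet at one point (the centroid). Exact rationals avoid float error.
# This is a generic sketch, not HAGeo's actual method.
from fractions import Fraction as F

def collinear(p, q, r):
    # zero cross product <=> the three points lie on one line
    return (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0])

def mid(p, q):  # midpoint of segment pq
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

A, B, C = (F(0), F(0)), (F(4), F(0)), (F(1), F(3))
centroid = ((A[0] + B[0] + C[0]) / 3, (A[1] + B[1] + C[1]) / 3)

# The centroid lies on all three medians (vertex -> opposite midpoint).
assert all(collinear(V, mid(P, Q), centroid)
           for V, (P, Q) in [(A, (B, C)), (B, (A, C)), (C, (A, B))])
print("medians concur at the centroid")
```

Exact symbolic or rational computation is what lets such methods certify a geometric fact outright, rather than merely predict it, which is why they suit formal verification.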
A high school teacher generates and solves 50 custom geometry proofs daily for students, versus crafting one manually per hour.
The purchase of the MACROHARDRR building boosts xAI's Colossus cluster to 2 GW, enabling massive training runs. This hardware scale fuels next-gen models amid the compute race and positions xAI to train trillion-parameter systems rapidly.
An xAI engineer trains 100x larger models in months, compared to years on prior 100 MW clusters.