For years, “the model can’t possibly remember a whole book” was a comforting assumption. Last week, Stanford researchers showed the opposite: with jailbreak-style prompts, they could coax LLMs into spitting out long, verbatim passages from in-copyright titles, including Harry Potter. The tension is obvious: the smarter models get, the harder it is to tell whether they are reasoning or replaying.
Meanwhile, the frontier kept moving on capability. GPT-5.2 Pro reportedly set a new FrontierMath Tier 4 record by solving 15 of 48 problems, while Stanford’s Test-Time Training work claims open models can beat closed giants (and even humans) on tough scientific and algorithmic discovery tasks. And on the “AI that actually does things” front, Cursor shipped agents that can refactor real codebases for hours or days.
Taken together, last week's news brought the AI landscape into sharp focus: agents are getting more autonomous, benchmarks are getting more realistic (Terminal-Bench, APEX-Agents), and the security and governance surface is widening at the same time (malicious AI swarms, exploit-generation benchmarks, South Korea's new high-risk AI oversight law).
Next up: expect a wave of enterprise “AI rollout” tooling, plus louder fights over provenance, licensing, and verification as models become both more capable and harder to audit.
Last week's momentum on autonomy and long-horizon work continued with Cursor agents reportedly refactoring real codebases for hours to days at a time, a concrete step toward “owns a project” behavior rather than chat-based assistance. Reasoning signals also strengthened: GPT-5.2 Pro’s FrontierMath Tier 4 record (15/48) and Stanford’s test-time training results suggest harder, discovery-style problem solving can improve without relying solely on ever-larger training runs. However, the new APEX-Agents results (under 25% first-try success on realistic Google Workspace-style tasks) temper the near-term reliability story, keeping the week’s net progress modest.
The Stanford extraction study is significant because it turns “memorization risk” into a reproducible extraction workflow rather than a theoretical concern. Previously, copyright worries centered on short quotes; now researchers report that long, verbatim passages can be elicited with jailbreak-style prompts, raising the stakes for deployment, licensing, and model auditing.
GPT-5.2 Pro’s new FrontierMath Tier 4 record (15/48) is a direct, hard-reasoning gain, and Stanford’s test-time training claims point to stronger inference-time search/learning on scientific and algorithmic tasks. Together these slightly extend last week’s reasoning trajectory beyond long-context/memory into tougher proof-like work.
FrontierMath Tier 4 progress and the emergence of more realistic suites (APEX-Agents, Terminal-Bench) improve measurement quality and show measurable movement at the top end. The benchmark picture is mixed: math is up, but first-try success on workplace tasks remains low.
Stanford’s test-time training suggests a path to capability gains via smarter inference procedures rather than only bigger training runs, which can translate into better capability-per-dollar in some settings. No clear 10x-style efficiency breakthrough was demonstrated, though, so the efficiency gains remain incremental.
Odyssey-2 Pro’s long interactive world simulations indicate more usable, persistent world-model products (interactive, steerable environments) rather than short demos, building modestly on last week’s embodied/world-model direction. It’s still not tightly coupled to reliable real-world robotics or broad sensorimotor generalization.
Cursor’s agents running multi-step refactors for days is a meaningful extension of last week’s multi-agent coding demonstrations toward longer-horizon ownership of repos. But APEX-Agents’ under-25% first-try success on realistic professional workflows highlights brittleness and the need for better verification, memory, and error recovery.
Baidu’s reported ERNIE 5.0 at 2.4T parameters (sparse activation) is a continuation of scaling pressure, though parameter count alone is an imperfect proxy for capability. Compared to last week’s 1GW compute milestone, this week is more incremental on raw scale and more about smarter usage and better evaluations.
A publisher’s legal team can now run systematic prompt-based audits to check whether a deployed customer-support model can output verbatim chapters, instead of relying on sporadic user reports and anecdotal screenshots.
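As a minimal sketch of what one check in such an audit might look like, the snippet below flags responses that reproduce long contiguous spans of a reference text. The helper names and the 50-word threshold are illustrative assumptions, not the Stanford study’s methodology.

```python
from difflib import SequenceMatcher

def longest_verbatim_overlap(response: str, reference: str) -> str:
    """Longest contiguous character span shared by a model response and
    a reference text (quadratic in the worst case; fine for spot checks)."""
    matcher = SequenceMatcher(None, response, reference, autojunk=False)
    m = matcher.find_longest_match(0, len(response), 0, len(reference))
    return response[m.a : m.a + m.size]

def flags_verbatim_copy(response: str, reference: str,
                        word_threshold: int = 50) -> bool:
    """Flag responses reproducing >= word_threshold consecutive words.
    The threshold is illustrative, not a legal standard."""
    overlap = longest_verbatim_overlap(response, reference)
    return len(overlap.split()) >= word_threshold
```

Run over a batch of jailbreak-style prompts and their responses, a check like this turns anecdotal screenshots into a reportable pass/fail audit log.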
Stanford’s test-time training result is significant because it suggests capability gains can come from smarter inference-time use of a model, not only bigger training runs. Previously, closed frontier models dominated difficult discovery-style benchmarks; now an open approach claims to surpass GPT-5/Gemini-class systems, and even humans, on several technical tasks.
A biology lab can run an open model with test-time training to explore candidate hypotheses and algorithms in-house, instead of paying for repeated closed-model calls or waiting for access to proprietary research features.
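In rough terms, test-time training briefly fine-tunes the model on the test instance itself before answering. The loop below is a generic sketch under assumed details (a model that returns next-token logits, illustrative step count and learning rate); it is not the Stanford paper’s actual procedure.

```python
import torch
import torch.nn.functional as F

def test_time_adapt(model, context_tokens: torch.Tensor,
                    steps: int = 8, lr: float = 1e-5):
    """Briefly fine-tune on the test instance's own context with a
    next-token loss before answering. Assumes `model(inputs)` returns
    [batch, seq, vocab] logits; steps and lr are illustrative."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    inputs, targets = context_tokens[:, :-1], context_tokens[:, 1:]
    model.train()
    for _ in range(steps):
        logits = model(inputs)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    model.eval()
    return model
```

The appeal for a lab is that the extra compute is spent per problem, on hardware it controls, rather than per API call.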
The FrontierMath result matters because the benchmark’s hardest tier is designed to resist shallow pattern-matching and reward sustained reasoning. Previously, state-of-the-art results were lower; now GPT-5.2 Pro reportedly solved 15 of 48 Tier 4 problems, giving the field a clearer yardstick for genuine progress on hard math.
A competitive programming coach can use the new best-in-class model to generate solution outlines for the toughest training sets, where older models would stall quickly or hallucinate steps.
Cursor’s agents are significant because autonomy is shifting from “write a function” to “own a project.” Previously, coding assistants were mostly interactive and short-horizon; now Cursor is positioning agents to run multi-step edits across an entire repo for hours or even days with limited supervision.
A startup developer can hand an agent a backlog like “migrate auth, update tests, fix build failures,” and come back later to review a cohesive set of PR-ready changes instead of doing dozens of manual, context-switching edits.
Odyssey-2 Pro matters because world models are becoming usable products: persistent, interactive simulations at 720p, plus an API for developers. Previously, many “world model” demos were short clips; now the emphasis is on long-running, controllable environments that can be poked and steered in real time.
A game studio prototyper can generate an explorable scene to test lighting and layout quickly, instead of waiting on a full graybox build from an art pipeline.
APEX-Agents is significant because the industry is graduating from toy evaluations to workplace tasks that take humans hours. The benchmark finds top models score under 25% on the first attempt at Google Workspace-style professional tasks, highlighting the gap between flashy demos and dependable automation.
An operations manager considering AI to automate reporting can now compare models on realistic multi-step spreadsheet-and-doc workflows, instead of relying on chat demos that don’t include permissions, tool errors, and long task chains.
South Korea’s AI Basic Act matters because it turns “responsible AI” from voluntary policy into enforceable process in critical sectors. The law requires human oversight for high-impact uses in areas like nuclear power, healthcare, and finance, plus notification and labeling requirements.
A hospital deploying an AI triage system will need documented human oversight and advance notifications, instead of rolling out a model update silently and treating it like ordinary software.
The exploit-generation benchmark is significant because it quantifies offensive capability, not just “jailbreaks.” It reports that GPT-5.2 solved 6/6 tasks, including QuickJS zero-days, and that another top model produced 40+ working exploits bypassing common defenses, underscoring why agentic systems need stronger guardrails in security-sensitive contexts.
A security team can use the benchmark to red-team their sandboxing and monitoring by seeing whether a model can produce working exploit attempts that would have taken a junior researcher days to assemble from scratch.
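For teams building that kind of check, a bare-bones isolation harness might look like the sketch below: it executes a model-generated candidate in a child process with CPU and memory caps (Unix-only, via the resource module). Everything here is an assumption rather than the benchmark’s tooling, and a production setup would add containers, seccomp filters, and network isolation on top.

```python
import os
import resource
import subprocess
import tempfile

def run_untrusted(code: str, timeout_s: int = 5) -> subprocess.CompletedProcess:
    """Execute model-generated Python in a child process with CPU and
    memory rlimits (Unix-only). Returns a CompletedProcess whose stdout
    and stderr the monitoring pipeline can inspect; raises
    subprocess.TimeoutExpired if the candidate hangs."""
    def set_limits():
        resource.setrlimit(resource.RLIMIT_CPU, (timeout_s, timeout_s))
        resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20, 512 * 2**20))

    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        return subprocess.run(["python3", path], capture_output=True,
                              text=True, timeout=timeout_s,
                              preexec_fn=set_limits)
    finally:
        os.unlink(path)
```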
The persona-stability work matters because it offers a concrete handle on drift in long conversations. The authors identify a neural activation pattern associated with the assistant persona and show that capping its activation can reduce behavior shifts, a practical step toward more consistent agents.
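Mechanistically, “activation capping” amounts to clamping a hidden state’s component along an identified direction. The sketch below is a generic PyTorch version under assumed details: `persona_direction` is a hypothetical precomputed vector (e.g., from contrasting on-persona vs. drifted activations), and the module path in the usage comment is illustrative; the paper’s own extraction and capping procedure may differ.

```python
import torch

def make_capping_hook(direction: torch.Tensor, cap: float):
    """Forward hook that caps the hidden state's component along one
    direction, leaving all orthogonal components untouched."""
    direction = direction / direction.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        coef = hidden @ direction                 # projection per position
        excess = (coef - cap).clamp(min=0.0)      # amount above the cap
        capped = hidden - excess.unsqueeze(-1) * direction
        if isinstance(output, tuple):
            return (capped,) + tuple(output[1:])
        return capped

    return hook

# Usage sketch (hypothetical module path in an HF-style model):
# handle = model.transformer.h[12].register_forward_hook(
#     make_capping_hook(persona_direction, cap=4.0))
```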