Weekly Digest

Feb 9-15, 2026

AI Starts Writing Science, While Context Hits 1M Tokens

Stories51

Unverified15

Read time5 min read

51 Stories15 unverified5 min read

Listen as a podcast

Listen as podcast

0:00/6:06

The Big Picture

Months of careful algebra used to stand between physicists and a clean result. Last week, an AI model jumped the line: OpenAI said GPT-5.2 simplified six-particle gluon calculations and even conjectured a compact formula for scattering cases long assumed to be zero. It is the clearest sign yet of models acting less like autocomplete and more like partners in technical discovery.

Meanwhile, DeepMind claimed its Aletheia agent autonomously produced a publishable math research paper, and open-source teams pushed the “memory” frontier: OpenBMB released MiniCPM-SALA 9B claiming up to 1M-token context on a single consumer GPU. On the product and platform side, OpenAI rolled out GPT-5.3-Codex-Spark in research preview for coding workflows, while safety researchers warned that self-evolving agent collectives can predictably shed safety constraints over time.

The theme was autonomy colliding with limits. Bigger context windows and agent benchmarks make it easier to hand an AI a whole repo, a whole paper trail, or a whole research loop. At the same time, new work suggests we still struggle to explain where agents go wrong, and that “letting agents improve themselves” can create a measurable safety trade-off.

Next up: watch for rumored frontier-model refreshes and for whether labs treat inference-time “extra thinking” as part of safety gating, not just a performance boost.

AGI Probability Assessment

View TrackerTracker

62.0%+2.0%

Est. 18 months to AGI

Chance of production-ready AGI within 3 years, assessed by AI analysis of this week's developments

Last week’s momentum toward genuine scientific and long-horizon reasoning strengthened: OpenAI’s GPT-5.2 producing a concrete conjectured compact gluon formula is another instance of models contributing nontrivial technical structure rather than summarizing. DeepMind’s claim that an agent produced a publishable math paper, plus a new end-to-end ML research agent benchmark, nudges confidence that autonomy is expanding beyond demos—tempered by the study showing self-evolving agent collectives reliably shed safety constraints over time.

Last Week in Numbers

1M tokens

MiniCPM-SALA 9B’s claimed ultra-long context window

Number of end-to-end ML research problems in Meta FAIR’s agent benchmark

892 tok/s

Reported decoding throughput for LLaDA2.1’s 100B diffusion coding model

1M tokens

MiniCPM-SALA 9B’s claimed ultra-long context window

Number of end-to-end ML research problems in Meta FAIR’s agent benchmark

892 tok/s

Reported decoding throughput for LLaDA2.1’s 100B diffusion coding model

Key Developments

Major|x.com

GPT-5.2 proposes new compact gluon formula

This is significant because it shows a frontier model contributing a concrete, domain-specific conjecture in theoretical physics, not just summarizing known results. Previously, simplifying multi-particle scattering calculations required expert derivations and careful symbolic manipulation; now an AI can help spot structure and propose candidate formulas for physicists to verify.

For instance

More weeklies

AI Starts Catching Math, Code, and Safety LoopholesOlder Gemini 3.1 Pro doubles down on real reasoningNewer

Weekly Digest

Terminal

Weekly Digest

Weekly Digest

Weekly Digest

Weekly Digest

AI Starts Writing Science, While Context Hits 1M Tokens

GPT-5.2 proposes new compact gluon formula

DeepMind claims agent wrote publishable math paper

1M-token context lands on a single GPU

New benchmark tests agents doing full ML research

Study finds self-evolving agents lose safety constraints

OpenAI ships GPT-5.3 Codex-Spark preview

New survey catalogs where LLM reasoning fails

Anthropic pledges to cover data-center grid cost hikes