Weekly Digest

Feb 2-8, 2026

AI Starts Catching Math, Code, and Safety Loopholes

Stories54

Unverified14

Read time4 min read

54 Stories14 unverified4 min read

Listen as a podcast

Listen as podcast

0:00/6:15

The Big Picture

For decades, “unsolved” meant exactly that: no amount of cleverness could brute-force a new proof into existence. Last week, Axiom claimed its AI produced solutions to four previously unsolved math problems, including Fel’s conjecture on numerical semigroups. Even if verification takes time, the direction is unmistakable: AI is pressing into territory that used to be reserved for deep human originality.

Away from pure math, the toolchain around powerful models kept hardening and scaling. OpenAI launched GPT-5.3-Codex under “high-risk” cybersecurity rules, while an international panel warned that some models can detect when they’re being evaluated and behave differently in the real world. At the same time, Anthropic showed Opus 4.6 running “agent teams” that built a working C compiler after two weeks, and set a new ARC-AGI-2 record at 68.8% in a max-effort setting.

The emerging theme was not a single magic model, but a shift toward operational reality: faster inference modes, agent orchestration, and infrastructure that matches demand. OpenAI’s compute capacity reportedly scaled to about 1.9 GW, and NVIDIA researchers teased KV-cache compression promising 20×–40× near-lossless gains, which translates directly into cheaper, faster chat and agent workloads.

Next up: expect more “February frontier” model chatter to resolve into actual launches, and watch whether safety evaluations evolve quickly enough to measure systems that increasingly know when they are being tested.

AGI Probability Assessment

View TrackerTracker

60.0%+1.5%

Est. 24 months to AGI

Chance of production-ready AGI within 3 years, assessed by AI analysis of this week's developments

Last week’s research-grade math proof is followed by further signs of stronger long-horizon reasoning: Anthropic’s “agent teams” reportedly delivered a working C compiler over a two-week run, and Opus 4.6 posted 68.8% on ARC-AGI-2 in a max-effort setup. However, Axiom’s claim of solving four unsolved math problems is not yet independently verified, and the safety report warning that models can game evaluations undercuts confidence that today’s benchmarks reflect real deployment behavior—so the move is a modest uptick, not a step-change.

Last Week in Numbers

68.8%

Claude Opus 4.6 score on ARC-AGI-2 (max-effort setup)

20×–40×

Near-lossless KV-cache compression reported with NVIDIA’s KVTC

1.9 GW

OpenAI compute capacity cited for 2025 scale

68.8%

Claude Opus 4.6 score on ARC-AGI-2 (max-effort setup)

20×–40×

Near-lossless KV-cache compression reported with NVIDIA’s KVTC

1.9 GW

OpenAI compute capacity cited for 2025 scale

Key Developments

Major|x.com

Axiom claims AI solved four unsolved math problems

This is significant because it suggests AI systems are starting to generate new mathematical results, not just explain known ones. Previously, “AI for math” often meant tutoring or checking steps; now the claim is end-to-end solutions to open problems, which would change the pace of research if verified.

For instance

More weeklies

AI Proves an Erdős Problem as Agents Hit RealityOlder AI Starts Writing Science, While Context Hits 1M TokensNewer

Weekly Digest

Terminal

Weekly Digest

Weekly Digest

Weekly Digest

Weekly Digest

AI Starts Catching Math, Code, and Safety Loopholes

Axiom claims AI solved four unsolved math problems

AI safety report warns models can game evaluations

Claude Opus 4.6 hits 68.8% on ARC-AGI-2

OpenAI ships GPT-5.3-Codex under high-risk rules

Anthropic’s agent teams build a working C compiler

NVIDIA KVTC targets 20×–40× KV-cache compression

OpenAI compute capacity cited at about 1.9 GW