OpenAI Releases Upgraded Realtime API Audio Model Snapshots
OpenAI launched three dated audio model snapshots: gpt-4o-mini-transcribe-2025-12-15 (89% fewer hallucinations), gpt-4o-mini-tts-2025-12-15 (35% fewer errors), and gpt-realtime-mini-2025-12-15 (22% better instruction following, 13% better function calling). The updates boost reliability for low-latency speech-to-speech agents and advance production multimodal conversational AI.
NVIDIA Launches Fully Open Nemotron 3 Nano 30B-A3B MoE Model
NVIDIA released Nemotron 3 Nano, a fully open hybrid Mamba-Transformer MoE model with a 1M-token context window, NVFP4 pre-training on 1T tokens, latent MoE, and NeMo Gym RL. It tops open models on math, coding, and agentic benchmarks (#47 on LMSYS Arena) and enables efficient, scalable multi-agent reasoning without scaling up compute.
Motif Releases Report on Advanced Synthetic Data Pipelines for Reasoning
Motif's arXiv paper (2512.11463) outlines synthetic data pipelines that match SYNTH through data curation, verification, and curriculum learning, boosting reasoning quality by 15%. The approach advances post-training efficiency for small models beyond distillation, which matters for pursuing state-of-the-art results at scale.
Mistral Launches Devstral 2 Coding Model Family
Mistral's open-source Devstral 2 family (123B and 24B parameters) claims SOTA coding performance and is paired with the Vibe CLI for automation. The release democratizes advanced developer tools and boosts agentic coding innovation.
ARC Benchmarks Update on Fluid Intelligence
François Chollet reports rapid progress on fluid intelligence for AGI, with ARC-1 saturating, ARC-2 still unsaturated, and ARC-3 planned for 2026. The update highlights remaining gaps in efficiency and exploration, which should drive targeted research.
ServiceNow and Together AI Release Apriel-1.6-15B-Thinker
ServiceNow and Together AI launch Apriel-1.6-15B-Thinker, a 15B-parameter multimodal model that rivals 235B-parameter models on reasoning tasks. It runs on a single GPU and is MIT-licensed, democratizing efficient inference for AGI research.
Google DeepMind Launches FACTS Benchmark with Gemini 3 Pro
Google DeepMind releases the FACTS Benchmark for factuality evaluation, with Gemini 3 Pro scoring 68.8%. Publicly available on Kaggle, it highlights multimodal gaps and is key to improving reliable reasoning toward trustworthy AGI.
Anthropic Debuts Selective Gradient Masking for Safety
Anthropic introduces SGTM (Selective Gradient Masking), a pretraining method that isolates high-risk knowledge into removable parameters. It outperforms traditional safety techniques and is 7x harder to reverse, a critical advance for safer AGI deployment.
OpenAI Unveils GPT-5.2 with Expert-Level Performance
OpenAI releases GPT-5.2 in Instant, Thinking, and Pro variants, achieving human expert performance on GDPval across 44 occupations. It excels in long-context reasoning and professional tools. A major step toward AGI productivity.
Google Announces Gemini Updates with Translation, Speech, and Research Capabilities
Google updated its Gemini models, adding translation across about 20 languages, a speech-to-speech beta, and the Gemini Deep Research agent API. The new DeepSearchQA and FACTS benchmarks support evaluation of reasoning and factuality for reliable AI scaling.