xAI Grok 5 and AGI

08-25-2025 • https://www.nextbigfuture.com, by Brian Wang

Elon says Colossus 2 has a non-trivial chance of achieving AGI. He says xAI is close to having all the pieces in place for AGI. A non-trivial chance is probably about 1-5%.

AGI is loosely defined here as the point where debates rage: some argue it's achieved, others deny it.

What Are the Essential Pieces for Achieving AGI?

AI Models: xAI's Grok series, particularly Grok 4, is highlighted as a frontrunner on the LLM leaderboards.
Compute: Colossus 2's gigawatt-scale cluster. xAI plans to scale from 200,000 H100 equivalents now to 250 times that within 5 years. There could be up to 550,000 Nvidia B200/B300 GPUs.
Power: Musk is aggressively securing power, including shipping entire power plants from overseas.
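
To make the compute-scaling claim concrete, here is a minimal arithmetic sketch in Python. The 200,000 starting figure, the 250x multiple and the 5-year window come from the text above. The assumption of steady year-over-year compounding is mine, not xAI's stated plan, and the variable names are illustrative only.

```python
# Figures quoted in the article.
current_h100_equiv = 200_000   # H100 equivalents today
scale_factor = 250             # planned increase
years = 5                      # planned time window

# Target fleet size implied by a 250x increase.
target_h100_equiv = current_h100_equiv * scale_factor

# ASSUMPTION: growth compounds evenly each year (not stated in the source).
annual_growth = scale_factor ** (1 / years)

print(f"Target: {target_h100_equiv:,} H100 equivalents")
print(f"Implied growth: about {annual_growth:.2f}x per year")
```

Under those assumptions the target works out to 50 million H100 equivalents, which would require roughly tripling compute every year for five years.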

Evaluation of Grok 4: Strengths, Benchmarks, and Comparisons

Grok 4 and Grok 4 Heavy excel at complex, long-form tasks such as extended coding sessions and difficult problems. Grok 4 is described as approaching everything as if it were a hard problem, which makes it slower for simple tasks but well suited to intricate projects. Community feedback and the speaker's testing support this.

Developer Denny Lamemensetta (possibly partnering with Max Herden) uses Grok 4 exclusively for game development, including UI via Grok Imagine, even though neither is a coder. They rely on "vibe coding" (an intuitive, AI-assisted process) and produce impressive results. The speaker plans an interview to explore why they prefer Grok over competitors like Gemini 2.5 Pro or GPT-5.

Benchmarks

– #1 on LiveCodeBench.
– At or near the top on: AIME 2025 (100% for Grok 4 Heavy with Python), SWE-bench (75% score, edging out the 74.9% runner-up), GPQA Diamond, Vending-Bench (successful long-term vending machine operation).

– Outperforms rivals on ARC-AGI-2 (more complex than ARC-AGI-1), showing "nonzero fluid intelligence" per Greg Kamradt (president of the ARC Prize Foundation).

– Fluid vs. Crystallized Intelligence: LLMs traditionally rely on crystallized (experience-based) intelligence drawn from training data. Grok 4 is said to demonstrate fluid intelligence, the ability to adapt to novel problems without prior examples, which in humans peaks in young adulthood.