🚀 Check out the latest benchmark results from Artificial Analysis!
- Grok 4 is leading the pack with an Intelligence Index score of 73, ahead of OpenAI o3 (70), Google Gemini 2.5 Pro (70), DeepSeek R1 0528 (68), and Anthropic Claude 4 Opus (64). 🥇
- Price-wise, Grok 4 matches Grok 3 at $3/$15 per million input/output tokens ($0.75 per million cached input tokens). That's on par with Claude 4 Sonnet, but pricier than Gemini 2.5 Pro ($1.25 per million input tokens for prompts under 200k tokens) and o3 ($2 per million input tokens after its recent price drop). 💸
- Grok 4 doesn't just lead the Intelligence Index; it also tops the Coding and Math indexes! 📊📚
- It set a record GPQA Diamond score of 88%, surpassing Gemini 2.5 Pro's previous high of 84%! 🌟
- On Humanity's Last Exam, it scored 24%, beating Gemini 2.5 Pro's previous record of 21%. Note: our benchmark uses the original January 2025 dataset and runs without any tools. 🧠📝
- Tied for the top scores on MMLU-Pro (87%) and AIME 2024 (94%). 🎉
- Output speed is 75 tokens/sec: slower than o3 (188), Gemini 2.5 Pro (142), and Claude 4 Sonnet Thinking (85), but faster than Claude 4 Opus Thinking (66). ⚡️
- Context window? A solid 256k tokens: smaller than Gemini 2.5 Pro's 1M, but larger than the Claude 4 models and R1 (all at 200k or below). 🪄
- Supports text & image input for now; audio isn’t in the mix yet. 🔊❌
- Function calls and structured output? You bet! 📞✨
#AI #BenchmarkResults #Grok4