humanity's last exam - Search News

14d

A.I. May Pass 'Humanity’s Last Exam' Within the Next 9 Months, Scientists Say

Humanity’s Last Exam is the ultimate academic test for AI, which challenges the tech to answer the most difficult questions ...

Google Unveils Gemini 2.5 Pro, Shattering Records on Humanity’s Last Exam

Google has finally released Gemini 2.5 Pro, a larger reasoning model that has achieved 18.8% on Humanity's Last Exam without ...

Hosted on MSN · 1mon

OpenAI's deep research can complete 26% of ‘Humanity’s Last Exam’: What is it and what does it mean?

Humanity's Last Exam is a recently released exam for AI models, also called large language models, like ChatGPT, Grok-2 and deep research. It is used to judge the performance of the AI model ...

With AI models clobbering every benchmark, it's time for human evaluation

Human oversight of AI development has been a staple of progress in Gen AI. The development of ChatGPT in 2022 made extensive ...

Hosted on MSN · 29d

OpenAI’s deep research can complete 26% of Humanity’s Last Exam—a benchmark for the frontier of human knowledge

OpenAI’s new autonomous agent, deep research, has stormed past competing models and set a new standard on Humanity’s Last Exam, a global benchmark created to determine when AI can answer ...

Google’s Gemini 2.5 Pro AI Model Launched; Tops Leaderboard, Outperforms OpenAI’s o3 Mini

Gemini 2.5 Pro is also claimed to have outperformed models like OpenAI's o3-mini, Grok 3 Beta, Claude 3.7 Sonnet, and DeepSeek R1 in several benchmarks, such as GPQA Diamond, AIME 2024 and 2025, Aider ...
