Google and OpenAI AI models win gold at International Math Olympiad

AI models from Google and OpenAI impress by both reaching gold medal performance at the International Math Olympiad, outperforming most student competitors.

Artificial intelligence models developed by Google's DeepMind and OpenAI achieved gold medal performance at the International Math Olympiad. Both solved five of the six problems, scoring 35 out of 42 points and rivaling the top students globally. Google's AI was formally invited and competed under the official rules, while OpenAI solved the problems independently after they were made public. Publicly accessible AI models performed much worse, indicating a gap between public and private AI capabilities.

The International Math Olympiad (IMO) is a prestigious global competition in which exceptionally talented high school students tackle challenging mathematical problems over two days. This year, Google's DeepMind and OpenAI made waves by reaching gold medal performance. Both AI models solved five of the six problems perfectly, scoring 35 out of a possible 42 points and matching some of the brightest young mathematicians in the world.

The AI models worked under the same constraints as human participants: two four-and-a-half-hour sessions without access to external resources such as the internet. Google's DeepMind participated formally in the event, competing under the standard conditions that apply to contestants. OpenAI, in contrast, solved the problems independently after the IMO made them public, simulating the exam environment to demonstrate its model's capabilities.

OpenAI's decision to announce its results before the official scoreboard was tallied drew some criticism for distracting from the accomplishments of the student participants. Still, the underlying message was clear: AI's proficiency at complex reasoning problems is advancing rapidly. Notably, the models used are not publicly available, pointing to a significant capability gap between experimental and publicly released AI tools.

Publicly accessible AI, including models like Gemini 2.5 Pro, Grok-4, and OpenAI o4, scored considerably lower, failing to exceed 13 points, a result far below the threshold for even a bronze medal. This has prompted discussion in the community about the accessibility of advanced AI tools and whether such powerful technologies should be more widely available for broader benefit. Releasing these specialized models more widely could substantially expand the fields in which AI is applied.

The achievement has renewed discussion about the future of AI in education and its potential to support learning and problem-solving at advanced levels. The performance gap between in-lab models and those available to the public highlights a significant area for growth in accessibility. It also raises questions about whether and how these tools could become pillars of educational technology, and about fairness in global competition settings in the age of AI.

Sources: Gizmodo, DeepMind Blog, Implicator.ai