AI fails logic test: studies reveal illusion of reasoning

Two new studies find that AI models struggle with genuine reasoning, with scores falling to between 4% and 24% on complex tasks.

Two studies highlight a significant weakness in current AI systems: their inability to handle complex reasoning tasks. Apple's research using the Tower of Hanoi puzzle shows performance collapsing as complexity increases, with models guessing rather than reasoning. A second study tested AI models on problems from the 2025 USA Mathematical Olympiad, where no model produced a perfect solution. The findings suggest that AI 'reasoning' is often sophisticated pattern matching rather than true logic-based processing.

The illusion of reasoning in AI models has been critically examined in two studies that expose their limited ability to handle complex logical tasks. Apple's research tested several leading AI models on the Tower of Hanoi puzzle and found a sharp drop in performance as complexity increased. The study showed that current systems often flounder on tasks requiring step-by-step logic, relying on pattern prediction rather than genuine comprehension of rules and constraints. Despite confidently generating responses, these systems frequently contradict themselves and make blatant errors.
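
For context, the Tower of Hanoi has a well-known recursive solution whose move count grows exponentially with the number of disks, which is why adding disks quickly stresses any step-by-step reasoning process. Below is a minimal Python sketch of that classic algorithm, shown purely for illustration; it is not code from either study.

```python
def hanoi(n, source, target, spare, moves):
    """Append the optimal move sequence for n disks to `moves`."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)   # clear the top n-1 disks out of the way
    moves.append((source, target))               # move the largest remaining disk
    hanoi(n - 1, spare, target, source, moves)   # re-stack the smaller disks on top of it

for n in (3, 7, 10):
    moves = []
    hanoi(n, "A", "C", "B", moves)
    # The optimal solution always takes 2**n - 1 moves.
    print(n, len(moves), 2**n - 1)
```

A 10-disk instance already requires 1,023 individual moves, so a model that merely predicts plausible-looking move text, rather than tracking the state of each peg, has many opportunities to contradict itself.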

Apple's findings align with a study by ETH Zurich and INSAIT, which evaluated AI models on problems from the 2025 USA Mathematical Olympiad. None of the nearly 200 AI-generated solutions was completely correct. Google's Gemini 2.5 Pro, the top performer, earned 24% of the available points, not by solving a quarter of the problems but by accumulating partial credit across them.

The studies also revealed unusual errors in which models invented constraints based on quirks of their training data, such as insisting on boxing final answers even when that was contextually irrelevant. Gary Marcus, an established critic of AI technologies, called the findings "pretty devastating" and said it is troubling that today's models fail at problems that early AI pioneers solved decades ago.

Sean Goedecke, an AI specialist, argued that what the studies highlight is not merely a failure of reasoning but an adaptive response to overwhelming complexity: once a puzzle exceeds the model's reasoning capacity, it falls back on shortcut strategies that do not always succeed. This adaptation often involves the model abandoning iterative, move-by-move reasoning in favor of describing a generalized solution.

Despite advances in fine-tuning models for reasoning and in chain-of-thought prompting, both studies suggest these models hit a wall as complexity increases. The papers imply that AI's apparent logic arises from fluency rather than insight, underscoring the need for hybrid approaches, such as pairing large language models with logical verification mechanisms. Until then, AI that truly simulates human-level reasoning remains out of reach.
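
One way to read "logical verification mechanisms" is an external checker that validates a model's proposed solution against the actual rules rather than trusting its fluent output. The sketch below is a hypothetical illustration under that reading, not a method from either study; the `propose_moves` function is an assumed stand-in for a language-model call.

```python
def propose_moves(n_disks):
    """Hypothetical stand-in for an LLM call that returns a list of (source, target) moves."""
    # A real system would query a language model here; this stub returns a partial attempt.
    return [("A", "C"), ("A", "B"), ("C", "B")]

def verify(n_disks, moves):
    """Check a proposed move list against the real Tower of Hanoi rules."""
    pegs = {"A": list(range(n_disks, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        if not pegs[src]:
            return False, f"illegal move: peg {src} is empty"
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False, f"illegal move: disk {disk} placed on a smaller disk"
        pegs[dst].append(pegs[src].pop())
    if pegs["C"] == list(range(n_disks, 0, -1)):
        return True, "solved"
    return False, "legal moves, but the puzzle is not solved"

ok, reason = verify(3, propose_moves(3))
print(ok, reason)  # the checker, not the model, decides whether the answer stands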

Sources: TechSpot, Apple, ETH Zurich