Why AI can’t spell ‘strawberry’

Large language models stumble on letter-level tasks like counting the r's in 'strawberry' because of their transformer architecture. Image generators like Midjourney rely on diffusion models, which have blind spots of their own.


Large language models such as GPT-4 and Claude often struggle with fundamental tasks like spelling or counting the letters in a simple word because of their transformer architecture. Transformers break text into tokens and work with numerical encodings of those tokens rather than with individual characters, so the model never directly sees the letters in a word. That is why asking how many r's are in 'strawberry' can trip them up.
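To make the tokenization point concrete, here is a minimal sketch using the open-source tiktoken library. The specific token split shown in the comments is an assumption and depends on the encoding; the point is simply that the model receives integer IDs, not letters.

```python
# Minimal sketch: how a BPE tokenizer chops up a word before a model sees it.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by several OpenAI models
word = "strawberry"

token_ids = enc.encode(word)
pieces = [enc.decode([tid]) for tid in token_ids]

print(token_ids)  # a short list of integer IDs, not characters
print(pieces)     # sub-word chunks, e.g. something like ['str', 'awberry'] (split may vary)

# The model only ever operates on the IDs above, so "how many r's are in
# 'strawberry'?" must be answered without direct access to the letters.
# Ordinary string processing, by contrast, counts them trivially:
print(word.count("r"))  # 3
```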

Image generators like Midjourney and DALL-E, on the other hand, rely on diffusion models, which reconstruct an image from noise. These models handle large, well-represented elements such as cars and faces well, but they often falter on finer details like fingers, which occupy only a small, inconsistent part of their training images. Some of these issues have been eased by training the models on more images of the problematic features, such as human hands.
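The "reconstruct from noise" idea can be sketched in a few lines. In the toy loop below, `denoise_step` is a hypothetical stand-in for the trained neural network that a real system would use (conditioned on a text prompt); it only illustrates the shape of the process of iteratively turning noise into structure.

```python
# Toy sketch of reverse diffusion: start from pure noise, repeatedly denoise.
import numpy as np

def denoise_step(image, step, total_steps):
    # Hypothetical denoiser: blend pixels toward a target. A real diffusion
    # model would instead predict and remove noise, guided by a text prompt.
    target = np.full_like(image, 0.5)
    blend = 1.0 / (total_steps - step + 1)
    return (1 - blend) * image + blend * target

steps = 50
image = np.random.randn(64, 64, 3)  # begin with Gaussian noise
for t in range(steps):
    image = denoise_step(image, t, steps)

print(image.mean(), image.std())  # noise gradually gives way to structure
```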

Efforts to improve these models are ongoing. OpenAI is reportedly working on a project code-named Strawberry, aimed at generating synthetic data to improve its models' accuracy. Google DeepMind's new systems, AlphaProof and AlphaGeometry 2, tackle formal mathematical reasoning and performed at a silver-medal standard on problems from the 2024 International Mathematical Olympiad, a sign of progress in complex problem-solving. Despite the humorous failures, these advanced AI systems continue to push the boundaries of what's possible.