TTT models might be the next frontier in generative AI
Researchers are exploring Test-Time Training (TTT) models, which may process far more data than transformers while consuming far less compute.
Transformers, the architecture behind models like OpenAI's Sora and GPT-4o, are running into computational limits, particularly when processing very large amounts of data. Researchers from Stanford, UC San Diego, UC Berkeley, and Meta have proposed Test-Time Training (TTT) as a promising alternative architecture. TTT models replace the transformer's ever-expanding hidden state with a machine learning model of constant size, a change the researchers argue makes them far more computationally efficient on long inputs.
Yu Sun, a postdoc at Stanford, explains that a TTT model encodes the data it processes into representative variables called weights, rather than appending it to an ever-growing lookup table as a transformer does. This allows TTT models, in theory, to handle billions of pieces of data far more efficiently. Despite their promise, TTT models are not yet a direct replacement for transformers; only small models have been developed for study so far.
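To make the contrast concrete, here is a minimal toy sketch of the idea as described above: the "hidden state" is a small fixed-size linear model whose weights are nudged by one gradient step per incoming token, instead of a cache that grows with the sequence. The names, sizes, and the specific inner loss below are illustrative assumptions, not the researchers' actual implementation.

```python
import numpy as np

# Toy sketch of a TTT-style layer (illustrative only).
# The "hidden state" is a small linear model W of fixed size. For each incoming
# token we take one gradient step on a simple self-supervised reconstruction
# loss, so memory stays constant no matter how long the sequence gets.

rng = np.random.default_rng(0)
d = 16                    # token embedding size (hypothetical)
W = np.zeros((d, d))      # fixed-size inner model -- this is the hidden state
lr = 0.1                  # inner-loop learning rate (hypothetical)

def ttt_step(W, x):
    """Update the inner model on one token, then use it to produce an output."""
    # Self-supervised inner loss: reconstruct the token from a corrupted view.
    x_corrupt = x + 0.1 * rng.standard_normal(d)
    pred = W @ x_corrupt
    grad = np.outer(pred - x, x_corrupt)  # gradient of 0.5 * ||W x_corrupt - x||^2
    W = W - lr * grad                     # one "test-time training" step
    return W, W @ x                       # output for this token

# Process an arbitrarily long stream of tokens with constant memory.
tokens = rng.standard_normal((1000, d))
outputs = []
for x in tokens:
    W, y = ttt_step(W, x)
    outputs.append(y)

print("inner-model parameters:", W.size)  # stays d*d regardless of sequence length
print("tokens processed:", len(outputs))
```

The only point of the toy is that the inner model's parameter count stays fixed no matter how many tokens stream through, whereas a transformer's lookup table (its key-value cache) grows with every token it sees.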
AI experts such as Mike Cook see potential in TTT but urge caution, given how early the research still is. Meanwhile, companies like Mistral, with its Codestral Mamba, are pursuing a different route to similar efficiency gains: state space models (SSMs). Together, these efforts signal an industry-wide push to make generative AI more efficient and more widely accessible.