OpenAI’s o3 suggests AI models are scaling in new ways — but so are the costs
OpenAI's o3 model excels in performance, but at a steep cost.
OpenAI’s o3 model represents a substantial advance in AI capability, scoring 88% on the ARC-AGI benchmark and outperforming every other AI model that has attempted the test. That success comes at a price, however: o3 relies on test-time scaling, which demands so much computation that running the benchmark cost more than $1,000 per task.
Test-time scaling works by spending more computational resources during inference, after a model has already been trained, and it has proven a potent way to extract better performance from existing AI models. The trade-off is unpredictable usage costs, which create a financial barrier to widespread adoption and raise questions about how OpenAI will price the o3 model.
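To make the idea concrete, here is a minimal sketch of one common form of test-time scaling: sampling a model several times on the same question and keeping the majority answer. OpenAI has not disclosed o3's exact mechanism, and the `query_model` function below is a hypothetical stand-in, so this illustrates the cost-versus-accuracy trade-off in general rather than o3's actual approach.

```python
# Illustrative sketch of test-time scaling via repeated sampling + majority vote.
# `query_model` is a hypothetical stand-in for a real model call; each extra
# sample is another full inference pass, so accuracy and cost rise together.
import random
from collections import Counter

def query_model(question: str) -> str:
    """Hypothetical model call: returns the correct answer 60% of the time."""
    return "42" if random.random() < 0.6 else str(random.randint(0, 99))

def answer_with_test_time_scaling(question: str, n_samples: int) -> str:
    """Draw n_samples independent answers and return the most common one."""
    votes = Counter(query_model(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    random.seed(0)
    for budget in (1, 5, 25, 125):
        answer = answer_with_test_time_scaling("What is 6 * 7?", n_samples=budget)
        print(f"{budget:>3} samples -> answer {answer}")
```

Larger sampling budgets make the majority answer more reliable, but every additional sample is billed as another inference call, which is why this style of scaling drives per-task costs so high.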
For now, o3’s compute and cost requirements confine it to high-stakes scenarios where the expense can be justified, such as finance or industrial applications. And while o3 marks real progress in AI performance, it still shares the familiar shortcomings of large language models, including a tendency to produce inaccurate responses.