Why certain AI models emit 50 times more greenhouse gases to answer the same question

Some AI models emit up to 50 times more CO2 to answer the same question because their detailed reasoning consumes far more energy.

Recent research shows that large language models (LLMs) can emit vastly different amounts of carbon depending on how they generate their outputs. The study, from Hochschule München University of Applied Sciences, found that reasoning models create more emissions because of the extra computational steps they perform. The researchers evaluated 14 LLMs and showed that those with higher accuracy often emitted more CO2, especially large reasoning models such as the 70-billion-parameter Cogito. They concluded that there is a trade-off between model accuracy and sustainability.

A study conducted by researchers at Hochschule München University of Applied Sciences reveals that large language models (LLMs) can differ dramatically in their greenhouse gas emissions. Tasked with answering the same question, some models emit up to 50 times more carbon dioxide than others. The increased emissions stem largely from the intense computing power required by models that perform complex reasoning processes. Parameter counts, which range from 7 to 72 billion across the models studied, play a central role in energy consumption, and models that pursue higher accuracy through intricate reasoning also tend to be the largest emitters.

The new study compared fourteen LLMs, ranging from models that focus on concise answers to those that specialize in detailed reasoning. The latter generate additional "thinking tokens"—special tokens produced to work through the problem before the final answer—and accordingly produced significantly higher emissions. Reasoning models created an average of 543.5 thinking tokens per query, compared with only 37.7 tokens for concise models.
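The reported token averages give a rough sense of the scale difference. As a back-of-the-envelope sketch (assuming, as a simplification, that per-query energy use scales roughly linearly with the number of tokens generated, ignoring model size, hardware, and grid differences), the gap between reasoning and concise models looks like this:

```python
# Back-of-the-envelope sketch: how many more tokens per query do
# reasoning models generate, using the averages reported in the study?
# Assumption (ours, not the study's): per-query energy scales roughly
# linearly with tokens generated.
reasoning_tokens = 543.5  # avg thinking tokens per query, reasoning models
concise_tokens = 37.7     # avg tokens per query, concise models

ratio = reasoning_tokens / concise_tokens
print(f"Reasoning models generate ~{ratio:.1f}x more tokens per query")
```

Under that linear assumption, reasoning models generate roughly 14 times more tokens per query, which helps explain why their emissions diverge so sharply even before model size is taken into account.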

Among the standout models evaluated was Cogito, a reasoning LLM with 70 billion parameters that achieved 84.9% accuracy on benchmark tests, albeit with three times the emissions of comparable models that do not produce such detailed output. The study identified an "accuracy-sustainability trade-off" and suggested using these models selectively, depending on the task's requirements: none of the models that kept emissions below 500 grams of CO2 equivalent achieved more than 80% accuracy.
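The trade-off can be pictured as a constraint over an accuracy-versus-emissions table. In the sketch below, the model names and all numbers except Cogito's reported 84.9% accuracy are hypothetical, invented purely to illustrate the shape of the study's finding:

```python
# Illustrative sketch of the accuracy-sustainability trade-off.
# All entries are hypothetical except Cogito's reported 84.9% accuracy;
# the emissions figures are invented for illustration only.
models = [
    # (name, accuracy %, grams CO2-equivalent over the benchmark run)
    ("concise-7b", 33.0, 28.0),
    ("concise-72b", 77.6, 418.0),
    ("cogito-70b", 84.9, 1254.0),
]

# The study's finding, expressed as a constraint: among models staying
# under 500 g CO2e, none exceeded 80% accuracy.
low_emission = [m for m in models if m[2] < 500.0]
best_low = max(acc for _, acc, _ in low_emission)
print(f"Best accuracy under 500 g CO2e: {best_low}%")
```

The point of the sketch is only that filtering on an emissions budget caps the achievable accuracy; the real figures vary by model, hardware, and grid.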

The problem is compounded by complex subject matter. Answering questions on topics such as algebra or philosophy, where detailed reasoning is necessary, produced up to six times the emissions of straightforward queries. Emissions also depend on external factors, including the carbon intensity of the local energy grid, so the study's absolute figures may not generalize; the emphasis on thoughtful AI usage, however, stands.

Researcher Maximilian Dauner of Hochschule München University of Applied Sciences emphasized the importance of being "selective and thoughtful" about LLM usage. Users should prefer concise-answer models where possible and reserve high-capacity reasoning models for genuinely demanding tasks; this alone would significantly curb the associated emissions. Equally important is continued research into optimizing models for sustainability without compromising their accuracy or utility.

Sources: Gizmodo, Hochschule München University of Applied Sciences, Stanford Report