OpenAI introduces a general-purpose agent in ChatGPT

OpenAI's new agent in ChatGPT can autonomously handle tasks like coding, calendar navigation, and slide creation.

OpenAI has introduced a general-purpose agent within ChatGPT that can autonomously perform a range of user tasks, such as managing calendars, creating editable presentations, and executing code. The ChatGPT agent combines features from the earlier projects 'Operator' and 'Deep Research,' enabling it to interact with websites and compile concise research reports. It is available to subscribers of OpenAI's Pro, Plus, and Team plans. The agent scored 41.6% on Humanity’s Last Exam and 27.4% on FrontierMath, and OpenAI emphasizes its safeguards, given the tool's potential risks in sensitive domains.

OpenAI has unveiled an agentic system built into ChatGPT, called the ChatGPT agent, designed to carry out a range of user tasks autonomously. It combines features from OpenAI's previous tools, notably Operator's ability to interact with websites and Deep Research's ability to synthesize information from the web into cohesive reports. The agent can handle everyday tasks such as scheduling on a calendar, building slide presentations, and executing code, which makes it useful for people who want to streamline their workflow.

Extending AI beyond information retrieval and conversation into autonomous action carries inherent risks. OpenAI says it has addressed these risks by building safety measures into the ChatGPT agent. These include real-time monitoring and a classifier that identifies requests related to biology and blocks the generation of content that might pose a biological threat. The agent also runs without ChatGPT's memory feature, which reduces the risk that sensitive data could be misused through prompt injection attacks.

OpenAI reports strong results for the model on several challenging benchmarks. On Humanity’s Last Exam, it scored 41.6%, approximately double the scores of the earlier o3 and o4-mini models. On FrontierMath, it scored 27.4% with access to tools, a large improvement over the previous leading score of 6.3%, held by o4-mini. These results indicate that the ChatGPT agent is considerably more capable on demanding tasks than earlier models.

Subscribers to OpenAI's Pro, Plus, and Team plans can turn on 'agent mode' from ChatGPT's tool menu and then direct the agent with natural language prompts. OpenAI suggests practical uses such as planning meals or conducting competitive analyses. Questions remain about how well the agent will work in real-world use, given how brittle agent technology has been in the past. OpenAI says it remains committed to overcoming these hurdles and promises continued advances in AI agent capabilities.

Sources: OpenAI, TechCrunch