Software engineer on the true state of AI agents (they're not there yet)
Mathematical and economic constraints keep AI agents short of full autonomy, making human oversight essential.

In a candid assessment, systems engineer Utkarsh Kanwat dispels myths surrounding the readiness of AI agents for full autonomy. Drawing on extensive experience with production systems across various domains, he lays out fundamental mathematical constraints that undermine the feasibility of fully autonomous, multi-step agent workflows. Current Large Language Models (LLMs) deliver only about 95% reliability per step, and those per-step errors compound: a 20-step workflow succeeds end-to-end just 36% of the time (0.95^20 ≈ 0.36), exposing the intrinsic limits of chaining many tasks autonomously.
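The arithmetic behind that 36% figure is simple compounding. A minimal Python sketch, assuming step failures are independent, reproduces the numbers:

```python
# A workflow of n independent steps, each succeeding with probability p,
# completes end-to-end with probability p ** n.
def success_rate(p: float, n: int) -> float:
    return p ** n

for n in (1, 5, 10, 20, 50):
    print(f"{n:>2} steps -> {success_rate(0.95, n):6.1%}")

# Output:
#  1 steps -> 95.0%
#  5 steps -> 77.4%
# 10 steps -> 59.9%
# 20 steps -> 35.8%
# 50 steps ->  7.7%
```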
To mitigate this, Kanwat recommends breaking workflows into small, manageable units of no more than 3 to 5 steps each, with explicit rollback points and human confirmation checkpoints between them. Built on bounded contexts and atomic operations, this design keeps each unit's error budget small instead of letting mistakes compound across a long autonomous sequence.
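A minimal sketch of that structure, with hypothetical unit and rollback functions that are illustrative rather than from Kanwat's post:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Unit:
    """A small, atomic slice of a larger workflow (3-5 steps at most)."""
    name: str
    steps: list[Callable[[], None]]   # the unit's individual operations
    rollback: Callable[[], None]      # undoes everything the unit did

def run_workflow(units: list[Unit]) -> bool:
    for unit in units:
        try:
            for step in unit.steps:
                step()                # execute one bounded operation
        except Exception as err:
            print(f"{unit.name} failed ({err}); rolling back.")
            unit.rollback()           # explicit rollback point
            return False
        # Human confirmation checkpoint between units, not full autonomy.
        if input(f"{unit.name} done. Continue? [y/N] ").strip().lower() != "y":
            unit.rollback()
            return False
    return True
```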
Conversational agent designs face a separate economic barrier: token costs. Kanwat's prototyping experience shows that because each new turn must resend the entire conversation history, cumulative token consumption grows quadratically with conversation length, making long dialogues economically unsustainable at scale. A 100-turn conversation can cost $50 to $100, an impractical expense for a production system. Kanwat's workaround was a stateless agent model that maps a focused input to an output in a single call, sidestepping conversational context entirely and eliminating the runaway costs.
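To see why cost grows quadratically, note that turn k resends all earlier turns. The sketch below uses assumed per-turn token counts and an assumed price, chosen only so the total lands in the range Kanwat reports:

```python
# Turn k of a stateful conversation resends the whole history, so its prompt
# is roughly k * TOKENS_PER_TURN tokens; cumulative usage over n turns is
# TOKENS_PER_TURN * n * (n + 1) / 2 -- quadratic in n. All numbers here are
# assumptions for illustration, not real API prices.
TOKENS_PER_TURN = 1_000
PRICE_PER_1K_TOKENS = 0.01  # dollars, assumed

def conversation_cost(turns: int) -> float:
    total_tokens = sum(k * TOKENS_PER_TURN for k in range(1, turns + 1))
    return total_tokens * PRICE_PER_1K_TOKENS / 1_000

def stateless_cost(turns: int) -> float:
    # Stateless calls send only the focused input each time: linear growth.
    return turns * TOKENS_PER_TURN * PRICE_PER_1K_TOKENS / 1_000

print(f"100-turn conversation: ${conversation_cost(100):.2f}")  # $50.50
print(f"100 stateless calls:   ${stateless_cost(100):.2f}")     # $1.00
```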
Effective tool design remains a significant engineering hurdle, often overlooked amid the hype around AI agents' capabilities. Kanwat stresses that the hard part is designing tools that give an agent useful feedback without overwhelming its limited context window. A well-crafted database tool, for example, summarizes query output and surfaces only the essential results, rather than flooding the agent with thousands of raw rows. Structured, compact communication between tools and models is what makes agent deployments work.
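A hedged sketch of such a summarizing tool; the function names and result shape are illustrative, not from the post:

```python
from typing import Callable

def summarize_results(rows: list[dict], sample_size: int = 5) -> dict:
    """Return only what the agent needs: shape, size, and a small sample."""
    return {
        "row_count": len(rows),
        "columns": sorted(rows[0]) if rows else [],
        "sample": rows[:sample_size],   # a few representative rows
    }

def database_tool(sql: str, execute: Callable[[str], list[dict]]) -> dict:
    rows = execute(sql)                 # full result set stays on this side
    return summarize_results(rows)      # the agent sees only the summary

# 10,000 raw rows collapse into a summary of a few hundred tokens.
fake_rows = [{"id": i, "status": "ok"} for i in range(10_000)]
print(summarize_results(fake_rows))
```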
Integration with legacy systems poses another challenge, compounded by missing clean APIs, fluctuating rate limits, and strict compliance requirements. Kanwat describes engineering around these issues with traditional practices such as connection pooling, transaction rollbacks, and timeout handling, noting that this plumbing sits outside what the agents themselves can manage. Rather than chasing full autonomy, he argues, companies should focus on integration and human oversight to avoid failures from compounded errors or impractical economics.
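A sketch of that conventional safety net around a single agent-proposed action, assuming a DB-API-style connection object (commit/rollback); none of this is from Kanwat's code:

```python
import time

def run_with_safeguards(conn, action, retries=3, backoff_s=1.0):
    """Run `action(conn)`, committing on success and rolling back on failure."""
    for attempt in range(1, retries + 1):
        try:
            result = action(conn)      # the single operation the agent proposed
            conn.commit()              # make it durable only if it succeeded
            return result
        except Exception:
            conn.rollback()            # undo any partial work
            if attempt == retries:
                raise                  # out of retries: surface the failure
            time.sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff
```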
Kanwat foresees difficulty ahead for both venture-backed startups and enterprises betting on autonomous agents, given these economic and legacy-system hurdles. He argues that domain-focused tools that keep humans in the loop will produce the most reliable products. His assessment paints a realistic picture of where AI stands today: full autonomy is still on the horizon, and integration is the real work in the meantime.
Sources: Utkarsh Kanwat's Blog, TechSpot