What’s Holding Back AI Agents? Key Takeaways from an Agentic AI Summit

By Raja Kumar · Published: Aug 8, 2025

I recently attended a summit focused on AI agents, featuring speakers from academia and industry including Ion Stoica, Matei Zaharia, Dawn Song, and others. While the hype around agents is real, the discussions were refreshingly honest about what’s not working yet—and what might unlock the next phase.

1) Why Agents Haven’t Taken Off (Yet)

One of the most useful frameworks discussed was a failure taxonomy for multi-agent systems. The main bottlenecks turn out to be surprisingly practical, centering less on raw reasoning than on how agents connect to tools and data:

Ion Stoica: “Deep research is great at using search, okay at connectors, and not good at custom tools.”

This gap between reasoning and real-world action is one of the biggest friction points today.
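To make the "custom tools" gap concrete, here is a minimal sketch of what wiring one up typically involves: the model sees only a JSON schema, and a thin dispatcher maps its emitted calls onto local code. The tool, schema, and dispatcher below are illustrative placeholders, not any particular framework's API.

```python
import json

# Hypothetical custom tool: an internal system the model has never
# seen in training, which is exactly where agents struggle today.
def lookup_invoice(invoice_id: str) -> dict:
    return {"invoice_id": invoice_id, "status": "paid", "amount_usd": 1290.00}

# What the model actually sees: a JSON schema describing the tool.
TOOL_SCHEMA = {
    "name": "lookup_invoice",
    "description": "Fetch the status of an invoice from the billing system.",
    "parameters": {
        "type": "object",
        "properties": {"invoice_id": {"type": "string"}},
        "required": ["invoice_id"],
    },
}

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to the local implementation."""
    if tool_call["name"] == "lookup_invoice":
        return json.dumps(lookup_invoice(**tool_call["arguments"]))
    raise ValueError(f"unknown tool: {tool_call['name']}")

# Simulate the model requesting the tool:
print(dispatch({"name": "lookup_invoice", "arguments": {"invoice_id": "INV-42"}}))
```

The schema is the easy part; the friction Stoica points to lives in everything around it: authentication, error handling, and getting the model to recognize when the tool applies at all.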

2) Human Preference and User Design Matter More Than Ever

Human alignment isn't just about safety; it's about usability. Tools like LMArena show how human preference evaluation is becoming central to how models are judged. A recurring theme was that user-facing design matters as much as model quality.
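To see how pairwise human preference turns into a leaderboard, here is a toy Elo-style aggregator over votes. LMArena's published methodology fits a Bradley-Terry model; this simpler online Elo update is a stand-in that shows the mechanics, with made-up model names and votes.

```python
K = 32  # conventional Elo step size

def expected(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def record_vote(ratings: dict, winner: str, loser: str) -> None:
    """Shift rating mass from loser to winner, scaled by surprise."""
    surprise = 1.0 - expected(ratings[winner], ratings[loser])
    ratings[winner] += K * surprise
    ratings[loser] -= K * surprise

ratings = {"model_a": 1000.0, "model_b": 1000.0}
for winner, loser in [("model_a", "model_b"),
                      ("model_a", "model_b"),
                      ("model_b", "model_a")]:
    record_vote(ratings, winner, loser)

print(ratings)  # model_a edges ahead after winning 2 of 3 votes
```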

Agents can’t just be “ChatGPT with tools.” New interfaces are needed—ones that help users teach agents tasks naturally, understand failures, and build trust over time.

3) Teaching Agents New Tasks Is Still Inefficient

Matei Zaharia highlighted a core research challenge: sample efficiency. Today's agents typically need many demonstrations or trial-and-error rollouts to pick up a new task, where a person might need a single explanation.

Approaches like DSPy (and work such as GEPA) aim to reduce this inefficiency by structuring how agents learn and reason, optimizing prompts and program structure from a few examples rather than brute-forcing with RL.
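As a rough illustration, here is a minimal DSPy sketch: declare the task as a signature, then compile it against a handful of labeled examples with an optimizer instead of running RL rollouts. It assumes DSPy's documented 2.x API; the model name, tiny trainset, and metric are placeholders.

```python
import dspy

# Point DSPy at a model (placeholder name; requires an API key).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Declare *what* the task is; DSPy decides *how* to prompt for it.
class TriageTicket(dspy.Signature):
    """Classify a support ticket as 'billing', 'bug', or 'other'."""
    ticket = dspy.InputField()
    label = dspy.OutputField()

program = dspy.ChainOfThought(TriageTicket)

# A handful of examples stands in for thousands of RL rollouts.
trainset = [
    dspy.Example(ticket="I was charged twice this month", label="billing").with_inputs("ticket"),
    dspy.Example(ticket="The app crashes on startup", label="bug").with_inputs("ticket"),
]

def exact_match(example, pred, trace=None):
    return example.label == pred.label

# BootstrapFewShot is one of DSPy's built-in optimizers; GEPA is a
# newer, more sample-efficient alternative in the same spirit.
optimizer = dspy.BootstrapFewShot(metric=exact_match)
compiled = optimizer.compile(program, trainset=trainset)

print(compiled(ticket="Why did my card get billed $30?").label)
```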

4) Safety, Security, and Long-Term Thinking

Dawn Song emphasized that agentic AI raises new safety and security challenges. As agents become more autonomous, mistakes become more costly—reinforcing the need for agents that are interpretable, constrained, and secure by default.
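One concrete reading of "constrained and secure by default" is that every agent action passes through an explicit policy gate. The sketch below is hypothetical rather than any specific framework: a deny-by-default allowlist, a call budget, and an audit trail.

```python
ALLOWED_TOOLS = {"read_file"}   # deny by default: unlisted tools are blocked
MAX_CALLS_PER_TASK = 10         # crude budget to bound the blast radius

class PolicyViolation(Exception):
    pass

class GuardedExecutor:
    """Routes every agent action through policy checks and an audit log."""

    def __init__(self):
        self.calls = 0

    def run(self, tool: str, **kwargs):
        self.calls += 1
        if self.calls > MAX_CALLS_PER_TASK:
            raise PolicyViolation("call budget exhausted")
        if tool not in ALLOWED_TOOLS:
            raise PolicyViolation(f"tool '{tool}' is not allowlisted")
        print(f"AUDIT: {tool}({kwargs})")  # interpretable: every action is logged
        return TOOLS[tool](**kwargs)

def read_file(path: str) -> str:
    return f"<contents of {path}>"  # stand-in for a real, scoped file read

TOOLS = {"read_file": read_file, "delete_file": lambda path: None}

executor = GuardedExecutor()
print(executor.run("read_file", path="notes.txt"))  # permitted
try:
    executor.run("delete_file", path="notes.txt")   # denied by default
except PolicyViolation as err:
    print("blocked:", err)
```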

A related panel insight was the idea of long-term investment: agentic AI is not a 1–2 year bet; it’s closer to a 10–15 year horizon.

5) A Simple but Powerful AGI Litmus Test

One of the most memorable ideas was the “Grandma Test for AGI”:

If a grandma explains a task to a robot once, and the robot performs it correctly every time thereafter, it passes. Otherwise, it fails.

By this standard, today’s agents clearly fall short—but it’s a useful north star for generalization and reliability.

6) Scaling Isn’t Just for the Biggest Models

An encouraging note: improvements in large models are cascading down to medium-sized models. Scaling benefits aren’t exclusive to frontier models, which could make capable agents more accessible over time.

Final Takeaway

The agent problem is no longer just about better models. Progress depends on better tooling and connectors, more efficient learning methods, thoughtful user interfaces, and strong safety foundations. Agents will take off not when they reason better in isolation, but when they reliably act in the messy real world.