EAGLET boosts AI agent performance on long-horizon tasks by generating custom plans



2025 was supposed to be the year of "AI agents," according to Nvidia CEO Jensen Huang and other AI industry figures. And in many ways it has been, with numerous leading AI model providers such as OpenAI, Google, and even Chinese competitors like Alibaba releasing fine-tuned AI models or applications designed to focus on a narrow set of tasks, such as web search and report writing.

But one big hurdle to a future of highly performant, reliable AI agents remains: getting them to stay on task when a task stretches across many steps. Third-party benchmarks show that even the most powerful AI models fail more often as the number of steps grows and as completion times stretch into hours.

A new academic framework called EAGLET proposes a practical and efficient method to improve long-horizon task performance in LLM-based agents — without the need for manual data labeling or retraining.

Developed by researchers from Tsinghua University, Peking University, DeepLang AI, and the University of Illinois Urbana-Champaign, EAGLET offers a "global planner" that can be integrated into existing agent workflows to reduce hallucinations and improve task efficiency.

EAGLET is a fine-tuned language model that interprets task instructions — typically provided as prompts by the user or the agent's operating environment — and generates a high-level plan for the agent (powered by its own LLM). It does not intervene during execution, but its up-front guidance helps reduce planning errors and improve task completion rates.
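The authors' code is not public, but the plan-then-execute pattern described above maps onto a simple loop. The sketch below is a minimal illustration of that separation; `call_planner`, `call_executor`, and every other name in it are hypothetical stand-ins for whatever models a team uses, not EAGLET's actual interfaces.

```python
# Minimal plan-then-execute sketch. The planner runs once, up front,
# and never intervenes mid-episode, matching the article's description.
# All function names here are hypothetical placeholders.

def call_planner(task_instruction: str) -> str:
    """Global planner: returns a high-level plan for the whole task."""
    return f"1) inspect environment 2) gather what is needed 3) complete: {task_instruction}"

def call_executor(task_instruction: str, plan: str, history: list[str]) -> str:
    """Executor LLM: proposes the next action given the fixed global plan."""
    return "look around"  # placeholder action for illustration

def run_episode(task_instruction: str, max_steps: int = 30) -> list[str]:
    plan = call_planner(task_instruction)  # generated once, never revised
    history: list[str] = []
    for _ in range(max_steps):
        action = call_executor(task_instruction, plan, history)
        history.append(action)
        if action == "done":
            break
    return history

print(run_episode("boil water in the lab"))
```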

Addressing the Planning Problem in Long-Horizon Agents

Many LLM-based agents struggle with long-horizon tasks because they rely on reactive, step-by-step reasoning. This approach often leads to trial-and-error behavior, planning hallucinations, and inefficient trajectories.

EAGLET tackles this limitation by introducing a global planning module that works alongside the executor agent.

Instead of blending planning and action generation in a single model, EAGLET separates them, enabling more coherent, task-level strategies.

A Two-Stage Training Pipeline with No Human Annotations

EAGLET’s planner is trained using a two-stage process that requires no human-written plans or annotations.

The first stage involves generating synthetic plans with high-capability LLMs, such as GPT-5 and DeepSeek-V3.1-Think.

These plans are then filtered using a novel strategy called homologous consensus filtering, which retains only those that improve task performance for both expert and novice executor agents.
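The paper's exact filtering criteria go beyond what the article describes, but the core idea, keeping only plans that help both an expert and a novice executor, can be sketched as follows. The `evaluate` helper is a hypothetical stand-in (here a random stub) for rolling out an executor and scoring the trajectory; it is not an EAGLET API.

```python
import random

def evaluate(executor: str, task: str, plan: str | None = None) -> float:
    """Placeholder: run `executor` on `task` (optionally with `plan`) and
    return a success rate in [0, 1]. A seeded random stub for illustration."""
    random.seed(hash((executor, task, plan)) & 0xFFFF)
    return random.random()

def consensus_filter(tasks_and_plans, executors=("expert", "novice")):
    kept = []
    for task, plan in tasks_and_plans:
        # A plan survives only if EVERY executor does better with it than
        # without it -- the "consensus" between homologous agents.
        if all(evaluate(e, task, plan) > evaluate(e, task, None) for e in executors):
            kept.append((task, plan))
    return kept

candidates = [("boil water", "1) find beaker 2) fill it 3) heat it"),
              ("boil water", "wander randomly")]
print(consensus_filter(candidates))
```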

In the second stage, a rule-based reinforcement learning process further refines the planner, using a custom-designed reward function to assess how much each plan helps multiple agents succeed.

Introducing the Executor Capability Gain Reward (ECGR)

One of EAGLET’s key innovations is the Executor Capability Gain Reward (ECGR).

This reward measures the value of a generated plan by checking whether it helps both high- and low-capability agents complete tasks more successfully and with fewer steps.

It also includes a decay factor to favor shorter, more efficient task trajectories. This approach avoids over-rewarding plans that are only useful to already-competent agents and promotes more generalizable planning guidance.
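Working only from the article's description (the full formula is in the paper), a simplified version of the reward might look like the sketch below: average each executor's gain from the plan and apply a step-count decay so shorter successful trajectories score higher. The aggregation and the gamma value are illustrative assumptions, not the authors' exact definition.

```python
# Simplified ECGR-style reward, reconstructed from the article's prose:
# score a plan by how much it lifts success across executors of varying
# capability, discounted so that shorter trajectories are favored.

def ecgr_reward(results, gamma: float = 0.98) -> float:
    """
    results: one dict per executor rollout, e.g.
      {"success_with_plan": 1, "success_without_plan": 0, "steps": 12}
    """
    total = 0.0
    for r in results:
        gain = r["success_with_plan"] - r["success_without_plan"]
        total += gain * (gamma ** r["steps"])  # decay favors fewer steps
    return total / len(results)

print(ecgr_reward([
    {"success_with_plan": 1, "success_without_plan": 0, "steps": 10},  # novice helped
    {"success_with_plan": 1, "success_without_plan": 1, "steps": 8},   # expert unchanged
]))
```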

Compatible with Existing Agents and Models

The EAGLET planner is designed to be modular and "plug-and-play," meaning it can be inserted into existing agent pipelines without requiring executor retraining.

In evaluations, the planner boosted performance across a variety of foundation models, including GPT-4.1, GPT-5, Llama-3.1, and Qwen2.5.

It also proved effective regardless of prompting strategy, working well with standard ReAct-style prompts as well as approaches like Reflexion.
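In practice, "plug-and-play" likely amounts to prepending the generated plan to the executor's prompt. Here is a minimal sketch assuming a standard ReAct-style template; the wording is an assumption, since EAGLET's actual prompt format is not public.

```python
# Hedged sketch: injecting a global plan into a ReAct-style prompt.
# The template text is illustrative, not EAGLET's actual prompt.

REACT_TEMPLATE = """You are an agent in {env}.
Task: {task}
Global plan (follow at a high level, adapt as needed):
{plan}

Interleave Thought / Action / Observation steps to complete the task."""

def build_prompt(env: str, task: str, plan: str) -> str:
    return REACT_TEMPLATE.format(env=env, task=task, plan=plan)

print(build_prompt(
    env="ALFWorld (simulated household)",
    task="put a clean mug on the coffee table",
    plan="1) find a mug 2) clean it at the sink 3) place it on the table",
))
```

Because the plan is just added context, the executor itself needs no retraining, which is what makes the modular claim plausible for existing pipelines.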

State-of-the-Art Performance Across Benchmarks

EAGLET was tested on three widely used benchmarks for long-horizon agent tasks: ScienceWorld, which simulates scientific experiments in a text-based lab environment; ALFWorld, which tasks agents with completing household activities through natural language in a simulated home setting; and WebShop, which evaluates goal-driven behavior in a realistic online shopping interface.

Across all three, executor agents equipped with EAGLET outperformed their non-planning counterparts and other planning baselines, including MPO and KnowAgent.

In experiments with the open source Llama-3.1-8B-Instruct model, EAGLET boosted average performance from 39.5 to 59.4, a +19.9 point gain across tasks.

On ScienceWorld unseen scenarios, it raised performance from 42.2 to 61.6.

In ALFWorld seen scenarios, EAGLET improved outcomes from 22.9 to 54.3, a more than 2.3× increase in performance.

More capable models benefited as well, even though they were already strong performers.

For instance, GPT-4.1's average score rose from 75.5 to 82.2 with EAGLET, and GPT-5's from 84.5 to 88.1.

In some benchmarks, performance gains were as high as +11.8 points, such as when combining EAGLET with the ETO executor method on ALFWorld unseen tasks.

Compared to other planning baselines like MPO, EAGLET consistently delivered higher task completion rates. For example, on ALFWorld unseen tasks with GPT-4.1, MPO achieved 79.1, while EAGLET scored 83.6—a +4.5 point advantage.

Additionally, the paper reports that agents using EAGLET complete tasks in fewer steps on average. With GPT-4.1 as executor, average step count dropped from 13.0 (no planner) to 11.1 (EAGLET). With GPT-5, it dropped from 11.4 to 9.4, supporting the claim of improved execution efficiency.

Efficiency Gains in Training and Execution

Compared to RL-based methods like GiGPO, which can require hundreds of training iterations, EAGLET achieved better or comparable results with roughly one-eighth the training effort.

This efficiency also carries over into execution: agents using EAGLET typically needed fewer steps to complete tasks. This translates into reduced inference time and compute cost in production scenarios.

No Public Code—Yet

As of the version submitted to arXiv, the authors have not released an open-source implementation of EAGLET. It is unclear if or when the code will be released, under what license, or how it will be maintained, which may limit the near-term utility of the framework for enterprise deployment.

VentureBeat has reached out to the authors to clarify these points and will update this piece when we hear back.

Enterprise Deployment Questions Remain

While the planner is described as plug-and-play, it remains unclear whether EAGLET can be easily integrated into popular enterprise agent frameworks such as LangChain or AutoGen, or if it requires a custom stack to support plan-execute separation.

Similarly, the training setup leverages multiple executor agents, which may be difficult to replicate in enterprise environments with limited model access. VentureBeat has asked the researchers whether the homologous consensus filtering method can be adapted for teams that only have access to one executor model or limited compute resources.

EAGLET’s authors report success across model types and sizes, but it is not yet known what the minimal viable model scale is for practical deployment. For example, can enterprise teams use the planner effectively with sub-10B parameter open models in latency-sensitive environments? Additionally, the framework may offer industry-specific value in domains like customer support or IT automation, but it remains to be seen how easily the planner can be fine-tuned or customized for such verticals.

Real-Time vs. Pre-Generated Planning

Another open question is how EAGLET is best deployed in practice. Should the planner operate in real-time alongside executors within a loop, or is it better used offline to pre-generate global plans for known task types? Each approach has implications for latency, cost, and operational complexity. VentureBeat has posed this question to the authors and will report any insights that emerge.
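One low-risk version of the offline option can be sketched simply: pre-generate and cache plans for known task types in a batch job, so the planner adds no latency at request time. This is an illustration of the tradeoff, not a documented EAGLET workflow, and every name in it is hypothetical.

```python
from functools import lru_cache

def call_planner(task_type: str) -> str:
    """Hypothetical planner call; would invoke the planner model."""
    return f"high-level plan for task type: {task_type}"

@lru_cache(maxsize=1024)
def cached_plan(task_type: str) -> str:
    # In the offline variant this cache is warmed ahead of time by a
    # batch job; lru_cache stands in for a real plan store (e.g., a DB).
    return call_planner(task_type)

# Warm the cache offline for known task categories...
for t in ("password reset", "refund request", "VPN troubleshooting"):
    cached_plan(t)

# ...then serve plans with zero planner latency at request time.
print(cached_plan("refund request"))
```

The real-time alternative would instead call the planner per request, trading added latency and inference cost for plans tailored to each task instance.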

Strategic Tradeoffs for Enterprise Teams

For technical leaders at medium-to-large enterprises, EAGLET represents a compelling proof of concept for improving the reliability and efficiency of LLM agents. But without public tooling or implementation guidelines, the framework still presents a build-versus-wait decision. Enterprises must weigh the potential gains in task performance and efficiency against the costs of reproducing or approximating the training process in-house.

Potential Use Cases in Enterprise Settings

For enterprises developing agentic AI systems—especially in environments requiring stepwise planning, such as IT automation, customer support, or online interactions—EAGLET offers a template for how to incorporate planning without retraining. Its ability to guide both open- and closed-source models, along with its efficient training method, may make it an appealing starting point for teams seeking to improve agent performance with minimal overhead.


