AWS now allows prompt caching with 90% cost reduction




As AI usage continues to expand and more enterprises integrate AI tools into their workflows, many are looking for ways to cut the cost of running AI models.

In response to customer demand, AWS announced two new Bedrock capabilities that cut the cost of running AI models and applications. Both are already available on competitor platforms.

During a keynote speech at AWS re:Invent, Swami Sivasubramanian, vice president for AI and Data at AWS, announced Intelligent Prompt Routing on Bedrock and the arrival of Prompt Caching.

Intelligent Prompt Routing helps customers direct each prompt to the right-sized model, so a large model doesn’t answer a simple query.

“Developers need the right models for their applications, which is why we offer a wide set of models,” Sivasubramanian said. 

AWS said Intelligent Prompt Routing “can reduce costs by up to 30% without compromising on accuracy.” Users will have to choose a model family, and Bedrock’s Intelligent Prompt Routing will push prompts to the right-sized models within that family. 

Moving prompts through different models to optimize usage and cost has slowly gained prominence in the AI industry. Startup Not Diamond announced its smart routing feature in July. 

Voice agent company Argo Labs, an AWS customer, said it uses Intelligent Prompt Routing to ensure the correct-sized models handle the different customer inquiries. Simple yes-or-no questions like “Do you have a reservation?” are managed by a smaller model, but more complicated ones like “What vegan options are available?” would be routed to a bigger one. 
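To make the routing idea concrete, here is a minimal sketch in Python. It is a toy heuristic, not Bedrock's actual routing logic, which the announcement does not describe. The function name `pick_model`, the labels `small-model` and `large-model`, and the rule itself (yes-or-no questions tend to open with an auxiliary verb) are all assumptions made for illustration.

```python
# Toy illustration only: this is NOT how Bedrock routes prompts.
# The opener list and the model labels are hypothetical.
YES_NO_OPENERS = {
    "do", "does", "did", "is", "are", "was", "were",
    "can", "could", "will", "would", "have", "has", "should",
}


def pick_model(prompt: str) -> str:
    """Return a hypothetical model label for a prompt.

    Questions that open with an auxiliary verb are treated as simple
    yes-or-no queries and sent to the smaller model; everything else
    (including empty input) goes to the larger one.
    """
    words = prompt.strip().lower().split()
    first_word = words[0] if words else ""
    if first_word in YES_NO_OPENERS:
        return "small-model"
    return "large-model"


if __name__ == "__main__":
    for question in ("Do you have a reservation?",
                     "What vegan options are available?"):
        print(question, "->", pick_model(question))
```

Under this rule, the two Argo Labs examples land on different models: the reservation question goes to the smaller one and the vegan-options question to the larger one. A production router would presumably rely on something far more robust than the first word of the prompt.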

Caching prompts

AWS also announced that Bedrock will now support prompt caching, where Bedrock retains common or repeated prompts so the model does not need to reprocess them on each request.

“Token generation costs can frequently rise particularly for repeat prompts,” Sivasubramanian said. “We wanted to give customers an easy way to dynamically cache prompts without sacrificing accuracy.”

AWS said prompt caching reduces costs “by up to 90% and latency by up to 85% for supported models.”
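A rough way to see what an “up to 90%” figure can mean in practice is to work through the arithmetic. The sketch below assumes, purely for illustration, that cached input tokens are billed at a 90% discount while the rest of the prompt is billed at the full rate. The price of $3 per million tokens and the function names are hypothetical and are not AWS pricing.

```python
def prompt_cost(total_tokens: int, cached_tokens: int,
                price_per_million: float, cache_discount: float) -> float:
    """Input cost in dollars for one request (illustrative pricing model)."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    per_token = price_per_million / 1_000_000
    fresh_tokens = total_tokens - cached_tokens
    # Assumption: only the cached portion of the prompt gets the discount.
    return (fresh_tokens * per_token
            + cached_tokens * per_token * (1 - cache_discount))


def savings_fraction(total_tokens: int, cached_tokens: int,
                     price_per_million: float, cache_discount: float) -> float:
    """Fraction of input cost saved compared with no caching at all."""
    baseline = prompt_cost(total_tokens, 0, price_per_million, cache_discount)
    cached = prompt_cost(total_tokens, cached_tokens,
                         price_per_million, cache_discount)
    return 1 - cached / baseline


if __name__ == "__main__":
    # Hypothetical request: 10,000-token prompt, 9,000 tokens already cached.
    saved = savings_fraction(10_000, 9_000, 3.0, 0.9)
    print(f"Estimated input-cost savings: {saved:.0%}")
```

In this toy model, a 10,000-token prompt with 9,000 cached tokens saves about 81% of the input cost. The full 90% is reached only when the entire prompt is served from the cache, which is one reason such figures are quoted as “up to.”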

However, AWS is a little late to this trend. Prompt caching has been available on other platforms to help users cut costs when reusing prompts. Anthropic’s Claude 3.5 Sonnet and Haiku offer prompt caching on its API. OpenAI also expanded prompt caching for its API. 

Using AI models can be expensive

Running AI applications remains expensive, not just because of the cost of training models but also because of the cost of actually using them. Enterprises have said the costs of using AI are still one of the biggest barriers to broader deployment.

As enterprises move towards agentic use cases, there is still a cost each time a user pings the model and the agent starts its tasks. Methods like prompt caching and intelligent routing may help cut costs by limiting when a prompt pings a model API to answer a query.

Model developers, though, said as adoption grows, some model prices could fall. OpenAI has said it anticipates AI costs could come down soon. 

More models

AWS, which hosts many models from Amazon (including its new Nova models) and leading open-source providers, will add new models on Bedrock. These include models from Poolside, Stability AI’s Stable Diffusion 3.5 and Luma’s Ray 2. The models are expected to launch on Bedrock soon.

Luma CEO and co-founder Amit Jain told VentureBeat that AWS is the first cloud provider partner of the company to host its models. Jain said the company used Amazon’s SageMaker HyperPod when building and training Luma models. 

“The AWS team had engineers who felt like part of our team because they were helping us figure out issues. It took us almost a week or two to bring our models to life,” Jain said. 


