OpenAI Researchers Find That Even the Best AI Is “Unable To Solve the Majority” of Coding Problems


OpenAI researchers have admitted that even the most advanced AI models are still no match for human coders, even though CEO Sam Altman insists they will be able to beat "low-level" software engineers by the end of this year.

In a new paper, the company’s researchers found that even frontier models, or the most advanced and boundary-pushing AI systems, “are still unable to solve the majority” of coding tasks.

The researchers used a newly developed benchmark called SWE-Lancer, built on more than 1,400 software engineering tasks from the freelancer site Upwork. Using the benchmark, OpenAI put three large language models (LLMs) to the test: its own o1 reasoning model and flagship GPT-4o, as well as Anthropic's Claude 3.5 Sonnet.

Specifically, the new benchmark evaluated how well the LLMs performed on two types of Upwork tasks: individual tasks, which involved resolving bugs and implementing fixes for them, and management tasks, in which the models had to zoom out and make higher-level decisions. (The models weren't allowed to access the internet, meaning they couldn't just crib similar answers that had been posted online.)

The models took on tasks cumulatively worth hundreds of thousands of dollars on Upwork, but they were only able to fix surface-level software issues; they remained unable to find bugs in larger projects or trace them to their root causes. These shoddy, half-baked "solutions" are likely familiar to anyone who has worked with AI, which is great at spitting out confident-sounding information that often falls apart on closer inspection.

Though all three LLMs were often able to operate “far faster than a human would,” the paper notes, they also failed to grasp how widespread bugs were or to understand their context, “leading to solutions that are incorrect or insufficiently comprehensive.”

As the researchers explained, Claude 3.5 Sonnet outperformed the two OpenAI models pitted against it and earned more money than o1 and GPT-4o. Still, the majority of its answers were wrong, and according to the researchers, any model would need "higher reliability" to be trusted with real-life coding tasks.

Put more plainly, the paper seems to demonstrate that although these frontier models can work quickly and solve narrowly scoped tasks, they're nowhere near as skilled at software engineering as human engineers.

Though these LLMs have advanced rapidly over the past few years and will likely continue to do so, they’re not skilled enough at software engineering to replace real-life people quite yet — not that that’s stopping CEOs from firing their human coders in favor of immature AI models.

More on AI and coding: Zuckerberg Announces Plans to Automate Facebook Coding Jobs With AI
