Alibaba’s Qwen with Questions reasoning model beats o1-preview




Chinese e-commerce giant Alibaba has released the latest model in its ever-expanding Qwen family. This one is known as Qwen with Questions (QwQ), and serves as the latest open source competitor to OpenAI’s o1 reasoning model.

Like other large reasoning models (LRMs), QwQ uses extra compute cycles during inference to review its answers and correct its mistakes, making it more suitable for tasks that require logical reasoning and planning like math and coding.
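As a minimal illustration of that idea, the extra inference-time compute can be thought of as a draft-critique-revise loop. The sketch below uses stub functions in place of real model calls (a real LRM generates and reviews its own chain of thought; the function names here are hypothetical):

```python
# Minimal sketch of inference-time self-correction: a stub "model" drafts an
# answer, a checker critiques it, and the model revises until the check passes
# or the compute budget runs out.

def draft_answer(question: str) -> str:
    # Stub: a first, possibly wrong attempt.
    return "4" if question == "2 + 3 = ?" else "unknown"

def check_answer(question: str, answer: str) -> bool:
    # Stub verifier; a real LRM reviews its own reasoning trace instead.
    return question == "2 + 3 = ?" and answer == "5"

def revise_answer(question: str, answer: str) -> str:
    # Stub revision step: spend more tokens to reconsider the answer.
    return "5" if question == "2 + 3 = ?" else answer

def reason(question: str, budget: int = 3) -> str:
    answer = draft_answer(question)
    for _ in range(budget):  # extra compute cycles at inference time
        if check_answer(question, answer):
            break
        answer = revise_answer(question, answer)
    return answer

print(reason("2 + 3 = ?"))  # prints the revised answer, "5"
```

The budget parameter captures the key trade-off: more review cycles cost more compute but raise the chance of catching a mistake.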

What is Qwen with Questions (QwQ) and can it be used for commercial purposes?

Alibaba has released a 32-billion-parameter version of QwQ with a 32,000-token context window. The model is currently in preview, which means a higher-performing version is likely to follow.

According to Alibaba’s tests, QwQ beats o1-preview on the AIME and MATH benchmarks, which evaluate mathematical problem-solving abilities. It also outperforms o1-mini on GPQA, a benchmark for scientific reasoning. QwQ trails o1 on the LiveCodeBench coding benchmark but still outperforms other frontier models such as GPT-4o and Claude 3.5 Sonnet.

Example output of Qwen with Questions

QwQ does not come with an accompanying paper describing the data or the process used to train the model, which makes it difficult to reproduce its results. However, because the model is open, its “thinking process,” unlike that of OpenAI’s o1, is not hidden and can be inspected to understand how the model reasons when solving problems.

Alibaba has also released the model under an Apache 2.0 license, which means it can be used for commercial purposes.

‘We discovered something profound’

According to a blog post that was published along with the model’s release, “Through deep exploration and countless trials, we discovered something profound: when given time to ponder, to question, and to reflect, the model’s understanding of mathematics and programming blossoms like a flower opening to the sun… This process of careful reflection and self-questioning leads to remarkable breakthroughs in solving complex problems.”

This is very similar to what we know about how reasoning models work. By generating more tokens and reviewing their previous responses, the models are more likely to correct potential mistakes. Marco-o1, another reasoning model recently released by Alibaba, may also hint at how QwQ works. Marco-o1 uses Monte Carlo Tree Search (MCTS) and self-reflection at inference time to create different branches of reasoning and choose the best answers. The model was trained on a mixture of chain-of-thought (CoT) examples and synthetic data generated with MCTS algorithms.
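The branch-and-select idea can be sketched very loosely as follows. This is not Marco-o1's actual algorithm: a real MCTS uses rollouts, learned value estimates, and backpropagation of scores, whereas this toy version expands several candidate reasoning paths and keeps the highest-scoring ones with beam-style pruning, using stub functions in place of model calls:

```python
import random

# Loose sketch of inference-time branching: expand several reasoning paths,
# score each (a real system would use rollouts or a learned value model),
# and return the best. The "model" here is a stub that proposes next steps.

def propose_steps(path, rng):
    # Stub: propose three candidate next reasoning steps for a partial path.
    return [path + [rng.random()] for _ in range(3)]

def score(path):
    # Stub value function: prefer paths whose step scores sum highest.
    return sum(path)

def tree_search(depth=3, beam=3, seed=0):
    rng = random.Random(seed)
    frontier = [[]]  # start from an empty reasoning path
    for _ in range(depth):
        # Expand every branch, then keep only the top few (beam-style pruning
        # stands in for MCTS's selection and backpropagation phases).
        expanded = [p for path in frontier for p in propose_steps(path, rng)]
        frontier = sorted(expanded, key=score, reverse=True)[:beam]
    return max(frontier, key=score)

best = tree_search()
print(len(best))  # depth of the selected reasoning path: 3
```

The trade-off mirrors the one in real LRMs: wider beams and deeper trees explore more reasoning branches at the cost of more inference compute.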

Alibaba points out that QwQ still has limitations such as mixing languages or getting stuck in circular reasoning loops. The model is available for download on Hugging Face and an online demo can be found on Hugging Face Spaces.

The LLM age gives way to LRMs: Large Reasoning Models

The release of o1 has triggered growing interest in creating LRMs, even though not much is known about how the model works under the hood aside from its use of inference-time scaling to improve responses.

There are now several Chinese competitors to o1. Chinese AI lab DeepSeek recently released R1-Lite-Preview, its o1 competitor, which is currently only available through the company’s online chat interface. R1-Lite-Preview reportedly beats o1 on several key benchmarks.

Another recently released model is LLaVA-o1, developed by researchers from multiple universities in China, which brings the inference-time reasoning paradigm to open-source vision language models (VLMs). 

The focus on LRMs comes at a time of uncertainty about the future of model scaling laws. Reports indicate that AI labs such as OpenAI, Google DeepMind, and Anthropic are getting diminishing returns on training larger models. And creating larger volumes of quality training data is becoming increasingly difficult as models are already being trained on trillions of tokens gathered from the internet. 

Meanwhile, inference-time scaling offers an alternative that might provide the next breakthrough in improving the abilities of the next generation of AI models. There are reports that OpenAI is using o1 to generate synthetic reasoning data to train the next generation of its LLMs. The release of open reasoning models is likely to stimulate progress and make the space more competitive.


