Qwen Math: Open Source Math LLM

The math-specialized LLM family from Alibaba, with chain-of-thought and
tool-integrated reasoning across 1.5B, 7B, and 72B parameter sizes.

What is Qwen-Math?

Qwen-Math is the math-specialized branch of Alibaba Cloud's Qwen model family, a series of open-source large language models purpose-built for mathematical reasoning. Where general-purpose LLMs treat math as just another task, Qwen-Math is trained on a curated trillion-token math corpus and refined through self-improvement techniques specifically designed to make the model better at multi-step quantitative reasoning. The result is a family of models that, size for size, dramatically outperforms general models on math benchmarks while remaining fully open-weight and self-hostable.

The series began with Qwen2-Math in mid-2024 and was succeeded a month later by the much stronger Qwen2.5-Math, which is the current production release. The Qwen2.5-Math family includes three base models (1.5B, 7B, and 72B parameters), three instruction-tuned variants, and a dedicated mathematical reward model (Qwen2.5-Math-RM) used for reward-guided inference and reinforcement learning. Each model is bilingual, handling English and Chinese math problems equally well — a meaningful improvement over Qwen2-Math, which was English-only.

What sets Qwen-Math apart from general-purpose LLMs is the deliberate engineering choice to optimize narrowly. The team didn't try to make a model that's good at everything; they made one that's exceptional at math. Qwen2.5-Math-1.5B-Instruct surpasses most previous 70B-parameter math models on standard benchmarks, and Qwen2.5-Math-7B-Instruct matches the performance of the much larger Qwen2-Math-72B-Instruct. This is the payoff of specialization done well — you get frontier math performance at a fraction of the inference cost of a general frontier model.

Key Features

Qwen-Math is purpose-built for solving math problems, and the feature set reflects that focus rather than trying to be a general-purpose assistant.

Architecture and Training

Qwen2.5-Math is built on the decoder-only transformer architecture from the Qwen2.5 base series, with RMSNorm pre-normalization, rotary position embeddings (RoPE), SwiGLU activations, and grouped query attention for memory-efficient inference. What makes it a math expert isn't the base architecture — it's the training pipeline applied on top.

The team built a dedicated mathematical corpus, Qwen Math Corpus v2, comprising more than one trillion high-quality math tokens curated from public datasets, web documents, textbooks, code repositories, and synthetic problem-generation pipelines seeded by the earlier Qwen2-Math-72B-Instruct model. Pre-training continued from the Qwen2.5 base on this math-heavy mixture, giving the model deep familiarity with mathematical notation, problem structures, and proof patterns long before fine-tuning began.

Post-training is where Qwen-Math's self-improvement pipeline kicks in. Three techniques work in concert: synthesizing fine-tuning data using larger Qwen-Math models as teachers, training a math-specific reward model (Qwen2.5-Math-RM) to score candidate solutions, and using that reward model to guide reinforcement learning that progressively improves the model's reasoning. The same reward model is then used at inference time for reward-guided decoding, where the model generates multiple candidate solutions and the reward model picks the best one. This iterative loop is the primary reason Qwen2.5-Math significantly outperforms Qwen2-Math at every size.

Model Variants

The Qwen2.5-Math family is organized around three sizes, each available as a base model and as an instruction-tuned variant. Qwen2.5-Math-1.5B is the smallest, suitable for laptop inference and on-device deployment — despite its size, it surpasses most older 70B math models on standard benchmarks. Qwen2.5-Math-7B is the sweet spot for most production use, running comfortably on a single consumer GPU (around 16 GB of VRAM) and matching the older Qwen2-Math-72B on accuracy. Qwen2.5-Math-72B is the frontier variant, intended for the hardest problems where you want every last point of accuracy on competition-level math; it requires multi-GPU serving with around 145 GB of VRAM at full precision.

Alongside the language models, the team released Qwen2.5-Math-RM-72B, a reward model that scores mathematical solutions for correctness and reasoning quality. This is a developer tool more than an end-user model, but it's invaluable if you're running best-of-N inference, reinforcement learning fine-tuning, or building automated solution verification into a math tutoring product.
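
If you want to experiment with that workflow yourself, the sketch below shows the shape of best-of-N selection: sample several candidate solutions from the instruct model, score each one, and keep the highest-scoring candidate. The generation side uses the standard transformers API (installation is covered in the guide below); score_with_reward_model is a hypothetical placeholder, since loading and calling Qwen2.5-Math-RM-72B is not covered here — see its model card for the actual scoring interface.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Math-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

def score_with_reward_model(problem: str, solution: str) -> float:
    # Hypothetical placeholder: replace with a real call to
    # Qwen2.5-Math-RM-72B, which returns a scalar quality score.
    return 0.0

problem = "Solve for x: 2x^2 - 8x + 6 = 0"
messages = [
    {"role": "system", "content": "Please reason step by step, "
                                  "and put your final answer within \\boxed{}."},
    {"role": "user", "content": problem},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Sample several candidate solutions, then keep the best-scored one.
outputs = model.generate(
    **inputs, max_new_tokens=1024,
    do_sample=True, temperature=0.7, num_return_sequences=8,
)
candidates = [
    tokenizer.decode(seq[inputs.input_ids.shape[1]:], skip_special_tokens=True)
    for seq in outputs
]
best = max(candidates, key=lambda s: score_with_reward_model(problem, s))
print(best)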

Download & Access

All Qwen-Math models are openly available. Weights are on Hugging Face and ModelScope, code is on GitHub, and you can try the models in your browser without any setup.

Official sources:

Hugging Face: huggingface.co/Qwen (weights for the Qwen2.5-Math base, instruct, and reward models)
ModelScope: modelscope.cn (the same checkpoints on Alibaba's own model hub)
GitHub: github.com/QwenLM/Qwen2.5-Math (code, evaluation scripts, and the Qwen-Agent TIR demo)
Qwen Chat: chat.qwen.ai (hosted web demo, no setup required)

Installation Guide

The right install path depends on your use case. For casual exploration, the web app is fine. For local inference on your own GPU, Hugging Face transformers is the most flexible route. For production serving, vLLM or SGLang give you batching and an OpenAI-compatible API. For the full Tool-Integrated Reasoning experience, the official Qwen-Agent demo is the easiest way to get started.

Option 1: Qwen Chat (Web, Zero Install)

Open chat.qwen.ai in any modern browser, sign in, and type your math question. For problems on paper, drag in a photo or screenshot and the system will use Qwen-VL for OCR before handing the parsed problem to Qwen-Math. This is the fastest way to evaluate whether Qwen-Math fits your use case before committing to local deployment.

Option 2: Hugging Face Transformers (Python)

For local inference, you'll need Python 3.10+, PyTorch, and roughly 4 GB of VRAM for the 1.5B model or 16 GB for the 7B. Install the libraries first:

pip install transformers torch accelerate

Then run your first chain-of-thought math query against Qwen2.5-Math-7B-Instruct:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Math-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

prompt = (
    "Find all real numbers x such that "
    "x^4 - 5x^2 + 4 = 0. "
    "Show your full reasoning step by step."
)

messages = [
    {"role": "system", "content": "Please reason step by step, "
                                  "and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=1024)
response = tokenizer.decode(
    output[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True,
)

print(response)

Option 3: Ollama (Quantized, One Command)

If you want a smaller footprint or just a faster setup, Ollama serves quantized Qwen-Math models with a single command. Install Ollama from ollama.com/download, then:

ollama pull mightykatun/qwen2.5-math
ollama run mightykatun/qwen2.5-math

This pulls a quantized GGUF build and starts an interactive prompt. Ollama also exposes an OpenAI-compatible HTTP API at http://localhost:11434 for use from your own applications.
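
To call that endpoint from code, you can point the standard OpenAI Python client at the local server; the snippet below assumes the same community model tag pulled above.

from openai import OpenAI

# Ollama's OpenAI-compatible API is served under /v1; the api_key value is
# ignored but the client requires something non-empty.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="mightykatun/qwen2.5-math",
    messages=[
        {"role": "system", "content": "Please reason step by step, "
                                      "and put your final answer within \\boxed{}."},
        {"role": "user", "content": "Compute the sum of the first 50 positive integers."},
    ],
)
print(response.choices[0].message.content)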

Option 4: Tool-Integrated Reasoning with Qwen-Agent

For TIR mode — where the model can execute Python code as part of its reasoning — the official repo includes a Qwen-Agent demo. Clone the repo and follow the README to launch the demo locally:

git clone https://github.com/QwenLM/Qwen2.5-Math.git
cd Qwen2.5-Math
pip install -r requirements.txt
python examples/qwen_agent_demo.py

The agent runs Python code locally in a sandbox to compute exact roots, evaluate integrals, manipulate matrices, and verify intermediate steps. This dramatically improves accuracy on problems where pure CoT struggles, like finding eigenvalues of large matrices or evaluating definite integrals symbolically.
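
The Qwen-Agent demo handles the execute-and-feed-back loop for you, but the core mechanism is worth seeing. The fragment below is an illustrative simplification rather than the official implementation: it pulls the Python block out of a TIR-style reply, executes it, and captures the output that would be fed back to the model. It needs sympy installed and runs code with a bare exec, so treat it as a demo, not a sandbox.

import contextlib
import io
import re

def run_model_code(model_reply: str) -> str:
    # Extract the first ```python ...``` block from the reply and execute it,
    # returning captured stdout so it can be appended to the conversation.
    match = re.search(r"```python\n(.*?)```", model_reply, re.DOTALL)
    if match is None:
        return ""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(match.group(1), {})
    return buffer.getvalue()

# A hand-written reply in the shape a TIR model produces: prose plus a code block.
reply = (
    "Let's compute the exact roots.\n"
    "```python\n"
    "from sympy import symbols, solve\n"
    "x = symbols('x')\n"
    "print(solve(x**4 - 5*x**2 + 4, x))\n"
    "```"
)
print(run_model_code(reply))  # prints the four exact roots: ±1 and ±2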

Option 5: vLLM for Production Serving

For high-throughput production deployment with batching and an OpenAI-compatible API:

pip install vllm

vllm serve Qwen/Qwen2.5-Math-7B-Instruct \
    --host 0.0.0.0 --port 8000 \
    --max-model-len 4096

Once running, hit http://localhost:8000/v1/chat/completions from any OpenAI client. Note that the Qwen2.5-Math models have a 4K context window, so keep --max-model-len at or below 4096.
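
For example, here is a quick smoke test with curl; the model field must match the name passed to vllm serve:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-Math-7B-Instruct",
        "messages": [
            {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
            {"role": "user", "content": "Evaluate the definite integral of x^2 from 0 to 3."}
        ]
    }'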

Using the Hosted API

If you'd rather not run inference yourself, Alibaba Cloud's Model Studio hosts the Qwen-Math models with pay-per-token pricing. The endpoint is OpenAI-compatible:

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen2.5-math-72b-instruct",
    messages=[
        {"role": "system", "content": "Please reason step by step, "
                                      "and put your final answer within \\boxed{}."},
        {"role": "user", "content": "Solve for x: 2x^2 - 8x + 6 = 0"}
    ]
)

print(response.choices[0].message.content)

Real-World Use Cases

Qwen-Math's specialization makes it a strong fit for a handful of well-defined applications. Math tutoring and homework help is the most obvious — the model's step-by-step CoT output is exactly what students need to learn from, not just an answer. The bilingual support means a single deployment can serve both English-speaking and Chinese-speaking students. Automated grading uses the reward model (Qwen2.5-Math-RM) to score student solutions for both correctness and reasoning quality, useful for high-volume assessment in education platforms.

In research and engineering, Qwen-Math handles symbolic computation tasks that would otherwise require a CAS like Mathematica or Maple — solving systems of equations, evaluating integrals, manipulating series, and finding eigenvalues. With TIR enabled, the answers are exact rather than approximated. Quantitative finance and data science teams use it for statistical derivations, optimization problem setup, and verifying mathematical models. And in technical content authoring, the model can generate worked examples, verify formula correctness, and produce LaTeX-formatted solutions for textbooks, blog posts, and documentation.

Tips and Best Practices

A few pragmatic tips that come up repeatedly when working with Qwen-Math. First, always include the standard system prompt: "Please reason step by step, and put your final answer within \boxed{}." This is the prompt format the model was trained with, and skipping it noticeably degrades output quality. Second, for problems with exact answers (algebra, calculus, number theory) enable TIR to get computational accuracy you can't reliably get from CoT alone. For open-ended problems (proofs, explanations, conceptual reasoning) pure CoT is usually the better choice.
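
In code, switching modes is mostly a matter of swapping the system prompt, as in the sketch below. The CoT prompt is the one quoted above; the TIR wording follows the published model card, but verify it against the current README before relying on it, and remember that TIR also needs an execution loop (for example the Qwen-Agent demo from Option 4) to actually run the code the model emits.

# System prompts for the two reasoning modes. The TIR wording is taken from
# the published model card; double-check it against the official README.
COT_PROMPT = "Please reason step by step, and put your final answer within \\boxed{}."
TIR_PROMPT = ("Please integrate natural language reasoning with programs to solve "
              "the problem above, and put your final answer within \\boxed{}.")

def build_messages(problem: str, use_tir: bool = False) -> list:
    system = TIR_PROMPT if use_tir else COT_PROMPT
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": problem},
    ]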

When choosing a model size, start with the 7B variant — it covers the vast majority of practical use cases at low cost. Move up to 72B only if you're working on competition-level problems or need every last point of accuracy. The 1.5B model is genuinely impressive for its size but really shines for edge deployment rather than as a primary production model. And remember that Qwen-Math is specialized: don't try to use it as a general assistant. For non-math tasks switch to Qwen2.5 or Qwen3 — the math models aren't tuned for general dialogue and will underperform there.

Final Thoughts

Qwen-Math is a textbook example of how to specialize an LLM well. Rather than chasing benchmark scores across every task, Alibaba built a focused family of models that does one thing exceptionally — and the results show. A 1.5B parameter math model beating older 70B general models is the kind of efficiency gain that changes what's economically viable in education and quantitative tooling. The decision to release all of it (including the reward model) under open-weight licenses makes Qwen-Math the obvious default for anyone building math-focused AI applications today.

The easiest way to start is at chat.qwen.ai with a hard math problem you want solved. Once you see the quality of the step-by-step output, the open weights are one pip install away.