Legacy series · Still maintained

Qwen 2.2 Series:
The Foundation That Started It All

The classic open-source LLM family that defined modern Qwen. Six sizes from 0.5B to 72B parameters, Apache 2.0 licensed, with 128K context and proven production deployments worldwide. Still widely used in research, education, and resource-constrained environments.

What is Qwen 2.2?

Qwen 2.2 is the second major generation of Alibaba's open-source large language model family, released by Tongyi Lab in mid-2024 as the successor to the Qwen 1.5 series. It represented a major architectural rewrite that established many of the design patterns still used in today's flagship Qwen 3.6, including grouped-query attention, dual chunk attention for long context, and a refined tokenizer covering 130+ languages.

While Qwen 2.2 is now classified as a legacy series, it remains one of the most widely deployed open-source LLM families in the world. Universities, research labs, startups, and edge-device manufacturers continue to use it as a stable, well-understood baseline. Its small variants (0.5B and 1.5B) are particularly popular for on-device inference, while the 72B flagship still ranks competitively on many benchmarks two years after release.

Model Variants & Sizes

The Qwen 2.2 series ships in six dense parameter scales plus a mixture-of-experts (MoE) variant. Each comes in two flavors: a base model for further pre-training or fine-tuning, and an Instruct model optimized for chat and instruction-following.

Qwen2.2-0.5B

0.5B params

The smallest member. Designed for edge devices, on-device assistants, and embedded systems. Runs comfortably on a smartphone CPU.

VRAM: <2 GB Context: 32K

Qwen2.2-1.5B

1.5B params

The "sweet spot" for laptop and Raspberry Pi deployments. Strong at single-turn tasks, classification, and lightweight chat.

VRAM: ~3 GB Context: 32K

Qwen2.2-7B

7B params

The most popular variant overall. Excellent quality-to-cost ratio. Fits on a single consumer GPU (12 GB VRAM when quantized) and powers thousands of production apps.

VRAM: ~14 GB Context: 128K

Qwen2.2-14B

14B params

A capable mid-tier model. Bridges the gap between 7B and the larger flagships. Common for enterprise on-prem deployments.

VRAM: ~28 GB Context: 128K

Qwen2.2-32B

32B params

Strong reasoning and long-context comprehension. Often used for research and as a teacher model for distillation pipelines.

VRAM: ~65 GB Context: 128K

Qwen2.2-72B

72B params

The flagship dense model. At launch, ranked #1 among open-weight models on multiple leaderboards. Still highly competitive today.

VRAM: ~145 GB Context: 128K

Qwen2.2-57B-A14B

MoE

The series' mixture-of-experts variant. 57B total parameters with only 14B active per token: flagship quality at mid-tier inference cost.

VRAM: ~110 GB Context: 128K

Qwen2.2-VL-7B

Vision

The 2.2-generation vision-language model. Reads images, charts, PDFs, and screenshots. Predecessor to today's Qwen-VL.

VRAM: ~16 GB Context: 32K

Architecture & Specs

Qwen 2.2 introduced several architectural innovations that became standard across modern open-source LLMs. Here's a technical summary:

Specification        Details
Architecture         Decoder-only Transformer with RoPE positional encoding
Attention            Grouped-Query Attention (GQA) on 7B+ variants
Activation           SwiGLU feed-forward layers
Normalization        RMSNorm pre-normalization
Context length       32K (small variants) / 128K (7B and above)
Long-context method  Dual Chunk Attention + YARN extrapolation
Vocabulary size      151,646 tokens (BBPE tokenizer)
Languages            130+ natively supported
Training data        ~7 trillion tokens (multilingual web, code, math, books)
Alignment            SFT + DPO + RLHF for Instruct variants
License              Apache 2.0 (all sizes except 72B, which uses the Qwen Research License)
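To make the GQA entry above concrete, here is a minimal NumPy sketch of grouped-query attention, in which many query heads share a smaller set of key/value heads to shrink the KV cache. This is an illustrative toy, not Qwen's actual implementation; the head counts and dimensions are made up.

```python
import numpy as np

def gqa_attention(q, k, v):
    """Grouped-Query Attention.

    q: (n_q_heads, seq, d)  -- many query heads
    k, v: (n_kv_heads, seq, d)  -- fewer shared KV heads
    """
    n_q_heads, seq, d = q.shape
    group = n_q_heads // k.shape[0]      # query heads per KV head
    # Repeat each KV head across its group of query heads
    k = np.repeat(k, group, axis=0)      # -> (n_q_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# 8 query heads sharing 2 KV heads (a 4:1 grouping)
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 5, 16))
k = rng.normal(size=(2, 5, 16))
v = rng.normal(size=(2, 5, 16))
out = gqa_attention(q, k, v)
print(out.shape)  # (8, 5, 16)
```

The payoff is memory: the KV cache stores only 2 heads instead of 8 here, which is what lets the 7B+ variants serve 128K contexts affordably.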

Benchmark Results

Qwen2.2-72B-Instruct was, at release, the strongest open-weight model on aggregate benchmarks. While newer Qwen versions have surpassed it, the scores remain highly competitive, especially considering the model is fully open and can be self-hosted.

MMLU (general knowledge) 86.1%
HumanEval (Python coding) 86.0%
GSM8K (grade-school math) 91.1%
MATH (competition math) 59.7%
MBPP (Python tasks) 80.2%
BBH (reasoning) 82.4%
C-Eval (Chinese knowledge) 87.9%

Scores shown are for Qwen2.2-72B-Instruct. Smaller variants scale roughly with parameter count; Qwen2.2-7B-Instruct reaches around 70% on both MMLU and HumanEval, which is still excellent for a 7B model.

When to Use Qwen 2.2

Even though newer Qwen versions exist, Qwen 2.2 remains the right choice in several scenarios:

- Edge and on-device deployments, where the 0.5B and 1.5B variants fit hardware that larger models cannot.
- Existing fine-tuned pipelines that already work well; stability often outweighs marginal capability gains.
- Research and teaching, where a stable, well-understood baseline matters more than peak benchmark scores.
- Resource-constrained environments that benefit from mature quantized builds and broad tooling support.

⚠️ Building something new? For new projects we recommend starting with Qwen 3.6 or Qwen 3.5, which offer meaningfully better reasoning, longer context, and the same open licensing. Qwen 2.2 is best when you have a specific reason to use it.

Quickstart: Run Qwen 2.2 Locally

Qwen 2.2 works out-of-the-box with every major inference framework. Here are the three most common ways to get started in under 5 minutes.

Option A โ€” Ollama (easiest)

SHELL
# Install Ollama, then pull and run the 7B Instruct model
$ ollama pull qwen2.2:7b
$ ollama run qwen2.2:7b

# Other sizes available:
# ollama pull qwen2.2:0.5b   (tiny, ~400 MB)
# ollama pull qwen2.2:14b    (~8 GB)
# ollama pull qwen2.2:72b    (~40 GB)

Option B โ€” Hugging Face Transformers

PYTHON
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain GQA in two sentences."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Option C โ€” vLLM (high-throughput serving)

SHELL
$ pip install vllm
$ vllm serve Qwen/Qwen2.2-7B-Instruct \
    --port 8000 \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.9

# Then call it like an OpenAI-compatible API:
$ curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"Qwen/Qwen2.2-7B-Instruct","messages":[{"role":"user","content":"Hi"}]}'
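Because the vLLM server speaks the OpenAI-compatible API, it can also be called from Python with nothing but the standard library. A minimal sketch, assuming the server from the command above is running on localhost:8000 with the default model name:

```python
import json
import urllib.request

def build_payload(prompt, model="Qwen/Qwen2.2-7B-Instruct", temperature=0.7):
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, base_url="http://localhost:8000/v1"):
    """POST to the local vLLM server and return the assistant's reply."""
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# chat("Explain GQA in one sentence.")  # requires the server to be running
```

For production use you would typically swap this for the official `openai` client pointed at the same `base_url`.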

Download & Licensing

All Qwen 2.2 weights are freely available from the platforms below. The 0.5B through 32B (plus the 57B-A14B MoE) variants are released under Apache 2.0, allowing unrestricted commercial use. Only the 72B variant ships under the Qwen Research License, which is free for research and small commercial use but requires a license for organizations with more than 100 million monthly active users.

✓ Commercial use confirmed. Apache 2.0-licensed Qwen 2.2 variants can be used in commercial products, embedded in proprietary systems, fine-tuned, and re-distributed without paying royalties or asking permission.

Qwen Series Evolution

Qwen 2.2 sits in the middle of a fast-moving lineage. Here's where it fits in the story:

August 2023
Qwen 1.0

First public Qwen release. 7B and 14B variants. Established Alibaba as a serious open-source LLM player.

February 2024
Qwen 1.5

Six dense sizes (0.5B to 72B) plus the first MoE. Multilingual coverage expanded dramatically.

June 2024
Qwen 2.2 (this page)

Architectural rewrite with GQA, 128K context, Dual Chunk Attention. Qwen2.2-72B briefly held #1 among open models.

February 2025
Qwen 3.4

Production-ready foundation model with multimodal support and developer-friendly improvements.

November 2025
Qwen 3.5

Introduced Dynamic MoE (397B/17B active). Major leaps in math and code performance.

April 2026 · Current
Qwen 3.6

Adaptive MoE + structured reasoning, 512K context, 130+ languages, ~112 tokens/sec inference. The current flagship.

Qwen 2.2 vs Newer Versions

If you're deciding between Qwen 2.2 and a newer release, here are the practical trade-offs:

                    Qwen 2.2         Qwen 3.4   Qwen 3.6
Context window      128K             128K       512K
Languages           130+             130+       130+
MMLU (flagship)     86.1%            90.2%      94.9%
HumanEval           86.0%            89.1%      93.4%
Native multimodal   Via VL variant   Yes        Yes
Reasoning mode      No               No         Yes (structured)
Open weights        ✓                ✓          ✓
Smallest variant    0.5B             1.5B       4B

Frequently Asked Questions

Is Qwen 2.2 still being maintained?
Qwen 2.2 is in long-term support. Critical security patches and bug fixes are still released, and the model weights remain hosted on Hugging Face, ModelScope, and Ollama. New features and capability improvements are reserved for Qwen 3.x and later. For most production deployments, Qwen 2.2 is stable and safe to keep running.
Can I use Qwen 2.2 commercially?
Yes. The 0.5B, 1.5B, 7B, 14B, 32B, and 57B-A14B variants are released under Apache 2.0, which allows unrestricted commercial use, modification, and re-distribution. The 72B variant uses the slightly more restrictive Qwen Research License (free for organizations with under 100M monthly active users).
What hardware do I need to run Qwen 2.2?
It depends on the size. Qwen2.2-0.5B runs on a smartphone or Raspberry Pi. Qwen2.2-7B runs on a single consumer GPU (RTX 3060 12 GB or better). Qwen2.2-72B needs ~145 GB of GPU memory, typically 2× A100 80 GB or 2× H100 80 GB. Quantized GGUF versions can drop memory requirements by 50–75% with minor quality loss.
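The VRAM figures quoted throughout this page follow a simple rule of thumb: weight memory ≈ parameter count × bytes per parameter (2 for FP16/BF16, roughly 0.5 for 4-bit quantization), with KV cache and activations on top. A back-of-the-envelope helper:

```python
def weight_memory_gb(params_billion, bytes_per_param=2.0):
    """Approximate weight memory in GB: parameters x precision.

    KV cache and activations add more on top of this, growing
    with context length and batch size.
    """
    return params_billion * bytes_per_param

print(weight_memory_gb(7))                       # FP16 7B: 14.0 GB
print(weight_memory_gb(72))                      # FP16 72B: 144.0 GB
print(weight_memory_gb(7, bytes_per_param=0.5))  # 4-bit 7B: 3.5 GB
```

This matches the card stats above: ~14 GB for the 7B, ~145 GB for the 72B, and a 50–75% reduction from quantization.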
Should I upgrade from Qwen 2.2 to Qwen 3.6?
If you're building something new, yes: Qwen 3.6 has meaningfully stronger reasoning, longer context, and native multimodal support. If you're running an existing fine-tuned Qwen 2.2 pipeline that works well, there's no urgency. The migration path is smooth (the tokenizer and chat template are compatible), but stability often outweighs marginal capability gains.
Can I fine-tune Qwen 2.2?
Absolutely. Qwen 2.2 has excellent ecosystem support: LoRA, QLoRA, and full fine-tuning all work out-of-the-box with libraries like Axolotl, LLaMA-Factory, Unsloth, and PEFT. A 7B model can be LoRA-fine-tuned on a single 24 GB GPU in a few hours.
Does Qwen 2.2 support function calling and tool use?
Yes, the Instruct variants support function calling through a structured chat template. It's slightly less reliable than Qwen 3.x's native tool-use training, but works well for most use cases when you provide clear tool descriptions in the system prompt.
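As an illustration of the structured side of this, here is a sketch of extracting tool calls from raw model output, assuming the model wraps each call's JSON in <tool_call> tags (the convention used by recent Qwen chat templates; verify against your model's actual template before relying on it):

```python
import json
import re

# Assumed convention: one JSON object per <tool_call>...</tool_call> block.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text):
    """Pull structured tool calls out of raw model output."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(text)]

# Hypothetical model reply containing one call to a get_weather tool
reply = (
    "Let me check that for you.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Beijing"}}\n</tool_call>'
)
calls = extract_tool_calls(reply)
print(calls[0]["name"])  # get_weather
```

In practice you would dispatch each parsed call to the matching function, append the result as a tool message, and generate again.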
Where can I find quantized (GGUF) versions?
Quantized GGUF builds (q4_k_m, q5_k_m, q8_0) are available on the official Qwen Hugging Face organization and from community packagers like TheBloke and bartowski. For Ollama users, quantization is automatic when you run ollama pull qwen2.2:7b.
Is there an API for Qwen 2.2?
Yes. Qwen 2.2 is available through Alibaba Cloud's DashScope API, as well as via OpenRouter, Together AI, Fireworks, and many other inference providers. For most use cases, the newer Qwen 3.6 API offers better price-performance, but Qwen 2.2 endpoints remain available for legacy compatibility.

Get started with Qwen 2.2 today

Download the weights, run them anywhere, and ship without licensing headaches.

📦 Browse on Hugging Face