Legacy series · Still maintained

Qwen 2.2 Series:
The Foundation That Started It All

The classic open-source LLM family that defined modern Qwen. Six sizes from 0.5B to 72B parameters, Apache 2.0 licensed, with 128K context and proven production deployments worldwide. Still widely used in research, education, and resource-constrained environments.

What is Qwen 2.2?

Qwen 2.2 is the second major generation of Alibaba's open-source large language model family, released by Tongyi Lab in mid-2024 as the successor to the Qwen 1.5 series. It represented a major architectural rewrite that established many of the design patterns still used in today's flagship Qwen 3.6, including grouped-query attention, dual chunk attention for long context, and a refined tokenizer covering 130+ languages.

While Qwen 2.2 is now classified as a legacy series, it remains one of the most widely deployed open-source LLM families in the world. Universities, research labs, startups, and edge-device manufacturers continue to use it as a stable, well-understood baseline. Its small variants (0.5B and 1.5B) are particularly popular for on-device inference, while the 72B flagship still ranks competitively on many benchmarks two years after release.

Model Variants & Sizes

The Qwen 2.2 series ships in six dense parameter scales plus a mixture-of-experts (MoE) variant. Each comes in two flavors: a base model for further pre-training or fine-tuning, and an Instruct model optimized for chat and instruction-following.

Qwen2.2-0.5B

0.5B params

The smallest member. Designed for edge devices, on-device assistants, and embedded systems. Runs comfortably on a smartphone CPU.

VRAM: <2 GB Context: 32K

Qwen2.2-1.5B

1.5B params

The "sweet spot" for laptop and Raspberry Pi deployments. Strong at single-turn tasks, classification, and lightweight chat.

VRAM: ~3 GB Context: 32K

Qwen2.2-7B

7B params

The most popular variant overall. Excellent quality-to-cost ratio. Fits on a single consumer GPU (12 GB VRAM when quantized) and powers thousands of production apps.

VRAM: ~14 GB Context: 128K

Qwen2.2-14B

14B params

A capable mid-tier model. Bridges the gap between 7B and the larger flagships. Common for enterprise on-prem deployments.

VRAM: ~28 GB Context: 128K

Qwen2.2-32B

32B params

Strong reasoning and long-context comprehension. Often used for research and as a teacher model for distillation pipelines.

VRAM: ~65 GB Context: 128K

Qwen2.2-72B

72B params

The flagship dense model. At launch, ranked #1 among open-weight models on multiple leaderboards. Still highly competitive today.

VRAM: ~145 GB Context: 128K

Qwen2.2-57B-A14B

MoE

The series' mixture-of-experts variant. 57B total parameters with only 14B active per token: flagship quality at mid-tier inference cost.

VRAM: ~110 GB Context: 128K

Qwen2.2-VL-7B

Vision

The 2.2-generation vision-language model. Reads images, charts, PDFs, and screenshots. Predecessor to today's Qwen-VL.

VRAM: ~16 GB Context: 32K

Architecture & Specs

Qwen 2.2 introduced several architectural innovations that became standard across modern open-source LLMs. Here's a technical summary:

Specification        Details
Architecture         Decoder-only Transformer with RoPE positional encoding
Attention            Grouped-Query Attention (GQA) on 7B+ variants
Activation           SwiGLU feed-forward layers
Normalization        RMSNorm pre-normalization
Context length       32K (small variants) / 128K (7B and above)
Long-context method  Dual Chunk Attention + YARN extrapolation
Vocabulary size      151,646 tokens (BBPE tokenizer)
Languages            130+ natively supported
Training data        ~7 trillion tokens (multilingual web, code, math, books)
Alignment            SFT + DPO + RLHF for Instruct variants
License              Apache 2.0 (all sizes except 72B, which uses the Qwen Research License)
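To make the GQA entry above concrete, here is a minimal NumPy sketch of grouped-query attention, in which many query heads share a smaller set of key/value heads to shrink the KV cache. This is an illustrative toy, not Qwen's actual implementation; the head counts and dimensions are made up.

```python
import numpy as np

def gqa_attention(q, k, v):
    """Grouped-Query Attention.

    q: (n_q_heads, seq, d)  -- many query heads
    k, v: (n_kv_heads, seq, d)  -- fewer shared KV heads
    """
    n_q_heads, seq, d = q.shape
    group = n_q_heads // k.shape[0]      # query heads per KV head
    # Repeat each KV head across its group of query heads
    k = np.repeat(k, group, axis=0)      # -> (n_q_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# 8 query heads sharing 2 KV heads (a 4:1 grouping)
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 5, 16))
k = rng.normal(size=(2, 5, 16))
v = rng.normal(size=(2, 5, 16))
out = gqa_attention(q, k, v)
print(out.shape)  # (8, 5, 16)
```

The payoff is memory: the KV cache stores only 2 heads instead of 8 here, which is what lets the 7B+ variants serve 128K contexts affordably.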

Benchmark Results

Qwen2.2-72B-Instruct was, at release, the strongest open-weight model on aggregate benchmarks. While newer Qwen versions have surpassed it, the scores remain highly competitive, especially considering the model is fully open and can be self-hosted.

MMLU (general knowledge) 86.1%
HumanEval (Python coding) 86.0%
GSM8K (grade-school math) 91.1%
MATH (competition math) 59.7%
MBPP (Python tasks) 80.2%
BBH (reasoning) 82.4%
C-Eval (Chinese knowledge) 87.9%

Scores shown are for Qwen2.2-72B-Instruct. Smaller variants scale roughly with parameter count; Qwen2.2-7B-Instruct reaches around 70% on both MMLU and HumanEval, which is still excellent for a 7B model.

When to Use Qwen 2.2

Even though newer Qwen versions exist, Qwen 2.2 remains the right choice in several scenarios:

- Edge and on-device deployments, where the 0.5B and 1.5B variants fit hardware that larger models cannot.
- Existing fine-tuned pipelines that already work well; stability often outweighs marginal capability gains.
- Research and teaching, where a stable, well-understood baseline matters more than peak benchmark scores.
- Resource-constrained environments that benefit from mature quantized builds and broad tooling support.

⚠️ Building something new? For new projects we recommend starting with Qwen 3.6 or Qwen 3.5, which offer meaningfully better reasoning, longer context, and the same open licensing. Qwen 2.2 is best when you have a specific reason to use it.

Quickstart: Run Qwen 2.2 Locally

Qwen 2.2 works out-of-the-box with every major inference framework. Here are the three most common ways to get started in under 5 minutes.

Option A โ€” Ollama (easiest)

SHELL
# Install Ollama, then pull and run the 7B Instruct model
$ ollama pull qwen2.2:7b
$ ollama run qwen2.2:7b

# Other sizes available:
# ollama pull qwen2.2:0.5b   (tiny, ~400 MB)
# ollama pull qwen2.2:14b    (~8 GB)
# ollama pull qwen2.2:72b    (~40 GB)

Option B โ€” Hugging Face Transformers

PYTHON
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain GQA in two sentences."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Option C โ€” vLLM (high-throughput serving)

SHELL
$ pip install vllm
$ vllm serve Qwen/Qwen2.2-7B-Instruct \
    --port 8000 \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.9

# Then call it like an OpenAI-compatible API:
$ curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"Qwen/Qwen2.2-7B-Instruct","messages":[{"role":"user","content":"Hi"}]}'
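Because the vLLM server speaks the OpenAI-compatible API, it can also be called from Python with nothing but the standard library. A minimal sketch, assuming the server from the command above is running on localhost:8000 with the default model name:

```python
import json
import urllib.request

def build_payload(prompt, model="Qwen/Qwen2.2-7B-Instruct", temperature=0.7):
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, base_url="http://localhost:8000/v1"):
    """POST to the local vLLM server and return the assistant's reply."""
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# chat("Explain GQA in one sentence.")  # requires the server to be running
```

For production use you would typically swap this for the official `openai` client pointed at the same `base_url`.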

Download & Licensing

All Qwen 2.2 weights are freely available from the platforms below. The 0.5B through 32B (plus the 57B-A14B MoE) variants are released under Apache 2.0, allowing unrestricted commercial use. Only the 72B variant ships under the Qwen Research License, which is free for research and small commercial use but requires a license for organizations with more than 100 million monthly active users.

✓ Commercial use confirmed. Apache 2.0-licensed Qwen 2.2 variants can be used in commercial products, embedded in proprietary systems, fine-tuned, and re-distributed without paying royalties or asking permission.

Qwen Series Evolution

Qwen 2.2 sits in the middle of a fast-moving lineage. Here's where it fits in the story:

August 2023
Qwen 1.0

First public Qwen release. 7B and 14B variants. Established Alibaba as a serious open-source LLM player.

February 2024
Qwen 1.5

Six dense sizes (0.5B to 72B) plus the first MoE. Multilingual coverage expanded dramatically.

June 2024
Qwen 2.2 (this page)

Architectural rewrite with GQA, 128K context, Dual Chunk Attention. Qwen2.2-72B briefly held #1 among open models.

February 2025
Qwen 3.4

Production-ready foundation model with multimodal support and developer-friendly improvements.

November 2025
Qwen 3.5

Introduced Dynamic MoE (397B/17B active). Major leaps in math and code performance.

April 2026 · Current
Qwen 3.6

Adaptive MoE + structured reasoning, 512K context, 130+ languages, ~112 tokens/sec inference. The current flagship.

Qwen 2.2 vs Newer Versions

If you're deciding between Qwen 2.2 and a newer release, here are the practical trade-offs:

                    Qwen 2.2         Qwen 3.4   Qwen 3.6
Context window      128K             128K       512K
Languages           130+             130+       130+
MMLU (flagship)     86.1%            90.2%      94.9%
HumanEval           86.0%            89.1%      93.4%
Native multimodal   Via VL variant   Yes        Yes
Reasoning mode      No               No         Yes (structured)
Open weights        ✓                ✓          ✓
Smallest variant    0.5B             1.5B       4B

Frequently Asked Questions

Is Qwen 2.2 still being maintained?
Qwen 2.2 is in long-term support. Critical security patches and bug fixes are still released, and the model weights remain hosted on Hugging Face, ModelScope, and Ollama. New features and capability improvements are reserved for Qwen 3.x and later. For most production deployments, Qwen 2.2 is stable and safe to keep running.
Can I use Qwen 2.2 commercially?
Yes. The 0.5B, 1.5B, 7B, 14B, 32B, and 57B-A14B variants are released under Apache 2.0, which allows unrestricted commercial use, modification, and re-distribution. The 72B variant uses the slightly more restrictive Qwen Research License (free for organizations with under 100M monthly active users).
What hardware do I need to run Qwen 2.2?
It depends on the size. Qwen2.2-0.5B runs on a smartphone or Raspberry Pi. Qwen2.2-7B runs on a single consumer GPU (RTX 3060 12 GB or better). Qwen2.2-72B needs ~145 GB of GPU memory, typically 2× A100 80 GB or 2× H100 80 GB. Quantized GGUF versions can drop memory requirements by 50–75% with minor quality loss.
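The VRAM figures quoted throughout this page follow a simple rule of thumb: weight memory ≈ parameter count × bytes per parameter (2 for FP16/BF16, roughly 0.5 for 4-bit quantization), with KV cache and activations on top. A back-of-the-envelope helper:

```python
def weight_memory_gb(params_billion, bytes_per_param=2.0):
    """Approximate weight memory in GB: parameters x precision.

    KV cache and activations add more on top of this, growing
    with context length and batch size.
    """
    return params_billion * bytes_per_param

print(weight_memory_gb(7))                       # FP16 7B: 14.0 GB
print(weight_memory_gb(72))                      # FP16 72B: 144.0 GB
print(weight_memory_gb(7, bytes_per_param=0.5))  # 4-bit 7B: 3.5 GB
```

This matches the card stats above: ~14 GB for the 7B, ~145 GB for the 72B, and a 50–75% reduction from quantization.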
Should I upgrade from Qwen 2.2 to Qwen 3.6?
If you're building something new, yes: Qwen 3.6 has meaningfully stronger reasoning, longer context, and native multimodal support. If you're running an existing fine-tuned Qwen 2.2 pipeline that works well, there's no urgency. The migration path is smooth (the tokenizer and chat template are compatible), but stability often outweighs marginal capability gains.
Can I fine-tune Qwen 2.2?
Absolutely. Qwen 2.2 has excellent ecosystem support: LoRA, QLoRA, and full fine-tuning all work out-of-the-box with libraries like Axolotl, LLaMA-Factory, Unsloth, and PEFT. A 7B model can be LoRA-fine-tuned on a single 24 GB GPU in a few hours.
Does Qwen 2.2 support function calling and tool use?
Yes, the Instruct variants support function calling through a structured chat template. It's slightly less reliable than Qwen 3.x's native tool-use training, but works well for most use cases when you provide clear tool descriptions in the system prompt.
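As an illustration of the structured side of this, here is a sketch of extracting tool calls from raw model output, assuming the model wraps each call's JSON in <tool_call> tags (the convention used by recent Qwen chat templates; verify against your model's actual template before relying on it):

```python
import json
import re

# Assumed convention: one JSON object per <tool_call>...</tool_call> block.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text):
    """Pull structured tool calls out of raw model output."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(text)]

# Hypothetical model reply containing one call to a get_weather tool
reply = (
    "Let me check that for you.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Beijing"}}\n</tool_call>'
)
calls = extract_tool_calls(reply)
print(calls[0]["name"])  # get_weather
```

In practice you would dispatch each parsed call to the matching function, append the result as a tool message, and generate again.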
Where can I find quantized (GGUF) versions?
Quantized GGUF builds (q4_k_m, q5_k_m, q8_0) are available on the official Qwen Hugging Face organization and from community packagers like TheBloke and bartowski. For Ollama users, quantization is automatic when you run ollama pull qwen2.2:7b.
Is there an API for Qwen 2.2?
Yes. Qwen 2.2 is available through Alibaba Cloud's DashScope API, as well as via OpenRouter, Together AI, Fireworks, and many other inference providers. For most use cases, the newer Qwen 3.6 API offers better price-performance, but Qwen 2.2 endpoints remain available for legacy compatibility.

Get started with Qwen 2.2 today

Download the weights, run them anywhere, and ship without licensing headaches.

📦 Browse on Hugging Face