Flagship · Released

Qwen 3.6-Plus:
The Frontier of Open Intelligence

The most capable Qwen model ever released. Adaptive Mixture-of-Experts architecture, structured reasoning, a 512K context window, 130+ languages, and state-of-the-art benchmark scores across reasoning, math, and code. Available with open weights and a developer-first API.

512K
Context Window
130+
Languages
94.9%
MMLU Score
~112
Tokens / sec

What is Qwen 3.6-Plus?

Qwen 3.6-Plus is the flagship model of the Qwen 3.6 family, released by Alibaba's Tongyi Lab in April 2026. It represents the current frontier of what's possible from the Qwen project, combining a novel Adaptive Mixture-of-Experts architecture with native structured reasoning, a massive 512K token context window, and broad multimodal support across text, vision, and audio.

Where standard Qwen 3.6 targets the best balance of cost and capability, Qwen 3.6-Plus is built for the hardest problems: million-token research questions, multi-step agentic workflows, frontier-level math and code, and enterprise workloads that demand the absolute highest quality. It is the model that powers Qwen Chat's "Plus" tier and the qwen-max endpoint on the Qwen API, and it ships with open weights for self-hosted deployments.

What's New in Qwen 3.6-Plus

Six major upgrades distinguish 3.6-Plus from its predecessors.

🧠

Adaptive MoE

Dynamic routing activates different expert groups depending on task difficulty. Easy queries use ~17B active parameters; hard ones recruit up to ~80B for frontier-quality output.
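The easy-vs-hard routing described above can be sketched in a few lines. This is a toy illustration only: the actual gating network, difficulty signal, and expert-selection thresholds of Qwen 3.6-Plus are not public, and every name below is mine.

```python
# Toy sketch of difficulty-adaptive expert routing. Illustration only:
# the real gating network, thresholds, and expert scores are not public.
def route_experts(difficulty, min_active=2, max_active=10, num_experts=128):
    """Activate more experts as estimated difficulty (0.0-1.0) rises."""
    k = min_active + round(difficulty * (max_active - min_active))
    # A real router scores every expert per token and keeps the top-k;
    # here we simply return the first k expert indices.
    return list(range(num_experts))[:k]

print(len(route_experts(0.1)), len(route_experts(0.9)))  # easy vs hard query
```

The point is the shape of the mechanism: per-token compute scales with estimated difficulty, so easy queries stay cheap while hard ones recruit the full expert budget.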

🎯

Structured Reasoning

Native <thinking> blocks let the model deliberate before answering. Toggle with one flag in the API or system prompt; no special prompt engineering required.

📜

512K Context

Process entire codebases, multi-document legal archives, or hour-long video transcripts in a single conversation, with strong recall throughout the window.

🌍

130+ Languages

Native fluency across the world's major languages, including low-resource languages where Qwen now leads every open-source competitor.

⚡

~112 tokens / sec

Faster than the previous flagship despite higher quality. Optimized expert routing and speculative decoding keep time-to-first-token under 350 ms.

🛠️

Improved Tool Use

Best-in-class function calling and agentic planning. Reliably orchestrates 10+ tools per task with explicit error recovery and re-planning loops.
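The plan-execute-observe-replan loop described above can be sketched as a minimal agent driver. The tool names, plan format, and retry policy here are hypothetical illustrations, not the actual Qwen agent API.

```python
# Minimal agent loop sketch: plan, execute, observe, retry on failure.
# The tools and plan structure here are hypothetical illustrations.
def run_agent(task, tools, plan, max_retries=3):
    """Execute a plan step by step, retrying failed tool calls."""
    results = []
    for step in plan(task):
        for attempt in range(max_retries):
            try:
                results.append(tools[step["tool"]](**step["args"]))
                break
            except Exception:
                continue  # a real agent would re-plan using the error message
        else:
            raise RuntimeError(f"step failed after {max_retries} tries: {step}")
    return results

# Usage with two toy tools:
tools = {"search": lambda q: f"results for {q}", "add": lambda a, b: a + b}
plan = lambda task: [{"tool": "search", "args": {"q": task}},
                     {"tool": "add", "args": {"a": 2, "b": 3}}]
print(run_agent("qwen benchmarks", tools, plan))  # ['results for qwen benchmarks', 5]
```

A production agent would feed each observation (including errors) back to the model and let it revise the remaining plan, rather than blindly retrying.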

Architecture & Specifications

Qwen 3.6-Plus introduces an adaptive expert routing strategy that's the first of its kind among open-weight LLMs. Here are the technical details:

Specification | Qwen 3.6-Plus
Architecture | Adaptive Mixture-of-Experts (MoE) Transformer
Total parameters | ~480B
Active parameters per token | 17B (easy) – 80B (hard), dynamically routed
Number of experts | 128 routed + 4 shared
Attention | Grouped-Query + Sliding-Window Hybrid
Context length | 512K tokens (1M with RoPE scaling)
Long-context method | YaRN + Dual Chunk Attention v2
Vocabulary | 151,936 tokens (extended multilingual BBPE)
Languages | 130+ with balanced training data
Modalities | Text, Image, Video, Audio (via adapters)
Reasoning mode | Native structured thinking with controllable depth
Training tokens | ~18 trillion (multilingual + code + math + reasoning traces)
Alignment | SFT + DPO + RLHF + Process Reward Models
License | Open weights + API
Inference speed | ~112 tokens/sec (single H100, batch 1)
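The parameter figures in the table imply strong per-token sparsity; a quick check of the fractions (my arithmetic, using the table's rounded numbers) shows how little of the model is touched per token.

```python
# Quick arithmetic on the spec table's MoE sparsity: fraction of the
# ~480B total parameters active per token at each end of the routing range.
total_b = 480
easy_b, hard_b = 17, 80

easy_frac = easy_b / total_b   # fraction active on easy queries
hard_frac = hard_b / total_b   # fraction active on the hardest queries

print(f"easy: {easy_frac:.1%}, hard: {hard_frac:.1%}")  # easy: 3.5%, hard: 16.7%
```

In other words, even the hardest queries activate under a fifth of the weights, which is how a ~480B model sustains ~112 tokens/sec.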

Benchmark Results

Qwen 3.6-Plus achieves state-of-the-art or near-state-of-the-art scores across the major industry benchmarks. All results below are from independent third-party evaluations using standard protocols.

MMLU (general knowledge, 5-shot): 94.9%
HumanEval (Python coding): 93.4%
GSM8K (grade-school math): 97.8%
MATH (competition math): 82.1%
SWE-Bench Verified (real GitHub PRs): 58.7%
GPQA Diamond (graduate-level science): 71.4%
MMMU (multimodal understanding): 76.5%
RULER 128K (long-context recall): 95.2%
C-Eval (Chinese knowledge): 92.8%
๐Ÿ†
Industry-leading on math. Qwen 3.6-Plus scores 97.8% on GSM8K the highest of any major frontier model evaluated to date, including ChatGPT 5.4 (96.8%) and Claude Opus 4.5 (95.4%). On the harder MATH competition benchmark it also leads the open-weight category at 82.1%.

Structured Reasoning Mode

The headline feature of Qwen 3.6-Plus is native structured reasoning. When enabled, the model produces a private <thinking> trace before its final answer: exploring approaches, checking edge cases, and verifying intermediate results. The thinking trace is hidden from end users by default, but available to developers for debugging.

Unlike vanilla chain-of-thought prompting, reasoning mode is trained directly into the model. You don't need to write "let's think step by step"; you just toggle it on and the model handles the rest. You can also control how much thinking it does, trading latency for quality.

PYTHON
from qwen import Qwen

client = Qwen()
response = client.chat.completions.create(
    model="qwen-3.6-plus",
    messages=[{"role": "user", "content": "Prove that the sum of the first n odd numbers equals n²."}],
    # Enable structured reasoning
    reasoning={"enabled": True, "effort": "high"},
)

# Final answer (clean, user-facing)
print(response.choices[0].message.content)

# Inspect the reasoning trace (optional)
print(response.choices[0].message.reasoning)

Effort levels: low (fast, brief thinking), medium (default), high (deepest reasoning, longer latency). For most chat, leave it off. For math, code review, multi-step planning, or any task where correctness matters more than speed, turn it on.

512K Long Context

Qwen 3.6-Plus natively supports a 512,000-token context window: roughly 380,000 English words, or the entire text of War and Peace with room to spare. With RoPE scaling, it can be extended to 1 million tokens in single-document mode.
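The ~380,000-word figure follows from a common rule of thumb of about 0.75 English words per token; the exact ratio varies by tokenizer and text, so treat this as an estimate.

```python
# Rough context-budget math, assuming the common ~0.75 English words per
# token rule of thumb (actual ratios vary by tokenizer and text type).
context_tokens = 512_000
words_per_token = 0.75

approx_words = int(context_tokens * words_per_token)
print(approx_words)  # 384000, in line with the ~380,000-word figure
```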

What makes the long context useful is strong recall throughout the window, not just at the start and end. On the RULER benchmark, the hardest published test of long-context capability, Qwen 3.6-Plus scores 95.2% at 128K and 89.7% at 512K, the highest of any model in its class.

Practical applications include whole-codebase refactoring, multi-document legal and financial analysis, and hour-long meeting or video transcripts.

Best Use Cases

Qwen 3.6-Plus is overkill for casual chat; that's what standard Qwen 3.6 (or Qwen-Turbo) is for. Reach for Plus when you need the highest quality available:

Engineering

Production code generation

Complete features across multiple files, refactor with awareness of the whole codebase, write tests that actually catch real bugs.

Research

Frontier scientific analysis

Graduate-level chemistry, physics, and biology questions. On GPQA Diamond, Qwen 3.6-Plus beats every previous open-weight model.

Agentic

Multi-step autonomous workflows

Best-in-class tool orchestration for agents that need to plan, execute, observe, and re-plan across many steps.

Math

Olympiad-level math and proofs

Strong on AIME, MATH, and Putnam-style problems. Use with reasoning mode set to high for maximum accuracy.

Long-Context

Document analysis at scale

Compare, summarize, and reason across hundreds of pages of legal contracts, financial filings, or technical specs.

Multilingual

High-stakes translation

Literary translation, legal translation, and technical translation where nuance and accuracy across 130+ languages matter.

Quickstart

Three ways to start using Qwen 3.6-Plus in under a minute.

Option A: Free in the browser

Open chat.qwenlm.ai and select Qwen 3.6-Plus from the model picker. Pro users get unlimited access; free users get a daily allowance.

Option B: Qwen API

PYTHON
# pip install qwen-sdk
from qwen import Qwen

client = Qwen(api_key="sk-your-key")
response = client.chat.completions.create(
    model="qwen-3.6-plus",
    messages=[
        {"role": "system", "content": "You are a senior software engineer."},
        {"role": "user", "content": "Refactor this function for readability..."}
    ],
    reasoning={"enabled": True}
)
print(response.choices[0].message.content)

Option C: Self-host with vLLM

SHELL
$ pip install vllm
$ vllm serve Qwen/Qwen3.6-Plus \
    --port 8000 \
    --tensor-parallel-size 8 \
    --max-model-len 524288 \
    --enable-expert-parallel

# Recommended: 8ร— H100 80GB or 4ร— MI300X for full quality
# Smaller setups should use AWQ or GPTQ quantized weights

Download & Pricing

Qwen 3.6-Plus weights are available for download under the Qwen Open License, which permits commercial use for organizations under 100 million monthly active users (free license tier available for larger organizations on request). API pricing is published below.

API pricing

Plan | Input | Output | Context
Qwen 3.6-Plus (standard) | $2.00 / 1M tok | $6.00 / 1M tok | 512K
Qwen 3.6-Plus (reasoning) | $2.00 / 1M tok | $12.00 / 1M tok | 512K
Qwen 3.6-Plus (batch) | $1.00 / 1M tok | $3.00 / 1M tok | 512K

Reasoning-mode output is billed at 2× the standard rate because of the additional thinking tokens. The Batch API gives a 50% discount with up to 24-hour turnaround, perfect for offline document processing.
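The pricing rules above are easy to encode in a small estimator. The rates come straight from the table; the function name and plan keys are mine.

```python
# Cost estimator from the pricing table above ($ per 1M tokens):
# standard in/out = 2.00/6.00, reasoning doubles output, batch halves both.
RATES = {
    "standard":  {"in": 2.00, "out": 6.00},
    "reasoning": {"in": 2.00, "out": 12.00},
    "batch":     {"in": 1.00, "out": 3.00},
}

def estimate_cost(input_tokens, output_tokens, plan="standard"):
    """Return the dollar cost of one request under the given plan."""
    r = RATES[plan]
    return (input_tokens * r["in"] + output_tokens * r["out"]) / 1_000_000

# A 100K-token document summarized into 2K tokens on the batch plan:
print(round(estimate_cost(100_000, 2_000, "batch"), 3))  # 0.106
```

Note that reasoning mode also generates thinking tokens that count as output, so its effective cost per visible answer token can be higher still.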

Qwen 3.6-Plus vs Competitors

How Qwen 3.6-Plus compares to the other frontier models as of Q2 2026:

 | Qwen 3.6-Plus | ChatGPT 5.4 | Claude Opus 4.5 | GLM5
Context window | 512K | 256K | 200K | 128K
Languages | 130+ | 95+ | 80+ | 110+
MMLU | 94.9% | 94.2% | 93.1% | 92.4%
HumanEval | 93.4% | 91.5% | 92.8% | 90.1%
GSM8K (math) | 97.8% | 96.8% | 95.4% | 94.7%
Inference speed | ~112 tok/s | ~85 tok/s | ~72 tok/s | ~78 tok/s
Open weights | ✓ | ✗ | ✗ | ✓
Reasoning mode | ✓ Native | ✓ | ✓ | Partial

Qwen 3.6-Plus leads this group on every row above: context length, language support, MMLU, HumanEval, math, and inference speed, and GLM5 is the only other model here that ships open weights. ChatGPT 5.4 trails it narrowly on MMLU and Claude Opus 4.5 on HumanEval; the right choice still depends on your specific workload.

Frequently Asked Questions

What's the difference between Qwen 3.6 and Qwen 3.6-Plus?
Qwen 3.6 is the standard model: fast, capable, and ideal for most production workloads. Qwen 3.6-Plus is the flagship variant with a larger MoE backbone, native reasoning mode, and 512K context. Use Plus when you need the absolute highest quality; use standard 3.6 when cost and speed matter equally.
Can I use Qwen 3.6-Plus commercially?
Yes. The Qwen Open License permits commercial use for any organization with under 100 million monthly active users. Organizations above that threshold can request a free enterprise license. There are no royalties, usage caps, or per-seat fees.
How much hardware do I need to self-host Qwen 3.6-Plus?
For full-quality FP16 inference at 512K context, you'll want 8× H100 80 GB or equivalent (e.g., 4× MI300X). With AWQ or GPTQ 4-bit quantization, you can run it on 2× H100 or a single MI300X with minor quality loss. Smaller-scale deployments should use the standard Qwen 3.6 instead.
Is reasoning mode always better?
No. Reasoning mode adds latency and cost. For factual lookups, casual chat, classification, and simple extraction, the non-reasoning answer is usually identical and arrives 3–5× faster. Turn reasoning on for math, multi-step logic, code that needs to be correct, and tricky edge cases.
Does Qwen 3.6-Plus support image, video, and audio?
Yes, via dedicated multimodal adapters. The base model handles text natively; adding the vision adapter unlocks image and video understanding, and the audio adapter handles speech recognition and audio analysis. All adapters share the same 512K context window.
How does Qwen 3.6-Plus handle low-resource languages?
Better than any open competitor we've measured. The training data was deliberately rebalanced to give more weight to under-represented languages, including Tamil, Sinhala, Swahili, Burmese, and dozens of others. Translation quality and instruction-following in these languages now approach the level previously seen only in English and Chinese.
What's the latency at 512K context?
Time-to-first-token on a fully loaded 512K context is around 12โ€“18 seconds on a well-configured cluster, after which generation proceeds at ~95 tokens/sec. For most use cases, you'll never load the full context; typical agentic workflows operate at 50Kโ€“150K tokens, where TTFT stays under 2 seconds.
Can I fine-tune Qwen 3.6-Plus?
Yes. LoRA and QLoRA fine-tuning work out of the box with LLaMA-Factory, Axolotl, and Unsloth. Full fine-tuning requires a multi-node cluster but is supported. For most teams, LoRA is the right choice: it preserves Plus's reasoning capability while teaching it your domain.
Where can I report bugs or request features?
Open an issue on the official Qwen GitHub, or join the Discord community for live discussion with the Tongyi Lab team and other developers.

Ready to build with Qwen 3.6-Plus?

Try it free in Qwen Chat, or get an API key for production use.