What is Qwen3.7-Max?
Alibaba's most advanced agent model to date - a closed-weight, API-only reasoning flagship built for long-horizon autonomous work rather than single-shot chat.
Qwen3.7-Max is the top tier of the Qwen 3.7 family, announced at the 2026 Alibaba Cloud Summit in Hangzhou on May 20, 2026. The Qwen team describes it as a "versatile agent foundation" rather than a general-purpose chatbot. In practice that means it is tuned to run hundreds or even thousands of steps autonomously: writing code, calling tools, checking its own work, and carrying a task forward across hours. In one internal demonstration it ran for roughly 35 hours and made over 1,000 tool calls to optimize a GPU compute kernel - though that figure is an unverified vendor claim, so treat it as a statement of design intent rather than a guarantee.
A few characteristics shape every decision you'll make when setting it up:
- Text-only. Qwen3.7-Max accepts text input and produces text output - there is no image or video input. If you need vision, use its sibling Qwen3.7-Plus instead.
- Closed-weight, API-only. Unlike many earlier Qwen models, there is no open-weight, downloadable release for the 3.7 generation. You access it through Alibaba Cloud Model Studio (DashScope), not by self-hosting.
- A reasoning model. It generates an internal chain of thought before answering, which you can toggle on or off. This makes it stronger on hard problems but more verbose and more expensive per request.
- Dual API compatibility. It speaks both the OpenAI and Anthropic API specifications, so it drops into most existing pipelines - including Claude Code - with minimal changes.
Key features at a glance
Before the setup steps, it helps to know what you're actually getting. These are the capabilities that distinguish Qwen3.7-Max from both its predecessor and the wider field, and they explain why the install path looks the way it does.
Extended-thinking reasoning
Plans, checks its work, and self-corrects before answering. Toggleable on or off - the single biggest lever over quality and cost.
1M-token context
Holds a full mid-sized repository or a large document stack in one request, up from 256K on Qwen3.6-Max.
Long-horizon agents
Built for sustained autonomous loops - hundreds of tool calls across a single task, not one-shot answers.
Cross-harness generalization
Performs consistently in Claude Code, OpenClaw, Qwen Code, or a custom framework - a drop-in backbone, no scaffold-specific tuning.
Native MCP & tool use
Standard function-calling plus Model Context Protocol support, with strong MCP-Atlas (76.4) and MCP-Mark (60.8) scores.
Low hallucination
Posts the lowest hallucination rate in the frontier tier - partly by abstaining more often on uncertain factual questions.
On the Artificial Analysis Intelligence Index v4.0 it scored 56.6, placing it fifth overall - ahead of Gemini 3.5 Flash and behind GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro. The gains over Qwen 3.6 are concentrated in exactly the areas the team says they trained for: scientific reasoning, agentic capability, and coding.
Before you begin
Because Qwen3.7-Max is a hosted model, there's no GPU, no model download, and no heavy local install. The requirements are modest:
For chat use
Just a browser and a free Qwen account at chat.qwen.ai. No API key, no setup, no payment method.
For API use
An Alibaba Cloud account, a Model Studio (DashScope) API key, and either Python 3.8+ or Node.js 18+. Internet access to the DashScope endpoint.
For agents / Claude Code
The harness of your choice (Claude Code, OpenClaw, or a custom framework) plus a DashScope key. Qwen3.7-Max plugs into the Anthropic-compatible endpoint.
What you do NOT need
No local GPU, no VRAM, no Hugging Face download, no Docker image - there are no open weights for 3.7 to host yourself.
Throughout this guide the model identifier is always the same string:
qwen3.7-max
Use Qwen3.7-Max in Qwen Chat
The no-code path - start here to evaluate quality before wiring anything into production.
The fastest way to try the model requires no API key and no code at all. This is ideal for kicking the tires on real prompts before you spend a token through the API.
- Open Qwen Chat. Navigate to chat.qwen.ai and create a free account (or sign in). The web app works on desktop and mobile browsers.
- Select the model. Open the model selector dropdown and choose Qwen3.7-Max. During the preview period it may appear as Qwen3.7-Max-Preview.
- Turn on Thinking Mode. Toggle Thinking Mode in the chat interface. This activates the chain-of-thought reasoning layer and lets you watch the model's reasoning trace before its final answer - useful for understanding how it approaches a problem.
- Send a hard prompt. Use your most demanding real-world prompts: multi-step math, complex refactors, ambiguous expert questions. Trivial prompts reveal little about a frontier model's edge and waste the reasoning overhead.
Install & call the API
The developer path - served through Alibaba Cloud Model Studio (DashScope).
For anything programmatic, Qwen3.7-Max is served through Alibaba Cloud Model Studio. It exposes both an OpenAI-compatible endpoint and an Anthropic-compatible endpoint, so you rarely need a Qwen-specific SDK - your existing OpenAI or Anthropic client usually works with just a base-URL and key change.
Step 1 - Get your API key
- Create an Alibaba Cloud account. Sign up at the Alibaba Cloud console if you don't already have one.
- Open Model Studio. Go to the Model Studio console (modelstudio.console.alibabacloud.com for international, or the Singapore region for non-Mainland access).
- Generate a DashScope API key. From the dashboard, create an API key. Treat it like a password - never commit it to source control.
- Export it as an environment variable. That way your code never hardcodes the secret.
```shell
# macOS / Linux
export DASHSCOPE_API_KEY="sk-your-key-here"

# Windows (PowerShell)
$env:DASHSCOPE_API_KEY="sk-your-key-here"
```
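On the Python side, it can help to read the key once and fail with a clear message when it is missing, rather than letting an empty key surface later as a 401. The sketch below is one way to do that; `load_api_key` is a hypothetical helper name, not part of any SDK.

```python
import os


def load_api_key(env=None, name="DASHSCOPE_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running.")
    return key
```

Pass the result as `api_key=` when constructing your client instead of indexing `os.environ` directly.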
Step 2 - Make your first call
Pick the language and client you already use. The four examples below do the same thing: send one prompt to qwen3.7-max and print the reply.
Install the OpenAI SDK, then point it at the DashScope compatible-mode endpoint:
```bash
pip install openai
```

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

resp = client.chat.completions.create(
    model="qwen3.7-max",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Refactor this loop and explain why."},
    ],
)
print(resp.choices[0].message.content)
```
The OpenAI JS client works the same way - Node 18+, Bun, or Deno:
```bash
npm install openai
```

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});

const resp = await client.chat.completions.create({
  model: "qwen3.7-max",
  messages: [{ role: "user", content: "Write a binary search in Rust." }],
});
console.log(resp.choices[0].message.content);
```
No SDK needed - a plain HTTPS request works from any environment:
```bash
curl https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.7-max",
    "messages": [{"role": "user", "content": "Explain MoE routing."}]
  }'
```
Because Qwen3.7-Max also speaks the Anthropic protocol, you can use the Anthropic SDK by pointing it at the DashScope Anthropic-compatible endpoint. This is what makes Claude Code work (see "Using Qwen3.7-Max with Claude Code & harnesses" below).
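```python
# Requires: pip install anthropic
import os

import anthropic

client = anthropic.Anthropic(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/apps/anthropic",
)

msg = client.messages.create(
    model="qwen3.7-max",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Plan a 5-step refactor."}],
)

# msg.content is a list of content blocks; print only the text blocks.
print("".join(block.text for block in msg.content if block.type == "text"))
```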
Step 3 - Read the response correctly
Qwen3.7-Max is verbose by design - in one independent evaluation it generated about 97 million tokens where the median model produced roughly 24 million. When you write tests or parse output, assert on the final answer, not on the exact wording of the reasoning, which varies between runs. Cap your output length deliberately; the model supports up to a 64K-token maximum output, which is generous but billable.
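To make "assert on the final answer" concrete, here is a minimal sketch of a test helper. It assumes the reasoning trace arrives in a separate `reasoning_content` field next to `content`; that field name is an assumption to verify against the Model Studio docs for qwen3.7-max. `split_reply` and `normalize` are hypothetical helper names, and the two sample messages are made up for illustration.

```python
def split_reply(message):
    """Separate the final answer from the reasoning trace.

    `message` is a plain dict shaped like one chat-completions message.
    The `reasoning_content` key is an assumption, not a confirmed field.
    """
    answer = (message.get("content") or "").strip()
    reasoning = (message.get("reasoning_content") or "").strip()
    return answer, reasoning


def normalize(text):
    """Collapse whitespace and case so tests compare meaning, not formatting."""
    return " ".join(text.lower().split())


# Two illustrative runs: different reasoning, same final answer.
run_a = {"content": "The answer is 42.", "reasoning_content": "First I tried..."}
run_b = {"content": "the answer is  42.", "reasoning_content": "Let me check..."}

answer_a, _ = split_reply(run_a)
answer_b, _ = split_reply(run_b)
assert normalize(answer_a) == normalize(answer_b)
```

With the OpenAI SDK, `resp.choices[0].message.model_dump()` should give you a dict of this shape to pass in.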
Using thinking mode
The single most important cost-and-quality lever you control.
Thinking mode is the model's chain-of-thought layer: before producing a final answer it plans, checks its work, and corrects course. It's the source of Qwen3.7-Max's strength on hard problems - and the source of its token cost.
Turning it on via the API
On the OpenAI-compatible endpoint, enable reasoning with an extra_body flag:
```python
resp = client.chat.completions.create(
    model="qwen3.7-max",
    messages=[{"role": "user", "content": "Prove the statement step by step."}],
    extra_body={"enable_thinking": True},
)
```
Preserving thinking across turns
For multi-turn agent conversations, Qwen3.7-Max supports a preserve_thinking feature that retains the reasoning content from all preceding turns. This helps the model maintain a coherent line of reasoning across a long session - at the cost of carrying more tokens forward in context.
```python
resp = client.chat.completions.create(
    model="qwen3.7-max",
    messages=conversation_history,
    extra_body={
        "enable_thinking": True,
        "preserve_thinking": True,  # keep prior reasoning in context
    },
)
```
Turn thinking ON for
Multi-step code refactors, math proofs, long agent task chains, ambiguous problems needing step-by-step planning, and anything where correctness matters more than speed.
Turn thinking OFF for
Short rewrites, simple classifications, quick lookups, and high-volume tasks where latency and per-token cost need to be minimized.
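One way to apply the ON/OFF guidance consistently is to decide per request from a task label rather than hardcoding the flag. The sketch below is illustrative only: the task labels and the `build_request` helper are hypothetical, and it assumes that passing `enable_thinking: False` is how the flag is turned off.

```python
HARD_TASKS = {"refactor", "proof", "agent", "planning"}  # thinking ON
LIGHT_TASKS = {"rewrite", "classify", "lookup"}          # thinking OFF


def build_request(prompt, task_kind, max_tokens=2048):
    """Build chat-completions keyword arguments with thinking gated by task kind."""
    if task_kind not in HARD_TASKS | LIGHT_TASKS:
        raise ValueError(f"unknown task kind: {task_kind!r}")
    return {
        "model": "qwen3.7-max",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # cap output deliberately; it is billable
        "extra_body": {"enable_thinking": task_kind in HARD_TASKS},
    }
```

Usage would look like `client.chat.completions.create(**build_request("Classify this ticket.", "classify"))`. Rejecting unknown labels forces each new task type to get an explicit decision instead of silently defaulting to the expensive path.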
Tool calling & MCP
Tool use is where Qwen3.7-Max is meant to live. It supports function calling natively, in the standard OpenAI tools format, and it's designed to chain many tool invocations across a long task loop rather than calling one tool and stopping.
Defining tools
```python
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project test suite and return results.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3.7-max",
    messages=messages,
    tools=tools,
)
```

Model Context Protocol (MCP)
Qwen3.7-Max supports native integration with the Model Context Protocol, the standardized way to connect a model to external tools and data sources. On MCP-Atlas it scored 76.4 and on MCP-Mark 60.8 - both at or above several leading frontier models - which is a meaningful signal if you're building MCP-based agents. Expose your tools through an MCP server and connect Qwen3.7-Max as the reasoning backbone; the model handles the multi-step tool orchestration.
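Whether tools are defined inline or exposed through MCP, the client-side loop has the same shape: call the model, execute any tool calls it requests, feed the results back, and repeat until it answers. The sketch below shows that loop using plain dicts in the OpenAI chat format. `run_agent_loop`, `TOOL_REGISTRY`, and `fake_model` are hypothetical names, and `fake_model` stands in for the real API so the example runs offline.

```python
import json


def run_tests(path):
    """Hypothetical local tool; a real one would invoke the test runner."""
    return {"path": path, "passed": True}


TOOL_REGISTRY = {"run_tests": run_tests}


def run_agent_loop(call_model, messages, max_steps=10):
    """Call the model until it stops requesting tools or the step cap is hit.

    `call_model(messages)` must return an assistant message as a plain dict
    in the OpenAI chat format, optionally carrying a "tool_calls" list.
    """
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content")
        for call in tool_calls:
            fn = TOOL_REGISTRY[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"])
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(fn(**args)),
            })
    raise RuntimeError("step limit reached before the model finished")


def fake_model(messages):
    """Stand-in for the API: request one tool call, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": None, "tool_calls": [{
            "id": "call_1", "type": "function",
            "function": {"name": "run_tests", "arguments": '{"path": "tests/"}'},
        }]}
    return {"role": "assistant", "content": "All tests pass."}


history = [{"role": "user", "content": "Run the tests and report."}]
print(run_agent_loop(fake_model, history))  # prints "All tests pass."
```

The step cap matters for long-horizon work: it bounds cost if the model keeps requesting tools without converging.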
Streaming & the 1M context
Streaming responses
For chat UIs and long generations, stream tokens as they arrive instead of waiting for the full response:
```python
stream = client.chat.completions.create(
    model="qwen3.7-max",
    messages=[{"role": "user", "content": "Summarize this repo."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
```
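If you also need the complete text after streaming (for logging or tests), a small accumulator helps. This sketch is defensive about chunks that carry no choices or no content, which some OpenAI-compatible servers emit; whether DashScope does so for this model is not confirmed here. `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks let the example run without a network call.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Concatenate streamed answer text, skipping chunks without choices or content."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        text = getattr(chunk.choices[0].delta, "content", None)
        if text:
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Build an object shaped like one streamed chunk (for offline testing)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


chunks = [
    fake_chunk("Hello"),
    fake_chunk(None),
    SimpleNamespace(choices=[]),
    fake_chunk(", world"),
]
print(collect_stream(chunks))  # prints "Hello, world"
```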
Working with the million-token window
The 1M-token context is large enough to hold a full mid-sized code repository, a big stack of documents, or an entire long agent history in one request. For agentic work, pass the full task history, prior tool outputs, and current code state into context so the model reasons over the complete picture. Two cautions apply:
- Trim aggressively. Every token in context is billed on input. Don't keep history you don't need - a 1M window is a ceiling, not a target.
- A ceiling isn't a guarantee. Models often reason less reliably as the window fills, and independent long-context testing for Qwen3.7-Max isn't yet available. If your use case depends on retrieving a detail buried deep in a huge context, test that retrieval on your own data.
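The "trim aggressively" advice can be sketched as a budgeted history trimmer: keep system messages, then keep the most recent turns that fit. The token estimate below is a crude four-characters-per-token heuristic, which is an assumption; use the provider's token counts when accuracy matters. `estimate_tokens` and `trim_history` are hypothetical helpers.

```python
def estimate_tokens(text):
    """Very rough heuristic: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages, budget_tokens):
    """Keep system messages plus the most recent turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m.get("content") or "") for m in system)
    kept = []
    for msg in reversed(others):  # walk from newest to oldest
        cost = estimate_tokens(msg.get("content") or "")
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))
```

In agent sessions, take care not to separate a tool result from the assistant message that requested it; this sketch does not enforce that pairing.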
Using Qwen3.7-Max with Claude Code & harnesses
One of the model's headline features is that it acts as a drop-in intelligence layer for diverse agent frameworks rather than being locked to one interface. Because it supports the Anthropic API protocol natively, you can plug it into tools built for that protocol - including Claude Code and OpenClaw - by pointing them at the DashScope Anthropic-compatible endpoint and supplying your DashScope key.
- Point the harness at the DashScope Anthropic endpoint. Most Anthropic-protocol tools let you override the base URL via an environment variable.
- Supply your DashScope key. Use your Model Studio key in place of the harness's usual provider key.
- Set the model to qwen3.7-max. Configure the harness to request the qwen3.7-max model.
```bash
# Redirect an Anthropic-protocol harness to Qwen3.7-Max
export ANTHROPIC_BASE_URL="https://dashscope-intl.aliyuncs.com/apps/anthropic"
export ANTHROPIC_API_KEY="$DASHSCOPE_API_KEY"
export ANTHROPIC_MODEL="qwen3.7-max"
```
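If you launch the harness from a script, you may prefer to scope these overrides to a single child process instead of exporting them shell-wide. The sketch below builds such an environment; `harness_env` is a hypothetical helper, and the variable names are the ones from the three steps above.

```python
import os


def harness_env(dashscope_key, base_env=None):
    """Return a copy of the environment with the Anthropic-protocol overrides set."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "ANTHROPIC_BASE_URL": "https://dashscope-intl.aliyuncs.com/apps/anthropic",
        "ANTHROPIC_API_KEY": dashscope_key,
        "ANTHROPIC_MODEL": "qwen3.7-max",
    })
    return env
```

You could then start the tool with something like `subprocess.run(["claude"], env=harness_env(key))`, assuming the harness binary is on your PATH. Other sessions on the same machine keep their usual provider settings.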
Pricing & cost control
Qwen3.7-Max is positioned as a flagship reasoning engine - priced below the most expensive Western frontier models, but well above commodity tiers. There is no free tier on the model itself beyond Model Studio's new-user quota.
| Item | Qwen3.7-Max | Qwen3.6-Max-Preview |
|---|---|---|
| Input price | $2.50 / 1M tokens | $1.30 / 1M tokens |
| Output price | $7.50 / 1M tokens | $7.80 / 1M tokens |
| Context window | 1,000,000 tokens | 256,000 tokens |
| Max output | 64,000 tokens | - |
| Open weights | No (API-only) | Yes (Apache 2.0) |
Three ways to keep the bill down
- Gate thinking mode. It's the biggest cost driver. Only enable it for genuinely hard, multi-step tasks.
- Cap output and trim context. Output is billed at $7.50/1M tokens and the model is verbose, so set max_tokens deliberately. Input is billed too, so prune history you don't need. Output cost usually dominates the bill.
- Right-size the model. For short, high-volume, latency-sensitive calls, a cheaper Qwen tier (or Qwen 3.6) is often the better economic choice. Reserve Max for work that needs its depth.
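A quick per-request estimate makes these levers easier to reason about. The sketch below uses the Qwen3.7-Max list prices from the table above and assumes reasoning tokens are billed as output; `estimate_cost` is a hypothetical helper.

```python
INPUT_USD_PER_M = 2.50   # from the pricing table
OUTPUT_USD_PER_M = 7.50  # from the pricing table


def estimate_cost(input_tokens, output_tokens):
    """Return the estimated cost in USD for one request."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# 20K tokens in, 20K tokens out (reasoning included):
# input $0.05 + output $0.15
print(f"${estimate_cost(20_000, 20_000):.2f}")  # prints "$0.20"
```

At equal token counts output costs three times as much as input, which is why gating thinking and capping max_tokens usually save more than trimming the prompt.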
Qwen3.7-Max vs Qwen 3.6: should you upgrade?
If you're already on Qwen 3.6, here's the honest decision framework. The two generations serve overlapping but distinct needs.
| Dimension | Qwen3.7-Max | Qwen 3.6 line |
|---|---|---|
| Status | Preview | Generally available |
| Weights | Closed, API-only | Open (Apache 2.0) |
| Context | 1M tokens | 256K tokens |
| Modality | Text only | Multimodal |
| Intelligence Index | 56.6 (#5) | 51.8 (Max) |
| Self-host | No | Yes |
Upgrade to 3.7-Max if you need the strongest reasoning and agentic coding Alibaba offers, you work in text, and you're comfortable with a closed, preview-stage, API-only model. Stay on 3.6 if you need stable availability, open weights for self-hosting, multimodal input, or lower per-token cost for high-volume work. Many teams run both: 3.6 for bulk and on-prem, 3.7-Max for the hardest agent tasks.
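For teams that run both generations, the decision framework above can be encoded as a small routing rule. This is only a sketch of that guidance: the flag names and `choose_generation` are hypothetical, and the function returns a generation label rather than a concrete model identifier.

```python
def choose_generation(needs_images=False, needs_self_hosting=False,
                      needs_ga_stability=False, high_volume=False):
    """Return "3.6" or "3.7-max" following the upgrade guidance above."""
    if needs_images or needs_self_hosting or needs_ga_stability or high_volume:
        return "3.6"
    return "3.7-max"


print(choose_generation())                         # prints "3.7-max"
print(choose_generation(needs_self_hosting=True))  # prints "3.6"
```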
Can you run Qwen3.7-Max locally?
Short answer: no, not as of mid-2026. The 3.7 generation is closed-weight - there are no Qwen3.7 weights on Hugging Face or ModelScope, and no QwenLM/Qwen3.7 repository on GitHub. Tools like Ollama, llama.cpp, and Hugging Face Transformers cannot run a model whose weights were never released.
If local or self-hosted inference is a hard requirement, you have two realistic options:
Use Qwen 3.6 open weights
Qwen3.6-27B and Qwen3.6-35B-A3B ship under Apache 2.0 and are downloadable and self-hostable today via Ollama, llama.cpp, or Transformers - a strong open alternative for on-prem needs.
Wait for a possible open mid-tier
Alibaba has historically released open mid-tier models after a flagship preview. An open-weight Qwen 3.7 variant may follow, but nothing is confirmed - don't plan around it.
```bash
# There is no qwen3.7 local model. For self-hosting, use 3.6:
ollama pull qwen3.6:27b
ollama run qwen3.6:27b
```

Troubleshooting
| Symptom | Likely cause & fix |
|---|---|
| 401 / auth error | Key missing, wrong, or for the wrong region. Re-export DASHSCOPE_API_KEY and confirm the matching regional endpoint (international vs. Mainland). |
| 404 on endpoint | Wrong base URL - especially the Anthropic path, which can shift during preview. Verify the current compatible-mode URL in the docs. |
| Model not found | Check the model string is exactly qwen3.7-max. During preview it may be region-gated or need enabling in the console. |
| Very long / slow replies | Thinking mode is on. Disable it for simple tasks and set a sensible max_tokens cap. |
| Bill higher than expected | Verbose reasoning output. Gate thinking, trim context, set budget alerts. Output at $7.50/1M dominates cost. |
| Says "I don't know" | Expected - it abstains more to cut hallucination. Use retrieval or another model for pure factual recall. |
| Flaky output tests | You're asserting on the reasoning trace, which varies run to run. Assert on the final answer only. |
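The two status-code rows of the table can be surfaced directly in your error handling so that a failed call points at its likely fix. A minimal sketch, with `HINTS` and `diagnose` as hypothetical names:

```python
HINTS = {
    401: "Re-export DASHSCOPE_API_KEY and confirm the key matches the regional endpoint.",
    404: "Verify the base URL, especially the Anthropic-compatible path.",
}


def diagnose(status_code):
    """Map an HTTP status code to the matching fix from the table, if any."""
    return HINTS.get(status_code, "No known hint; check the response body and the docs.")
```

With the OpenAI Python SDK, catching `openai.APIStatusError` and passing its `status_code` to `diagnose` is one way to wire this in.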