Qwen3.7-Max - Advanced AI Model, Features, API & Benchmarks

📑 On this page

1. What is Qwen3.7-Max
2. Key features
3. Before you begin
4. Use it in Qwen Chat
5. Install & call the API
6. Thinking mode
7. Tool calling & MCP
8. Streaming & 1M context
9. Claude Code & harnesses
10. Pricing & cost control
11. 3.7-Max vs 3.6
12. Running locally?
13. Troubleshooting
14. FAQ

What is Qwen3.7-Max?

Alibaba's most advanced agent model to date - a closed-weight, API-only reasoning flagship built for long-horizon autonomous work rather than single-shot chat.

Qwen3.7-Max is the top tier of the Qwen 3.7 family, announced at the 2026 Alibaba Cloud Summit in Hangzhou on May 20, 2026. The Qwen team describes it as a "versatile agent foundation" rather than a general-purpose chatbot. In practice that means it is tuned to run hundreds or even thousands of steps autonomously: writing code, calling tools, checking its own work, and carrying a task forward across hours. In one internal demonstration it ran for roughly 35 hours and made over 1,000 tool calls to optimize a GPU compute kernel - though that figure is an unverified vendor claim, so treat it as a statement of design intent rather than a guarantee.

A few characteristics shape every decision you'll make when setting it up:

Text-only. Qwen3.7-Max accepts text input and produces text output - there is no image or video input. If you need vision, use its sibling Qwen3.7-Plus instead.
Closed-weight, API-only. Unlike many earlier Qwen models, there is no open-weight, downloadable release for the 3.7 generation. You access it through Alibaba Cloud Model Studio (DashScope), not by self-hosting.
A reasoning model. It generates an internal chain of thought before answering, which you can toggle on or off. This makes it stronger on hard problems but more verbose and more expensive per request.
Dual API compatibility. It speaks both the OpenAI and Anthropic API specifications, so it drops into most existing pipelines - including Claude Code - with minimal changes.

Preview status. Qwen3.7-Max launched as a preview. Benchmark scores, behavior, and even pricing can change before a stable release, and a deliberate "I don't know" abstention behavior (covered later) means it answers fewer broad-knowledge questions on purpose. Validate it against your own workload before committing a production pipeline to it.

Key features at a glance

Before the setup steps, it helps to know what you're actually getting. These are the capabilities that distinguish Qwen3.7-Max from both its predecessor and the wider field, and they explain why the install path looks the way it does.

🧠 Extended-thinking reasoning

Plans, checks its work, and self-corrects before answering. Toggleable on or off - the single biggest lever over quality and cost.

📚 1M-token context

Holds a full mid-sized repository or a large document stack in one request, up from 256K on Qwen3.6-Max.

🤖 Long-horizon agents

Built for sustained autonomous loops - hundreds of tool calls across a single task, not one-shot answers.

🔗 Cross-harness generalization

Performs consistently in Claude Code, OpenClaw, Qwen Code, or a custom framework - a drop-in backbone, no scaffold-specific tuning.

🧩 Native MCP & tool use

Standard function-calling plus Model Context Protocol support, with strong MCP-Atlas (76.4) and MCP-Mark (60.8) scores.

🛡️ Low hallucination

Posts the lowest hallucination rate in the frontier tier - partly by abstaining more often on uncertain factual questions.

On the Artificial Analysis Intelligence Index v4.0 it scored 56.6, placing it fifth overall - ahead of Gemini 3.5 Flash and behind models such as GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro. The gains over Qwen 3.6 are concentrated in exactly the areas the team says it trained for: scientific reasoning, agentic capability, and coding.

Before you begin

Because Qwen3.7-Max is a hosted model, there's no GPU, no model download, and no heavy local install. The requirements are modest:

For chat use

Just a browser and a free Qwen account at chat.qwen.ai. No API key, no setup, no payment method.

For API use

An Alibaba Cloud account, a Model Studio (DashScope) API key, and either Python 3.8+ or Node.js 18+. Internet access to the DashScope endpoint.

For agents / Claude Code

The harness of your choice (Claude Code, OpenClaw, or a custom framework) plus a DashScope key. Qwen3.7-Max plugs into the Anthropic-compatible endpoint.

What you do NOT need

No local GPU, no VRAM, no Hugging Face download, no Docker image - there are no open weights for 3.7 to host yourself.

Throughout this guide the model identifier is always the same string:

model id

qwen3.7-max

Use Qwen3.7-Max in Qwen Chat

The no-code path - start here to evaluate quality before wiring anything into production.

The fastest way to try the model requires no API key and no code at all. This is ideal for kicking the tires on real prompts before you spend a token through the API.

Open Qwen Chat. Navigate to chat.qwen.ai and create a free account (or sign in). The web app works on desktop and mobile browsers.
Select the model. Open the model selector dropdown and choose Qwen3.7-Max. During the preview period it may appear as Qwen3.7-Max-Preview.
Turn on Thinking Mode. Toggle Thinking Mode in the chat interface. This activates the chain-of-thought reasoning layer and lets you watch the model's reasoning trace before its final answer - useful for understanding how it approaches a problem.
Send a hard prompt. Use your most demanding real-world prompts: multi-step math, complex refactors, ambiguous expert questions. Trivial prompts reveal little about a frontier model's edge and waste the reasoning overhead.

Evaluation tip: Because Qwen3.7-Max is an agent-first model, it shines on tasks with structure - planning, decomposition, verification loops. If your test prompt is a one-line factual lookup, you're testing the wrong thing. Give it something it has to work through.

Install & call the API

The developer path - served through Alibaba Cloud Model Studio (DashScope).

For anything programmatic, Qwen3.7-Max is served through Alibaba Cloud Model Studio. It exposes both an OpenAI-compatible endpoint and an Anthropic-compatible endpoint, so you rarely need a Qwen-specific SDK - your existing OpenAI or Anthropic client usually works with just a base-URL and key change.

Step 1 - Get your API key

Create an Alibaba Cloud account. Sign up at the Alibaba Cloud console if you don't already have one.
Open Model Studio. Go to the Model Studio console (modelstudio.console.alibabacloud.com for international, or the Singapore region for non-Mainland access).
Generate a DashScope API key. From the dashboard, create an API key. Treat it like a password - never commit it to source control.
Export it as an environment variable. This keeps the secret out of your code.

terminal

# macOS / Linux
export DASHSCOPE_API_KEY="sk-your-key-here"

# Windows (PowerShell)
$env:DASHSCOPE_API_KEY="sk-your-key-here"

Free quota: New Model Studio users typically get a block of free tokens per proprietary model in the Singapore region - enough to evaluate Qwen3.7-Max before any spend. Enable the "free quota only" toggle in the console if you want the service to stop rather than switch to pay-as-you-go when the quota runs out.
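
A missing or empty key usually surfaces later as an auth error, so it can help to check for it at startup. The sketch below is one way to do that; the helper name load_dashscope_key is hypothetical, and the only thing it assumes is the DASHSCOPE_API_KEY variable exported above.

```python
import os


def load_dashscope_key(env=None):
    """Return the DashScope key from the environment, or raise a clear error."""
    # Accept a plain dict for testing; default to the real process environment.
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set; export it before running this script."
        )
    return key
```

Calling load_dashscope_key() once at the top of a script gives a readable failure before any request is sent.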

Step 2 - Make your first call

Pick the language and client you already use. The four examples below (Python, JavaScript, curl, and the Anthropic SDK) each send one prompt to qwen3.7-max and print the reply.

Install the OpenAI SDK, then point it at the DashScope compatible-mode endpoint:

install

pip install openai

first_call.py

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

resp = client.chat.completions.create(
    model="qwen3.7-max",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Refactor this loop and explain why."},
    ],
)
print(resp.choices[0].message.content)

The OpenAI JS client works the same way - Node 18+, Bun, or Deno:

install

npm install openai

first_call.mjs

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});

const resp = await client.chat.completions.create({
  model: "qwen3.7-max",
  messages: [{ role: "user", content: "Write a binary search in Rust." }],
});
console.log(resp.choices[0].message.content);

No SDK needed - a plain HTTPS request works from any environment:

terminal

curl https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.7-max",
    "messages": [{"role": "user", "content": "Explain MoE routing."}]
  }'

Because Qwen3.7-Max also speaks the Anthropic protocol, you can use the Anthropic SDK by pointing it at the DashScope Anthropic-compatible endpoint. This is what makes Claude Code work (see section 9).

anthropic_style.py

import os

import anthropic

client = anthropic.Anthropic(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/apps/anthropic",
)

msg = client.messages.create(
    model="qwen3.7-max",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Plan a 5-step refactor."}],
)
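# msg.content is a list of content blocks; print only the text blocks
print("".join(block.text for block in msg.content if block.type == "text"))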

Check the exact path. The Anthropic-compatible base URL can differ by region and may change during preview. If a call 404s, confirm the current endpoint in the Model Studio docs for your region before debugging your code.

Step 3 - Read the response correctly

Qwen3.7-Max is verbose by design - in one independent evaluation it generated about 97 million tokens where the median model produced roughly 24 million. When you write tests or parse output, assert on the final answer, not on the exact wording of the reasoning, which varies between runs. Cap your output length deliberately; the model supports up to a 64K-token maximum output, which is generous but billable.
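
As a minimal sketch of "assert on the final answer", the helpers below read only the answer text from an OpenAI-style chat completion. The reasoning_content attribute is an assumption about where the endpoint places the reasoning trace; check the Model Studio docs for the actual field. A stand-in object replaces a real response so the example runs offline.

```python
from types import SimpleNamespace


def final_answer(resp):
    """Return only the final answer text, ignoring any reasoning trace."""
    message = resp.choices[0].message
    return (message.content or "").strip()


def reasoning_trace(resp):
    """Return the reasoning text if the (assumed) field is present, else ''."""
    message = resp.choices[0].message
    return getattr(message, "reasoning_content", None) or ""


# Stand-in for a real response, shaped like an OpenAI chat completion.
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(
        content="  The loop is O(n).  ",
        reasoning_content="First I count iterations...",
    ))]
)

# Assert on the answer; never on the wording of the trace.
assert "O(n)" in final_answer(fake)
```

To cap output on a real call, pass max_tokens to client.chat.completions.create alongside model and messages.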

Using thinking mode

The single most important cost-and-quality lever you control.

Thinking mode is the model's chain-of-thought layer: before producing a final answer it plans, checks its work, and corrects course. It's the source of Qwen3.7-Max's strength on hard problems - and the source of its token cost.

Turning it on via the API

On the OpenAI-compatible endpoint, enable reasoning with an extra_body flag:

thinking.py

resp = client.chat.completions.create(
    model="qwen3.7-max",
    messages=[{"role": "user", "content": "Prove the statement step by step."}],
    extra_body={"enable_thinking": True},
)

Preserving thinking across turns

For multi-turn agent conversations, Qwen3.7-Max supports a preserve_thinking feature that retains the reasoning content from all preceding turns. This helps the model maintain a coherent line of reasoning across a long session - at the cost of carrying more tokens forward in context.

multi_turn.py

resp = client.chat.completions.create(
    model="qwen3.7-max",
    messages=conversation_history,   # list of prior {"role", "content"} messages
    extra_body={
        "enable_thinking": True,
        "preserve_thinking": True,   # keep prior reasoning in context
    },
)

✅ Turn thinking ON for

Multi-step code refactors, math proofs, long agent task chains, ambiguous problems needing step-by-step planning, and anything where correctness matters more than speed.

⛔ Turn thinking OFF for

Short rewrites, simple classifications, quick lookups, and high-volume tasks where latency and per-token cost need to be minimized.

The verbosity tax. Each thinking token adds to latency and cost. For long agentic sessions the effective bill can be far higher than the headline per-token rate implies, because the model emits so many reasoning tokens. Use thinking selectively, and monitor token usage in the Model Studio dashboard.
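
One way to apply the ON/OFF guidance above is to decide per request rather than globally. The sketch below builds the keyword arguments for a call and enables thinking only for task kinds you have marked as hard. The task labels and the build_request helper are hypothetical; the enable_thinking flag is the one shown earlier in this section.

```python
# Hypothetical task labels; adapt them to your own routing scheme.
THINKING_TASKS = {"refactor", "proof", "agent", "planning"}


def build_request(prompt, task_kind, max_tokens=2048):
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": "qwen3.7-max",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # cap billable output
        "extra_body": {"enable_thinking": task_kind in THINKING_TASKS},
    }
```

With a configured client, a call would look like client.chat.completions.create(**build_request("Classify this ticket.", "classify", max_tokens=64)).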

Tool calling & MCP

Tool use is where Qwen3.7-Max is meant to live. It supports function calling natively, in the standard OpenAI tools format, and it's designed to chain many tool invocations across a long task loop rather than calling one tool and stopping.

Defining tools

tools.py

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project test suite and return results.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3.7-max",
    messages=messages,
    tools=tools,
)
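
The call above only tells the model which tools exist; your code still has to run whatever it requests and send the results back. The following is a minimal sketch of that dispatch step, assuming the standard OpenAI tool-call shape (tool_calls with id, function.name, and a JSON string in function.arguments). The run_tests body and the registry are hypothetical stand-ins, and a fake message is used so the example runs offline.

```python
import json
from types import SimpleNamespace


def run_tests(path):
    """Hypothetical local implementation of the run_tests tool."""
    return {"path": path, "passed": True}


TOOL_REGISTRY = {"run_tests": run_tests}


def handle_tool_calls(message, registry=TOOL_REGISTRY):
    """Execute each requested tool and return the 'tool' messages to send back."""
    results = []
    for call in message.tool_calls or []:
        func = registry.get(call.function.name)
        if func is None:
            output = {"error": f"unknown tool: {call.function.name}"}
        else:
            # Arguments arrive as a JSON-encoded string.
            args = json.loads(call.function.arguments or "{}")
            output = func(**args)
        results.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(output),
        })
    return results


# Stand-in for resp.choices[0].message from a real response.
fake_message = SimpleNamespace(tool_calls=[SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="run_tests", arguments='{"path": "tests/"}'),
)])
print(handle_tool_calls(fake_message))
```

In a real agent loop you would append the assistant message and these tool messages to messages, call the API again, and repeat until the reply contains no tool calls.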

Model Context Protocol (MCP)

Qwen3.7-Max supports native integration with the Model Context Protocol, the standardized way to connect a model to external tools and data sources. On MCP-Atlas it scored 76.4 and on MCP-Mark 60.8 - both at or above several leading frontier models - which is a meaningful signal if you're building MCP-based agents. Expose your tools through an MCP server and connect Qwen3.7-Max as the reasoning backbone; the model handles the multi-step tool orchestration.
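
The source does not say how Model Studio wires MCP servers to the model, so the following is only one possible client-side bridge: it converts an MCP tool descriptor (name, description, inputSchema) into an entry in the OpenAI tools format used earlier in this section. The mcp_tool_to_openai helper and the read_file descriptor are hypothetical.

```python
def mcp_tool_to_openai(tool):
    """Convert one MCP tool descriptor (a dict) into an OpenAI-style tool entry."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP describes arguments with a JSON Schema under "inputSchema".
            "parameters": tool.get(
                "inputSchema", {"type": "object", "properties": {}}
            ),
        },
    }


# Hypothetical descriptor, shaped like an entry from an MCP tools/list response.
mcp_tool = {
    "name": "read_file",
    "description": "Read a file from the workspace.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}
tools = [mcp_tool_to_openai(mcp_tool)]
```

The resulting tools list can be passed as the tools argument of a chat-completions call; executing the tool still means forwarding the model's request to the MCP server yourself.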

Cross-harness generalization. Unlike models tuned to perform best inside one proprietary scaffold, Qwen3.7-Max was trained to decouple the task, the harness, and the verifier - so it performs consistently whether you run it through Claude Code, OpenClaw, Qwen Code, or your own framework. You can adopt it as a drop-in backbone without framework-specific tuning.

Streaming & the 1M context

Streaming responses

For chat UIs and long generations, stream tokens as they arrive instead of waiting for the full response:

stream.py

stream = client.chat.completions.create(
    model="qwen3.7-max",
    messages=[{"role": "user", "content": "Summarize this repo."}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:  # some chunks (e.g. usage-only) carry no choices
        continue
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

Working with the million-token window

The 1M-token context is large enough to hold a full mid-sized code repository, a big stack of documents, or an entire long agent history in one request. For agentic work, pass the full task history, prior tool outputs, and current code state into context so the model reasons over the complete picture. Two cautions apply:

Trim aggressively. Every token in context is billed on input. Don't keep history you don't need - a 1M window is a ceiling, not a target.
A ceiling isn't a guarantee. Models often reason less reliably as the window fills, and independent long-context testing for Qwen3.7-Max isn't yet available. If your use case depends on retrieving a detail buried deep in a huge context, test that retrieval on your own data.
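
To make "trim aggressively" concrete, here is a rough sketch that keeps system messages plus the newest turns that fit a token budget. The 4-characters-per-token estimate is a crude assumption, not the model's tokenizer, and the helper names are hypothetical. It also assumes every message has string content.

```python
def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages, budget_tokens):
    """Keep system messages plus the most recent turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget_tokens:
            break  # everything older than this is dropped
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))
```

If the history contains tool calls, drop an assistant tool-call message and its tool results together so the remaining conversation stays valid.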

Using Qwen3.7-Max with Claude Code & harnesses

One of the model's headline features is that it acts as a drop-in intelligence layer for diverse agent frameworks rather than being locked to one interface. Because it supports the Anthropic API protocol natively, you can plug it into tools built for that protocol - including Claude Code and OpenClaw - by pointing them at the DashScope Anthropic-compatible endpoint and supplying your DashScope key.

Point the harness at the DashScope Anthropic endpoint. Most Anthropic-protocol tools let you override the base URL via an environment variable.
Supply your DashScope key. Use your Model Studio key in place of the harness's usual provider key.
Set the model to qwen3.7-max. Configure the harness to request the qwen3.7-max model.

env (illustrative)

# Redirect an Anthropic-protocol harness to Qwen3.7-Max
export ANTHROPIC_BASE_URL="https://dashscope-intl.aliyuncs.com/apps/anthropic"
export ANTHROPIC_API_KEY="$DASHSCOPE_API_KEY"
export ANTHROPIC_MODEL="qwen3.7-max"

Verify variable names and the endpoint. Exact environment-variable names depend on the harness version, and the DashScope Anthropic path can change during preview. Always confirm against the current Model Studio docs and your harness's configuration guide - the example above shows the shape of the setup, not guaranteed literal values.

Pricing & cost control

Qwen3.7-Max is positioned as a flagship reasoning engine - priced below the most expensive Western frontier models, but well above commodity tiers. There is no free tier on the model itself beyond Model Studio's new-user quota.

Item	Qwen3.7-Max	Qwen3.6-Max-Preview
Input price	$2.50 / 1M	$1.30 / 1M
Output price	$7.50 / 1M	$7.80 / 1M
Context window	1,000,000	256,000
Max output	64,000	-
Open weights	No (API-only)	Yes (Apache 2.0)

Three ways to keep the bill down

Gate thinking mode. It's the biggest cost driver. Only enable it for genuinely hard, multi-step tasks.
Cap output and trim context. Output is billed at $7.50/1M and the model is verbose, so cap max_tokens. Input is billed at $2.50/1M on every request, so prune history you don't need. Output cost usually dominates the bill.
Right-size the model. For short, high-volume, latency-sensitive calls, a cheaper Qwen tier (or Qwen 3.6) is often the better economic choice. Reserve Max for work that needs its depth.

Watch real usage, not headline rates. Because a single long agentic session can emit tens of millions of tokens, effective cost can be much higher than the per-token price suggests. Monitor the Model Studio usage dashboard during your first production runs and set budget alerts.
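
The arithmetic behind that warning is simple enough to script. The sketch below applies the listed preview rates ($2.50 input, $7.50 output per million tokens) to token counts; the function name is hypothetical, and the rates may change before a stable release.

```python
INPUT_USD_PER_M = 2.50   # input rate from the pricing table
OUTPUT_USD_PER_M = 7.50  # output rate from the pricing table


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate spend in USD from token counts at the listed preview rates."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# A single request: 200K tokens of context in, 8K tokens out.
print(estimate_cost_usd(200_000, 8_000))   # 0.56

# Output alone for 97 million generated tokens.
print(estimate_cost_usd(0, 97_000_000))    # 727.5
```

Feeding real token counts from your usage data into a function like this gives a running estimate to compare against the Model Studio dashboard.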

Qwen3.7-Max vs Qwen 3.6: should you upgrade?

If you're already on Qwen 3.6, here's the honest decision framework. The two generations serve overlapping but distinct needs.

Dimension	Qwen3.7-Max	Qwen 3.6 line
Status	Preview	Generally available
Weights	Closed, API-only	Open (Apache 2.0)
Context	1M tokens	256K tokens
Modality	Text only	Multimodal
Intelligence Index	56.6 (#5)	51.8 (Max)
Self-host	No	Yes

Upgrade to 3.7-Max if you need the strongest reasoning and agentic coding Alibaba offers, you work in text, and you're comfortable with a closed, preview-stage, API-only model. Stay on 3.6 if you need stable availability, open weights for self-hosting, multimodal input, or lower per-token cost for high-volume work. Many teams run both: 3.6 for bulk and on-prem, 3.7-Max for the hardest agent tasks.

Can you run Qwen3.7-Max locally?

Short answer: no, not as of mid-2026. The 3.7 generation is closed-weight - there are no Qwen3.7 weights on Hugging Face or ModelScope, and no QwenLM/Qwen3.7 repository on GitHub. Tools like Ollama, llama.cpp, and Hugging Face Transformers cannot run a model whose weights were never released.

If local or self-hosted inference is a hard requirement, you have two realistic options:

Use Qwen 3.6 open weights

Qwen3.6-27B and Qwen3.6-35B-A3B ship under Apache 2.0 and are downloadable and self-hostable today via Ollama, llama.cpp, or Transformers - a strong open alternative for on-prem needs.

Wait for a possible open mid-tier

Alibaba has historically released open mid-tier models after a flagship preview. An open-weight Qwen 3.7 variant may follow, but nothing is confirmed - don't plan around it.

local alternative - Qwen 3.6 via Ollama

# There is no qwen3.7 local model. For self-hosting, use 3.6:
ollama pull qwen3.6:27b
ollama run qwen3.6:27b

Troubleshooting

Symptom	Likely cause & fix
401 / auth error	Key missing, wrong, or for the wrong region. Re-export DASHSCOPE_API_KEY and confirm the matching regional endpoint (international vs. Mainland).
404 on endpoint	Wrong base URL - especially the Anthropic path, which can shift during preview. Verify the current compatible-mode URL in the docs.
Model not found	Check the model string is exactly qwen3.7-max. During preview it may be region-gated or need enabling in the console.
Very long / slow replies	Thinking mode is on. Disable it for simple tasks and set a sensible max_tokens cap.
Bill higher than expected	Verbose reasoning output. Gate thinking, trim context, set budget alerts. Output at $7.50/1M dominates cost.
Says "I don't know"	Expected - it abstains more to cut hallucination. Use retrieval or another model for pure factual recall.
Flaky output tests	You're asserting on the reasoning trace, which varies run to run. Assert on the final answer only.

Frequently asked questions

What is the model ID for Qwen3.7-Max?

It's qwen3.7-max, used identically across the OpenAI-compatible and Anthropic-compatible endpoints on Alibaba Cloud Model Studio (DashScope).

Is there a free way to try it?

Yes - Qwen Chat (chat.qwen.ai) is free and needs no API key or code. For the API, Model Studio gives new users a free token quota per proprietary model in the Singapore region, enough for evaluation.

Does it support images or vision?

No. Qwen3.7-Max is text-in, text-out only. For vision and multimodal input, use Qwen3.7-Plus, the multimodal sibling in the 3.7 family.

How much does it cost?

$2.50 per million input tokens and $7.50 per million output tokens on Alibaba Cloud Model Studio. Because the model is verbose in thinking mode, effective cost on long sessions can be considerably higher than those headline rates.

Can I use it with Claude Code?

Yes. Qwen3.7-Max natively supports the Anthropic API protocol, so it works as a drop-in backbone for Anthropic-protocol harnesses like Claude Code and OpenClaw - point them at the DashScope Anthropic-compatible endpoint with your DashScope key. Confirm exact variable names and the endpoint in current docs, as they can change during preview.

Can I run it locally or download the weights?

No. The Qwen 3.7 generation is closed-weight and API-only - there are no weights on Hugging Face or GitHub. For self-hosting, use the open-weight Qwen 3.6 models (e.g. Qwen3.6-27B, Apache 2.0) instead.

What's the context window and max output?

A 1-million-token context window (up from 256K on Qwen3.6-Max) and up to a 64K-token maximum output per response.

Why does it sometimes refuse to answer factual questions?

By design. Qwen3.7-Max posts the lowest hallucination rate among frontier models partly by abstaining more often - its attempt rate on broad-knowledge benchmarks is low. Pair it with retrieval or use another model for pure factual recall.

OpenAI-compatible or Anthropic-compatible - which?

Whichever matches your stack. The OpenAI compatible-mode endpoint is simplest for most chat and tool-use code; the Anthropic-compatible endpoint exists chiefly so Anthropic-protocol harnesses (like Claude Code) work unchanged. Both serve the same qwen3.7-max model.