What is the Qwen API?
The Qwen API is Alibaba Cloud's hosted endpoint for the entire Qwen model family: the flagship Qwen-Max, the cost-efficient Qwen-Plus and Qwen-Flash, the math-specialized Qwen-Math, the code-focused Qwen-Coder, and the multimodal Qwen-VL and Qwen-Omni lines. Rather than running models on your own GPUs, you call them as a service over HTTP, paying only for the tokens you actually use. The API is served through Alibaba's DashScope infrastructure (also called Alibaba Cloud Model Studio in the documentation) and exposes both an OpenAI-compatible interface and a native DashScope SDK.
What makes the Qwen API attractive to developers in 2026 is the combination of frontier-class model quality, aggressive pricing — often an order of magnitude cheaper than GPT-4-class endpoints — and OpenAI compatibility, which means you can migrate existing code with nothing more than a base URL swap. For most teams, "use Qwen via DashScope" is the cheapest way to get production-grade LLM capability with reasonable latency and no infrastructure to manage.
Qwen API Platform
The Qwen API is delivered through Alibaba Cloud Model Studio, which is the unified developer platform for everything Qwen-related. Model Studio handles authentication, billing, model selection, rate limiting, and observability. Under the hood, the actual API surface is called DashScope — you'll see both names in the documentation, and they refer to the same service.
Model Studio is deployed in four regions, each with a separate endpoint and separate API keys. Keys are not interchangeable across regions, so pick the region closest to your users (or whichever one your compliance team approves) and stick with it. The four regions are:
- Singapore (International): https://dashscope-intl.aliyuncs.com/compatible-mode/v1 — the default for non-China teams.
- US (Virginia): https://dashscope-us.aliyuncs.com/compatible-mode/v1 — lowest latency for US-based teams.
- China (Beijing): https://dashscope.aliyuncs.com/compatible-mode/v1 — for mainland China deployments.
- Hong Kong (China): https://cn-hongkong.dashscope.aliyuncs.com/compatible-mode/v1
The platform also offers a web console for testing prompts in a playground, viewing usage metrics, managing API keys, configuring sub-workspaces with role-based access, and accessing the model catalog. For most developers, the relationship is simple: sign up at Model Studio, get a key, then write code against the OpenAI-compatible endpoint just like you would with OpenAI itself.
Qwen API Pricing
Qwen API pricing is one of the strongest competitive advantages of the platform. Input costs across the full Qwen catalog range from $0.01 to $1.04 per million tokens, depending on the model. Output costs are higher (typically 3–4× the input rate) but still dramatically cheaper than GPT-4-class models. Here's a current snapshot of the most popular models on the International (Singapore) endpoint:
| Model | Input ($/M tokens) | Output ($/M tokens) | Context |
|---|---|---|---|
| Qwen-Max | $1.04 | $4.16 | 262K |
| Qwen3-Max | ~$0.78 | ~$2.34 | 262K |
| Qwen-Plus | $0.26 | $0.78 | 1M |
| Qwen3.6-Plus | $0.325 | $1.95 | 1M |
| Qwen-Turbo | $0.05 | $0.20 | 1M |
| Qwen-Flash | Tiered (from $0.033) | Tiered (from $0.13) | 1M |
| Qwen3-Coder-Next | ~$0.30 | ~$1.50 | 128K |
There are two important nuances in how DashScope bills you. First, for several models (Qwen-Plus, Qwen3.5-Plus, Qwen-Flash), pricing is tiered based on the input size of each individual request. A short 5K-token request and a maxed-out 240K-token request don't just cost proportionally different amounts — they fall into entirely different rate brackets. The structure rewards keeping requests short, which can occasionally conflict with the reason you'd reach for a 1M-context model in the first place.
Second, batch invocation gets a 50% discount. If your workload is non-real-time — overnight document processing, bulk enrichment, dataset labeling — use the batch API and you'll cut your bill in half on both input and output tokens. The trade-off is that batch jobs return results asynchronously rather than in milliseconds, so it doesn't fit conversational use cases.
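To see what the batch discount means in dollars, here's a minimal cost estimator. The rates used below are illustrative, taken from the snapshot table above for qwen-plus's base bracket; verify current pricing before relying on the numbers, and note that this sketch ignores the per-request tier brackets described above.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float,
                  batch: bool = False) -> float:
    """Estimate cost in USD; rates are dollars per million tokens.

    batch=True applies the 50% batch-invocation discount to both sides.
    """
    cost = (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate
    return cost * 0.5 if batch else cost

# 10M input + 2M output tokens at qwen-plus's base bracket ($0.26 / $0.78):
realtime = estimate_cost(10_000_000, 2_000_000, 0.26, 0.78)               # ≈ $4.16
batched = estimate_cost(10_000_000, 2_000_000, 0.26, 0.78, batch=True)    # ≈ $2.08
```

Run the same arithmetic against the tier brackets for your actual request-size mix before committing to a model.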
💡 Prices change frequently. Always verify the current rate on the official DashScope pricing page before committing to a production workload.
Qwen API Key Free Tier
Yes — there is a meaningful free tier. New Alibaba Cloud Model Studio accounts in the International (Singapore) region receive 1 million input tokens and 1 million output tokens free, valid for 90 days after activating Model Studio. That's enough for most developers to thoroughly prototype an application, run benchmarks across multiple models, and prove out the architecture before they ever pay a cent.
A few things to know about the free quota:
- It only applies to the International (Singapore) endpoint. The US (Virginia) Global deployment mode and the Chinese Mainland deployment mode have no free quota.
- The 1M + 1M is a combined cap across most Qwen models — not per-model. Use it however you like across Qwen-Max, Plus, Turbo, Coder, VL, etc.
- It expires after 90 days, even if unused. Don't sit on it.
- Beyond the free quota, billing kicks in automatically using whichever payment method you've added. There's no surprise overage charge — you set up billing during signup.
For very casual experimentation or hobby projects, third-party aggregators like OpenRouter or Puter also expose Qwen with their own free tiers and unified billing. These are slightly more expensive per token than DashScope direct, but the signup is faster and you get access to other model providers from the same key.
How to Get a Qwen API Key
Getting a Qwen API key takes about five minutes. Here's the exact sequence:
- Create an Alibaba Cloud account. Go to alibabacloud.com and click Sign Up. You'll need a valid email and a phone number for verification. The international site is what you want — don't sign up on the China site unless you're explicitly targeting mainland deployment.
- Activate Model Studio. Go to the Model Studio product page and click Activate. Accept the Terms of Service. This step also enables your free quota (1M input + 1M output tokens).
- Open the API Keys page. From the Model Studio console, find the sidebar item labeled API Keys (sometimes shown as Key Management).
- Click "Create API Key". You can optionally add a description for tracking which key belongs to which application.
- Copy the key immediately. The key starts with sk-. Store it somewhere safe — a password manager, a .env file, or your platform's secrets manager. Never commit it to a public Git repository.
- Set it as an environment variable so you don't have to hardcode it in your code:
# macOS / Linux — temporary (current session only)
export DASHSCOPE_API_KEY="sk-your-key-here"
# Permanent — add to ~/.bashrc or ~/.zshrc
echo 'export DASHSCOPE_API_KEY="sk-your-key-here"' >> ~/.bashrc
# Windows PowerShell
$env:DASHSCOPE_API_KEY = "sk-your-key-here"
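A small sanity check in Python catches a missing or malformed key at startup, before it turns into a confusing 401 at request time. The sk- prefix check is a heuristic based on the key format described above:

```python
import os

def require_api_key(env_var: str = "DASHSCOPE_API_KEY") -> str:
    """Fail fast if the key is missing or doesn't look like a DashScope key."""
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set — export it before running.")
    if not key.startswith("sk-"):
        raise RuntimeError(f"{env_var} doesn't start with 'sk-' — wrong value?")
    return key
```

Call require_api_key() once at startup and pass the result to your client constructor.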
If you ever lose the key, you can go back to the API Keys page and view it again, or revoke it and create a new one. Each Model Studio account can have multiple keys, which is useful for separating production from staging or for assigning different keys to different applications.
⚠️ Regional keys are not interchangeable. A key created on the Singapore endpoint will fail authentication on the Beijing or US endpoint, and vice versa. If you get a 401 error, the wrong base URL is more likely than a bad key.
How to Use the Qwen API
The Qwen API supports two interfaces. The OpenAI-compatible interface is what almost everyone should use — it's a drop-in replacement for the OpenAI SDK, works with the standard openai Python package, LangChain, LiteLLM, and any other tool that speaks OpenAI's protocol. The native DashScope SDK exposes a few advanced features (batch invocation, certain multimodal options, real-time speech) that aren't in the OpenAI shape, but for standard chat and tool calling, the compatible interface covers 95% of use cases.
The migration from OpenAI is minimal. You only change three things:
- The base URL — point at https://dashscope-intl.aliyuncs.com/compatible-mode/v1 instead of OpenAI's.
- The API key — use your DashScope sk- key.
- The model name — use qwen-plus, qwen-max, qwen-turbo, etc.
Once those are configured, every standard feature works: streaming, function calling, JSON mode, system prompts, multi-turn conversations, image inputs (for VL models), and audio inputs (for Omni models).
Qwen API Python Examples
Step 1 — Install the SDK
pip install openai
Step 2 — Your first request
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen-plus",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who are you?"}
]
)
print(completion.choices[0].message.content)
Step 3 — Streaming responses
For chatbot-style apps you'll want to stream tokens as they're generated:
stream = client.chat.completions.create(
model="qwen-plus",
messages=[{"role": "user", "content": "Write a haiku about coding."}],
stream=True,
)
for chunk in stream:
    # Skip chunks with no text delta (e.g. role headers or empty choices)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Step 4 — Function calling
Qwen supports OpenAI's standard tool-calling format:
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string"},
},
"required": ["city"]
}
}
}]
response = client.chat.completions.create(
model="qwen-plus",
messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
tools=tools,
)
print(response.choices[0].message.tool_calls)
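tool_calls only tells you what the model wants to run — your code executes the function and sends the result back in a tool-role message before asking the model for its final answer. Here's a sketch of that dispatch step, written on plain dicts to stay self-contained; the real SDK returns objects with the same fields as attributes (tool_call.id, tool_call.function.arguments), and get_weather here is a stub, not a real weather service:

```python
import json

def get_weather(city: str) -> dict:
    # Stub for illustration — a real implementation would call a weather API.
    return {"city": city, "temp_c": 18, "condition": "clear"}

LOCAL_TOOLS = {"get_weather": get_weather}

def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and shape the result as a 'tool' role message."""
    fn = LOCAL_TOOLS[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(fn(**args)),
    }

# Simulated tool call, in the shape the OpenAI-compatible API returns:
call = {"id": "call_0",
        "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}}
msg = run_tool_call(call)
```

Append the assistant message containing the tool calls, plus each tool message, to messages and call chat.completions.create again; the model then writes its final, tool-informed reply.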
Step 5 — Multimodal (Qwen-VL)
For vision tasks, swap the model name and add an image input:
response = client.chat.completions.create(
model="qwen-vl-plus",
messages=[{
"role": "user",
"content": [
{"type": "image_url",
"image_url": {"url": "https://example.com/chart.png"}},
{"type": "text", "text": "Extract all data from this chart as JSON."}
]
}]
)
print(response.choices[0].message.content)
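The URL form above works for publicly reachable images. For local files, the OpenAI-compatible image_url block generally also accepts a base64 data URL (check the model's documentation for size limits). A small helper, with the MIME fallback as an assumption:

```python
import base64
import mimetypes

def to_data_url(path: str) -> str:
    """Encode a local image file as a data URL for an image_url content block."""
    mime = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

Then pass {"type": "image_url", "image_url": {"url": to_data_url("chart.png")}} in place of the remote URL.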
Step 6 — Using the native DashScope SDK
If you need DashScope-specific features (batch, advanced multimodal, real-time speech), install the native SDK instead:
pip install dashscope
import os
import dashscope
from dashscope import Generation
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
response = Generation.call(
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen-plus",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who are you?"}
],
result_format="message",
)
if response.status_code == 200:
print(response.output.choices[0].message.content)
else:
print(f"Error: {response.code} — {response.message}")
Qwen API Documentation
The official documentation is hosted on the Alibaba Cloud help center. The most useful entry points:
- Main documentation home: alibabacloud.com/help/en/model-studio — covers everything from getting started to advanced features.
- OpenAI-compatible reference: OpenAI compatibility guide — endpoints, request shapes, supported parameters, migration notes.
- DashScope API reference: DashScope native API reference — request/response parameters, code examples in Python and Java.
- Pricing page: Model Studio model pricing — current per-token rates for every model.
- Model catalog: Getting started → Models — every available model with capabilities, limits, and pricing tier.
- Error code reference: a separate page lists every 4xx/5xx code with explanations and remediation steps. Bookmark it.
- Qwen API Platform landing: qwen.ai/apiplatform — the official marketing entry point with quick links into the docs.
The docs are bilingual (English and Chinese) and reasonably complete, though some pages occasionally lag behind the latest API changes. If you hit something ambiguous, the QwenLM GitHub organization hosts working code examples for every major model, which is often clearer than the formal documentation.
FAQ
Is the Qwen API really free to start?
Yes — new accounts on the Singapore (International) region receive 1 million input tokens plus 1 million output tokens free, valid for 90 days. That's enough for serious prototyping across multiple models. The US (Virginia) and Chinese Mainland regions don't include a free quota.
Which Qwen API model should I start with?
Qwen-Plus is the best default — strong general intelligence, 1M context, ~$0.26 per million input tokens. Use Qwen-Turbo or Qwen-Flash for high-volume cheap workloads, Qwen-Max for reasoning-heavy tasks where you need frontier quality, Qwen-Coder for code, Qwen-VL for vision, and Qwen-Omni for audio.
Is the Qwen API really OpenAI-compatible?
Yes. The Qwen API exposes an OpenAI-compatible endpoint that works with the official openai Python SDK, the OpenAI JS SDK, LangChain, LlamaIndex, LiteLLM, and any other tool that speaks the OpenAI protocol. You change three things: the base URL, the API key, and the model name. Everything else — streaming, function calling, JSON mode — works identically.
What are the rate limits?
DashScope applies dual limits: RPM (requests per minute) and RPS (requests per second). Free-tier accounts typically see 60–600 RPM depending on tier. You can request quota increases through the Model Studio console once you have a billing history. Hitting either limit returns a 429, so implement exponential backoff in production.
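A minimal backoff wrapper looks like this. The sketch uses a stand-in exception to stay self-contained; with the openai SDK you'd catch openai.RateLimitError, which is what a 429 surfaces as:

```python
import random
import time

class RateLimited(Exception):
    """Stand-in for the SDK's 429 error (openai.RateLimitError in practice)."""

def with_backoff(fn, retry_on=RateLimited, max_retries=5, base_delay=0.5):
    """Call fn, retrying rate-limit errors with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries — let the caller handle it
            # 0.5s, 1s, 2s, ... plus jitter so concurrent workers don't sync up
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Wrap the actual API call in a closure, e.g. with_backoff(lambda: client.chat.completions.create(...)).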
Why does my API key return 401 errors?
Almost always one of three things: (1) wrong base URL for your key's region (Singapore keys don't work on Beijing endpoints), (2) the environment variable isn't actually loaded in your shell, or (3) the key is from a sub-workspace that hasn't been granted model permissions. Check the URL first.
Can I get bulk discounts?
Yes. Use the batch invocation API for any non-real-time workload — overnight processing, dataset labeling, document enrichment — and you'll pay 50% of the real-time rate on both input and output tokens. Enterprise contracts can include further volume discounts negotiated directly with Alibaba Cloud sales.
Does the Qwen API support image and audio inputs?
Yes, through the Qwen-VL (vision) and Qwen-Omni (audio + video) model families. Both work through the same OpenAI-compatible endpoint; you just change the model name to qwen-vl-plus, qwen-vl-max, or qwen3-omni-30b-a3b-instruct and add image or audio content blocks to your messages.
Is my data used to train Qwen models?
According to the Model Studio Terms of Service, API requests on the standard paid tiers are not used for model training by default. For sensitive use cases, request a Data Processing Agreement (DPA) through Alibaba Cloud sales. Enterprise contracts can include explicit no-training guarantees and data residency commitments.
Can I use the Qwen API from outside Alibaba Cloud?
Yes — the API has no requirement that you also use other Alibaba Cloud services. You can call it from AWS, GCP, Azure, your laptop, anywhere. The only thing that matters is which regional endpoint you hit and whether your network can reach it.
What's the difference between DashScope and Model Studio?
Practically nothing for end users. Model Studio is the marketing name for the developer platform; DashScope is the technical name for the API layer underneath. Documentation uses both names interchangeably, and your single API key works for both.
Final Thoughts
The Qwen API has become one of the strongest options for production LLM deployment in 2026. The combination of aggressive pricing, a free tier large enough to actually prototype with, OpenAI compatibility that eliminates migration friction, and access to a full family of models — from frontier reasoning to budget high-volume to specialized math, code, vision, and audio — makes it a serious challenger to OpenAI, Anthropic, and Google for most workloads.
If you're starting fresh, the path is straightforward: sign up at Model Studio, activate your free quota, create an API key, install the openai Python SDK, point the base URL at the Singapore endpoint, and run your first qwen-plus request. Within ten minutes you'll have a working integration that you can compare directly against whatever you're using now.