Qwen Pricing - Plans, API Costs, Free Access & Model Pricing

Qwen Pricing Overview

Qwen pricing in 2026 spans three distinct billing models, and which one you should use depends entirely on how you plan to use the platform. The most common path is pay-as-you-go token billing through the DashScope API, where you pay only for the input and output tokens you consume across any Qwen model. This is what most developers reach for, and rates start as low as $0.033 per million input tokens for Qwen-Flash and scale up to roughly $1.20 per million for Qwen3-Max.

For teams running consistent high-volume workloads, Alibaba also offers fixed monthly subscription plans through Model Studio, most notably the Qwen Coding Plan, which starts at around $10/month for the Lite tier and $50/month for Pro, with included request quotas. These plans replace per-token billing with a predictable flat fee, useful for coding agents and tools where token usage is hard to forecast. For individual consumers, the Qwen Chat app itself is free at the basic tier, with premium features bundled via Alibaba Cloud account billing.

Finally, there's the free tier: every new Alibaba Cloud account on the international (Singapore) endpoint gets 1 million input tokens and 1 million output tokens free for 90 days, valid across most Qwen models. That's enough for serious prototyping before any money changes hands. Below we break down each of these in detail.

Qwen Subscription Plans

Unlike OpenAI or Anthropic, Qwen doesn't push a single consumer subscription. There are two distinct subscription routes: the free Qwen Chat web/mobile app for individual users, and the fixed-fee Qwen Coding Plan for developers who want predictable monthly billing instead of per-token costs.

Qwen Chat (Free)

$0

per month, forever

Full access to Qwen models in web/mobile app
Image, video, document, audio inputs
Reasonable rate limits for personal use
No credit card required

Coding Plan Lite

~$10

per month

Access to Qwen3.5-Plus models
Fixed monthly subscription, no overage
Works with Claude Code, Qwen Code, Cursor
For individual developers and small teams

Coding Plan Pro

~$50

per month

Latest Qwen3.5-Plus and successor models
Up to 90,000 requests per month
Higher rate limits for agentic workflows
For full development teams and heavy users

The Coding Plan is specifically positioned for developers who use AI coding tools heavily and want a flat-fee budget rather than watching token meters spin. Because it's a fixed-fee subscription, even very high-volume usage stays within budget, a major contrast to the pay-as-you-go API, where one runaway agent can produce surprise bills.

For everyone else, the pay-as-you-go DashScope API is the recommended path because it's more flexible: you pay only for what you use, you can pick any model from the entire Qwen catalog (not just the Plus tier bundled with the Coding Plan), and there's a meaningful free quota to start with.

Qwen Monthly Pricing

The headline monthly numbers, for quick reference:

Qwen Chat (consumer): $0/month, all core features free.
Coding Plan Lite: ~$10/month, which covers most individual developers using AI in their editor daily.
Coding Plan Pro: ~$50/month for heavy users and teams, includes up to 90K requests/month.
API pay-as-you-go: No monthly minimum; you pay only for tokens consumed. Typical monthly bills range from $0 (light testing within the free tier) to several thousand dollars for production deployments.
Enterprise: Custom monthly contracts negotiated directly with Alibaba Cloud sales, including SLA guarantees, data residency commitments, and volume discounts.

If you're trying to estimate your own monthly Qwen bill on the API, the rough rule of thumb is: a moderately busy chatbot serving 5,000 customer support tickets per month with 5 turns each on Qwen2.5-72B works out to around $5–10/month on third-party providers and $15–30/month on DashScope direct. A full developer team using Qwen-Coder for daily completion plus refactoring lands closer to $50–150/month at typical usage patterns. These are order-of-magnitude estimates; actual costs depend heavily on prompt length and chat history retention.
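
The chatbot estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not an official calculator: the function name and the per-turn token averages (1,200 input, 300 output) are assumptions chosen for illustration, and the rates are the approximate Qwen2.5-72B third-party figures quoted in this article.

```python
def monthly_cost_usd(conversations, turns, input_tokens_per_turn,
                     output_tokens_per_turn, input_price_per_m, output_price_per_m):
    """Estimate a monthly bill from average per-turn token counts and $/M rates."""
    total_turns = conversations * turns
    input_cost = total_turns * input_tokens_per_turn / 1_000_000 * input_price_per_m
    output_cost = total_turns * output_tokens_per_turn / 1_000_000 * output_price_per_m
    return input_cost + output_cost

# 5,000 tickets x 5 turns. The per-turn averages are assumed, not measured.
estimate = monthly_cost_usd(5_000, 5, 1_200, 300, 0.23, 0.40)
print(f"Estimated monthly bill: ${estimate:.2f}")
```

With these assumed averages the result falls inside the $5–10 range; swapping in your own measured token counts is what makes the number meaningful.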

Qwen Yearly Pricing

Qwen doesn't publish a separate "annual subscription" SKU the way some SaaS products do, but there are two ways to lock in lower yearly costs:

Savings Plans: Alibaba Cloud Model Studio offers prepaid savings plans where you commit to a certain dollar amount of API usage over a fixed term (typically 1 year) in exchange for a discounted per-token rate. The bigger the commitment, the deeper the discount, with typical savings of 15–30% off pay-as-you-go pricing for one-year commitments. These are most useful for teams with predictable steady-state usage who can confidently forecast their annual consumption.

Coding Plan annual billing: The monthly Coding Plan subscriptions can typically be paid annually with a modest discount (around 10–20% off the monthly rate × 12). For a team running Pro consistently, that is $60–120 off a $600 yearly total, or roughly $100/year per seat.
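
The per-seat saving quoted above is simple arithmetic on the Pro price. A short sketch follows, using the approximate $50/month price and the 10–20% discount range from this section; the helper name is hypothetical.

```python
def annual_billing(monthly_price, discount):
    """Return (full-year list price, discounted annual price, savings) in USD."""
    full_year = monthly_price * 12
    savings = full_year * discount
    return full_year, full_year - savings, savings

for discount in (0.10, 0.20):
    full_year, annual, savings = annual_billing(50, discount)
    print(f"{discount:.0%} off: pay ${annual:.0f} instead of ${full_year:.0f}, "
          f"saving ${savings:.0f}")
```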

For enterprises, annual contracts negotiated directly with Alibaba Cloud sales are where the deepest discounts live. These typically include custom volume tiers, dedicated capacity, SLA commitments, and data processing agreements. If you're forecasting more than $5,000/month in API spend, it's worth contacting sales rather than just running on the public rate card.

💡 Yearly savings plans are region-specific and don't always cover every model. Read the fine print on the Model Studio billing page before committing: some savings plans only apply to the original Qwen-Turbo/Plus/Max tier and won't discount newer models like Qwen3-Max or Qwen-Flash.

Qwen Tokens Explained

The Qwen API bills in tokens, which is the standard unit for almost all LLM APIs today. Tokens are pieces of words. Roughly speaking, 1,000 tokens equals about 750 English words, or 500 Chinese characters. A short tweet is around 30 tokens; a one-page document is about 500 tokens; a moderately long blog post is 2,000–4,000 tokens.

You pay for tokens in two directions:

Input tokens: everything you send to the model: your prompt, the system message, any chat history, attached documents, and image data (which is converted to tokens too). Input tokens are cheaper, typically by 3–4×.
Output tokens: everything the model generates back. These are more expensive because they require sequential generation rather than parallel processing.
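
To size a prompt before sending it, the rule of thumb above (1,000 tokens for about 750 English words or 500 Chinese characters) can be turned into a rough estimator. This is only an approximation under that rule; the function name is hypothetical, and the real tokenizer will give different counts for code, numbers, or mixed-language text.

```python
def estimate_tokens(english_words=0, chinese_chars=0):
    """Rough token estimate: 1,000 tokens ~ 750 English words or ~500 Chinese characters."""
    return round(english_words * 1000 / 750 + chinese_chars * 1000 / 500)

print(estimate_tokens(english_words=1_500))   # a medium-length English article
print(estimate_tokens(chinese_chars=500))     # a short Chinese passage
```

For billing purposes, rely on the token counts reported by the API itself rather than on this estimate.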

The biggest pricing trap for new Qwen users is the chat history compounding problem. To keep a chatbot conversational, every new message must re-send the entire conversation history as input tokens. By turn 10, you might be sending 5,000 input tokens just to get a short response. By turn 20, that doubles. For high-volume chatbots, your input bill grows much faster than your conversation does. Mitigations include summarizing old turns, dropping context aggressively, or enabling DashScope's prompt caching feature, which can reduce repeated-context costs by up to 90%.
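
The compounding effect is easier to see with numbers. The sketch below assumes every message (user or assistant) is about 250 tokens and compares re-sending the full history against keeping only the last few messages; the function name, the 250-token figure, and the window size are illustrative assumptions, not DashScope behavior.

```python
def cumulative_input_tokens(turns, tokens_per_message, system_tokens=0, window=None):
    """Total input tokens billed over a conversation that re-sends history each turn.

    Each turn adds one user message and one assistant reply of `tokens_per_message`
    tokens. `window` caps how many prior messages are re-sent (None means all).
    """
    total = 0
    for turn in range(1, turns + 1):
        prior_messages = 2 * (turn - 1)  # earlier user + assistant messages
        if window is not None:
            prior_messages = min(prior_messages, window)
        # Prior messages plus the new user message, plus the system prompt.
        total += system_tokens + (prior_messages + 1) * tokens_per_message
    return total

print(cumulative_input_tokens(10, 250))            # full history, 10 turns
print(cumulative_input_tokens(20, 250))            # full history, 20 turns
print(cumulative_input_tokens(20, 250, window=4))  # keep only the last 4 messages
```

Under these assumptions, doubling the conversation from 10 to 20 turns quadruples the cumulative input tokens, while a fixed window keeps growth linear. Trimming loses context, so summarizing older turns is usually the gentler option.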

Qwen Credits

Qwen uses the term "credits" loosely: there isn't a separate Qwen credit currency the way some platforms have one. Instead:

Free tier "credits" means the 1M input + 1M output token quota that new accounts receive. These deplete as you make API calls and expire after 90 days; they do not reset.
Account balance on DashScope is denominated in actual USD (or CNY for mainland accounts). You top up with a credit card or other payment method, and token usage is billed against this balance in real time.
Savings plan credits are prepaid usage allowances that consume against a discounted rate.

In practice, "do I have enough Qwen credits to run this job?" usually translates to "do I have enough USD balance in my Model Studio account?" The console shows your current balance and burn rate, and you can set spending alerts to avoid surprises.

Qwen 3 API Pricing

The Qwen 3 family is the current generation of Qwen API models, including Qwen3-Max (the flagship), Qwen3-Plus (the everyday workhorse), Qwen3-Coder for software engineering, and the Qwen3-Omni multimodal series. Pricing per million tokens on the DashScope international endpoint:

Model	Input ($/M)	Output ($/M)	Context	Best for
Qwen3-Max	$0.78 – $1.20	$3.90 – $6.00	262K	Frontier reasoning
Qwen3.6-Plus	$0.325	$1.95	1M	Coding, vibe coding
Qwen3.5-Plus	$0.30	$1.80	1M	General multimodal
Qwen3-Plus	$0.26	$0.78	1M	Default everyday
Qwen3-Coder-Next	~$0.30	~$1.50	128K	Agentic coding
Qwen-Flash	Tiered from $0.033	Tiered from $0.13	1M	High-volume cheap
Qwen3-VL-Plus	~$0.30	~$1.80	256K	Vision tasks
Qwen3-Omni	Tiered	Tiered	256K	Audio + voice + vision

Two things are worth noting about Qwen 3 pricing. First, several models (Qwen3-Plus, Qwen3.5-Plus, Qwen-Flash) use tiered pricing based on input request size: a 5K-token request and a 240K-token request fall into different brackets, not just proportionally different costs. Second, Qwen3-Max prices vary by provider: DashScope direct lists around $1.20/$6.00, OpenRouter shows $0.78/$3.90, Requesty shows $0.86/$3.44. The variation reflects different infrastructure providers in front of the same model; third-party aggregators sometimes offer cheaper rates.
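
To illustrate why tiered pricing is not simply proportional, here is a sketch of bracket-based billing. The bracket boundaries and rates below are made up for illustration and are not DashScope's published tiers. The sketch also assumes the whole request is billed at the rate of the bracket its input size falls into, which you should confirm against the Model Studio pricing page.

```python
from bisect import bisect_left

# HYPOTHETICAL brackets: (max input tokens in bracket, $/M input, $/M output).
EXAMPLE_TIERS = [
    (32_000, 0.05, 0.40),
    (128_000, 0.10, 0.80),
    (1_000_000, 0.25, 2.00),
]

def tiered_request_cost(input_tokens, output_tokens, tiers=EXAMPLE_TIERS):
    """Price a whole request at the rate of the bracket its input size falls into."""
    limits = [limit for limit, _, _ in tiers]
    index = bisect_left(limits, input_tokens)  # first bracket whose limit >= input size
    if index == len(tiers):
        raise ValueError("input exceeds the largest bracket")
    _, in_price, out_price = tiers[index]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

small = tiered_request_cost(5_000, 500)
large = tiered_request_cost(240_000, 500)
print(f"5K request: ${small:.5f}, 240K request: ${large:.5f}, ratio {large / small:.0f}x")
```

With these example brackets the 240K request has 48 times the input of the 5K request but costs well over 100 times as much, which is the effect the paragraph above describes.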

Qwen 2.5 API Pricing

The Qwen 2.5 family is the prior generation but still widely deployed because of its strong price/performance ratio. Qwen2.5-72B in particular became a cult favorite for being roughly 1/10th the price of GPT-4o while matching its math and coding scores. Current rates on DashScope and major third-party providers:

Model	Input ($/M)	Output ($/M)	Context
Qwen2.5-Max	$1.04	$4.16	32K
Qwen2.5-72B	~$0.23 (DeepInfra)	~$0.40	128K
Qwen2.5-32B	~$0.18	~$0.30	128K
Qwen2.5-Coder-32B	~$0.18	~$0.30	128K
Qwen2.5-Coder-7B	~$0.07	~$0.16	128K
Qwen-Turbo (2.5)	$0.05	$0.20	1M

Qwen 2.5 pricing varies more widely than Qwen 3 because the weights are openly available, which means dozens of third-party providers host them. DeepInfra is consistently among the cheapest, often 50–80% below DashScope on the same model. The trade-off is data residency: DashScope offers Alibaba's own infrastructure, regions, and compliance posture, while third-party hosts have their own.

For most workloads, Qwen 2.5 remains an excellent value choice: the quality gap to Qwen 3 is real but smaller than the price gap on third-party providers. If you're price-sensitive and don't need the absolute latest model, Qwen2.5-72B on a cheap provider is hard to beat.
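
To see how the table translates into a bill, the sketch below prices one workload against three rows of the Qwen 2.5 table. The rates are copied from that table (the Qwen2.5-72B figures are approximate), while the workload size of 10M input and 2M output tokens is an arbitrary assumption.

```python
# ($/M input, $/M output), copied from the Qwen 2.5 table in this article.
RATES = {
    "Qwen2.5-Max": (1.04, 4.16),
    "Qwen2.5-72B": (0.23, 0.40),
    "Qwen-Turbo (2.5)": (0.05, 0.20),
}

def workload_cost(input_m, output_m, rates=RATES):
    """USD cost per model for a workload measured in millions of tokens."""
    return {name: input_m * in_price + output_m * out_price
            for name, (in_price, out_price) in rates.items()}

costs = workload_cost(input_m=10, output_m=2)
for name, usd in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name:18} ${usd:6.2f}")
```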

Qwen API Free Tier

The Qwen API free tier is a genuine free tier, not a free trial with hidden charges. Here's exactly what you get:

1 million input tokens across most Qwen models.
1 million output tokens across most Qwen models.
Valid for 90 days from the date you activate Model Studio.
Available only on the International (Singapore) endpoint. The US (Virginia) and Chinese Mainland regions don't include a free tier.
No credit card required to start, though you'll need to add one before the free quota expires if you want to continue using the API.

1M + 1M tokens is enough for serious prototyping: you could run several thousand chatbot exchanges, generate hundreds of thousands of words of content, or process a hundred long documents before exhausting the quota. For learning the API or evaluating a model's fit for your use case, you very rarely need to pay anything.
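
A quick way to check whether a prototype fits inside the free quota is to see which side, input or output, runs out first. The sketch assumes the two 1M allowances are tracked separately, as described above; the function name and the per-call token sizes are illustrative.

```python
FREE_INPUT_TOKENS = 1_000_000
FREE_OUTPUT_TOKENS = 1_000_000

def exchanges_in_free_quota(input_tokens_per_call, output_tokens_per_call):
    """Number of calls that fit before either the input or the output quota runs out."""
    return min(FREE_INPUT_TOKENS // input_tokens_per_call,
               FREE_OUTPUT_TOKENS // output_tokens_per_call)

# Assumed averages: 400 input tokens and 200 output tokens per chatbot exchange.
print(exchanges_in_free_quota(400, 200))
```

Long prompts exhaust the input side first; generation-heavy jobs exhaust the output side first.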

Beyond DashScope's official free tier, several third-party providers offer their own free credits or free tiers for Qwen access. OpenRouter ($1 free credit on signup), Puter (unified account with free hobby usage across many models), and Together AI (free credits on signup) are the most common. These typically aren't as generous as DashScope's million-token quota but give you instant access without an Alibaba Cloud signup.

How to Get a Qwen API Key

Getting a Qwen API key is a 5-minute process. The steps:

Create an Alibaba Cloud account at alibabacloud.com. You'll need a valid email and a phone number for verification. Use the international site (not the China site) unless you specifically need mainland deployment.
Activate Model Studio from the Model Studio product page. This step automatically enables your 1M+1M free token quota.
Open the API Keys page in the Model Studio console sidebar (sometimes labeled "Key Management").
Click "Create API Key" and optionally add a description for tracking purposes.
Copy the key, which starts with sk-. Store it in a password manager, .env file, or secrets manager. Never commit API keys to public repositories.
Set it as an environment variable for easier use:

# macOS / Linux
export DASHSCOPE_API_KEY="sk-your-key-here"

# Make it permanent (bash shown; for zsh, append to ~/.zshrc instead)
echo 'export DASHSCOPE_API_KEY="sk-your-key-here"' >> ~/.bashrc

# Windows PowerShell
$env:DASHSCOPE_API_KEY = "sk-your-key-here"

Then your first request from Python is just three lines of meaningful code:

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
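
Once a request succeeds, the response's usage data can be turned into a per-request cost. The sketch below relies on the `prompt_tokens` and `completion_tokens` fields that the OpenAI SDK exposes on `response.usage`. The helper name is hypothetical, and the $0.26/$0.78 rates are the Qwen3-Plus figures from the pricing table, used here as an assumption about what `qwen-plus` bills at.

```python
from types import SimpleNamespace

def cost_from_usage(usage, input_price_per_m, output_price_per_m):
    """Convert a response's token usage into USD at the given $/M rates."""
    return (usage.prompt_tokens * input_price_per_m
            + usage.completion_tokens * output_price_per_m) / 1_000_000

# Stand-in for `response.usage` so the sketch runs without a network call.
fake_usage = SimpleNamespace(prompt_tokens=1_200, completion_tokens=300)
request_cost = cost_from_usage(fake_usage, 0.26, 0.78)
print(f"${request_cost:.6f}")
```

In real code you would pass `response.usage` instead of `fake_usage` and log the result alongside each request.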

⚠️ Regional keys are not interchangeable. A Singapore key won't authenticate against the Beijing or US endpoints. If you get a 401 error, double-check that the base URL matches the region your key was issued in.
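
A region mismatch can only be detected by the server, but a missing or malformed key can be caught locally before the first request. The helper below is a hypothetical convenience, not part of any SDK; it assumes only what this guide states, namely that the key lives in `DASHSCOPE_API_KEY` and starts with `sk-`.

```python
import os

def load_dashscope_key(env=os.environ):
    """Return the API key from the environment, failing early with a clear message."""
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set")
    if not key.startswith("sk-"):
        raise RuntimeError("DASHSCOPE_API_KEY does not start with 'sk-'")
    return key

# Demonstration with an explicit mapping instead of the real environment.
print(load_dashscope_key({"DASHSCOPE_API_KEY": "sk-example"}))
```

If this check passes and the API still returns 401, the base URL and key region are the next things to compare.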

Cost Control Tips

A few practical ways to keep your Qwen bill predictable:

Use Qwen-Turbo or Qwen-Flash for routing and classification. Don't waste Qwen-Max tokens on tasks that don't need frontier reasoning.
Enable prompt caching for any workload with a long, repeating system prompt or knowledge base. It can reduce repeated-context costs by up to 90%.
Use batch invocation for non-real-time work. It's priced at 50% of real-time rates for both input and output.
Trim chat history aggressively. Summarize old turns into one compact message rather than re-sending the full transcript.
Set spending alerts and hard limits in the Model Studio console so a runaway agent can't drain your account.
Compare third-party providers for Qwen 2.5 models. Open-weight models on DeepInfra, Together, or OpenRouter are often dramatically cheaper than DashScope for the same model.
Consider the Coding Plan for predictable budgeting. If your team uses AI coding tools heavily, the flat $50/month Pro plan caps your bill regardless of token usage.
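
To compare the effect of the caching and batch tips, the sketch below prices one workload three ways. It treats the "up to 90%" caching figure as a best case (cached input billed at 10% of the normal rate) and batch as a flat 50% reduction. The function name, the workload size, and the 80% cache-hit share are assumptions, and whether the two discounts can be combined is not stated here, so the example applies them one at a time.

```python
def optimized_cost(input_m, output_m, input_price, output_price,
                   cached_fraction=0.0, cache_discount=0.90, batch=False):
    """USD cost for a workload in millions of tokens, with optional discounts."""
    cached = input_m * cached_fraction
    fresh = input_m - cached
    # Cached input is billed at (1 - cache_discount) of the normal input rate.
    input_cost = (fresh + cached * (1 - cache_discount)) * input_price
    cost = input_cost + output_m * output_price
    return cost * 0.5 if batch else cost

# 100M input + 10M output tokens at the Qwen3-Plus rates from the pricing table.
baseline = optimized_cost(100, 10, 0.26, 0.78)
with_cache = optimized_cost(100, 10, 0.26, 0.78, cached_fraction=0.8)
with_batch = optimized_cost(100, 10, 0.26, 0.78, batch=True)
print(f"baseline ${baseline:.2f}, cached ${with_cache:.2f}, batch ${with_batch:.2f}")
```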

FAQ

Is Qwen really free to start?

Yes. The Qwen Chat web app and mobile apps are completely free for personal use. The API includes a 1M input + 1M output token free quota for 90 days on the Singapore endpoint. No credit card is required to start prototyping.

Does Qwen have a ChatGPT Plus equivalent?

Not exactly. The Qwen Chat consumer app is free at the base tier, with no paid Plus tier above it. The closest equivalent is the Qwen Coding Plan ($10/month for Lite, $50/month for Pro), which is targeted at developers using AI coding tools, not general consumers.

What's the difference between Qwen 3 and Qwen 2.5 API pricing?

Qwen 3 models are newer and generally priced 10–30% higher than equivalent Qwen 2.5 models on the DashScope direct API. The trade-off is better reasoning, longer context windows, and improved multilingual support. Qwen 2.5 models are also widely hosted by cheaper third-party providers (DeepInfra, Together, OpenRouter), which can make the effective price gap larger.

How are Qwen tokens calculated?

Roughly 1,000 tokens equals 750 English words or 500 Chinese characters. For images on Qwen-VL, each image consumes a variable number of tokens depending on resolution (typically 256–1500 per image). The DashScope console shows exact token counts per request, and the OpenAI SDK returns usage data in the response.

What happens when my free quota runs out?

If you've added a payment method, billing seamlessly switches to pay-as-you-go at the published per-token rates. If you haven't, API calls will fail with a quota-exceeded error until you add billing. The free quota itself doesn't reset: it's a one-time 90-day allocation.

Can I get a refund on the Coding Plan?

Subscription refund policies follow Alibaba Cloud's standard terms: typically, prorated refunds are available within the first 7 days of a new subscription, with longer-term refunds at Alibaba's discretion. For specifics, check the Model Studio terms of service or contact billing support.

Are there volume discounts?

Yes. Three main paths: savings plans for prepaid 1-year commitments at 15–30% off, batch invocation at 50% off for non-real-time workloads, and custom enterprise contracts for very high-volume usage, negotiated directly with Alibaba Cloud sales. If you're forecasting more than $5,000/month, enterprise sales is worth contacting.

Why do prices differ between DashScope, OpenRouter, and DeepInfra?

For closed-weight models like Qwen-Max, only DashScope hosts the official version, so it sets the price. For open-weight models like Qwen2.5-72B, dozens of providers host the same weights and compete on price. Third-party providers often offer 30–80% discounts on open-weight models, while DashScope provides the official Alibaba infrastructure with specific compliance and SLA characteristics.

Is my data used to train Qwen models?

According to the Model Studio Terms of Service, paid API requests are not used for model training by default. The free consumer Qwen Chat app may use anonymized interactions for service improvement. For enterprise contracts, no-training data clauses are standard and explicit.

How do I cancel a Qwen subscription?

From the Model Studio console, go to Billing → Subscriptions and cancel the active plan. Your subscription remains active through the end of the current billing period. API pay-as-you-go usage has no subscription to cancel: you simply stop making API calls and stop being billed.

Final Thoughts

Qwen pricing in 2026 is genuinely competitive (for many workloads, dramatically cheaper than the equivalent OpenAI, Anthropic, or Google offerings), and the structure gives you real flexibility. If you're a casual user, the free Qwen Chat app covers everything. If you're a developer prototyping, the free API tier gives you a million tokens before you pay a cent. If you're running production at scale, pay-as-you-go pricing with optional savings plans, batch discounts, and prompt caching keeps the bill manageable. And if you want predictability over flexibility, the fixed-fee Coding Plan caps your monthly spend regardless of usage.

The easiest way to find your real cost is to just start. Sign up at Model Studio, activate your free quota, run your actual workload for a week, and look at the actual bill. Estimates only get you so far; the only reliable cost model is the one based on your own usage patterns.
