Host Qwen3 on Hugging Face Spaces with UI (Free + Fast Setup)

Introduction: Go From Model to Live Demo in Minutes

Want to make your own Qwen3 chatbot or coding assistant available online—free?

Hugging Face Spaces lets you:

  • Deploy Qwen3 (7B, 14B) easily

  • Build a frontend with Gradio

  • Share a live web app with a public link

  • No server or GPU setup needed

In this tutorial, you’ll:

  • Set up a Qwen3 model backend

  • Create a Gradio chat interface

  • Launch it on Spaces (in one click!)


1. Requirements

✅ Hugging Face account (free)
✅ A Qwen model (we’ll use Qwen/Qwen1.5-7B-Chat)
✅ Basic Python + Git knowledge


2. Create Your Space

Go to:
https://huggingface.co/spaces

  • Click “Create new Space”

  • Choose:

    • SDK: Gradio

    • Visibility: Public or Private

    • Name: e.g. qwen3-chat-ui


3. Space File Structure

Upload or clone these 3 files:

/ (root)
├── app.py
├── requirements.txt
└── README.md

4. Sample app.py

```python
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
)

def chat(user_input, history=None):
    # Qwen chat models expect the ChatML prompt format
    prompt = (
        "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
        "<|im_start|>user\n" + user_input + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    output = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens, not the echoed prompt
    response = tokenizer.decode(
        output[0][input_ids.shape[1]:], skip_special_tokens=True
    )
    return response.strip()

iface = gr.Interface(fn=chat, inputs="text", outputs="text", title="Qwen3 Chatbot")
iface.launch()
```

5. requirements.txt

```txt
torch
transformers
gradio
accelerate
```

Hugging Face Spaces will auto-install these dependencies.
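If you want reproducible builds, you can pin each dependency to a version instead of letting Spaces resolve the latest release on every rebuild. The version numbers below are illustrative only; check your local working versions with `pip freeze` before pinning:

```txt
torch==2.1.2
transformers==4.37.2
gradio==4.19.2
accelerate==0.27.2
```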


6. Optional README.md

```markdown
# Qwen3 Chat UI

This Space uses Qwen/Qwen1.5-7B-Chat with Gradio to create a lightweight chatbot.

- Model: [Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat)
- UI: Built with [Gradio](https://gradio.app)

Try asking it anything!
```

7. Deploy and Share

Once uploaded:

  • Push or upload your files → Hugging Face will automatically build and provision your Space

  • After ~2–5 min, your app is live

  • Share your link: https://huggingface.co/spaces/yourname/qwen3-chat-ui
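If you prefer working locally, the upload step can also be done with plain Git. A sketch, assuming your Space is named `yourname/qwen3-chat-ui` (substitute your own username and Space name):

```shell
# Clone the (initially near-empty) Space repo
git clone https://huggingface.co/spaces/yourname/qwen3-chat-ui
cd qwen3-chat-ui

# Copy in your three files, then commit and push
git add app.py requirements.txt README.md
git commit -m "Add Qwen chat UI"
git push   # pushing triggers an automatic rebuild of the Space
```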


Bonus: Add Chat History & Markdown

Want to improve the experience?

```python
iface = gr.ChatInterface(chat, title="Qwen3 Chatbot", theme="soft")
```

You can also:

  • Add Markdown output

  • Use ChatInterface with state

  • Extend to Qwen Code / Tool use
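To make the bot actually use that chat history, prior turns need to be folded into the prompt. A minimal sketch of a history-aware prompt builder (`build_prompt` is a hypothetical helper; it assumes the classic `gr.ChatInterface` behavior of passing history as a list of (user, assistant) pairs):

```python
def build_prompt(message, history):
    """Fold prior (user, assistant) turns into a ChatML prompt.

    `history` is assumed to be a list of (user, assistant) pairs,
    the default format gr.ChatInterface passes to your function.
    """
    prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    for user_msg, bot_msg in history:
        prompt += f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        prompt += f"<|im_start|>assistant\n{bot_msg}<|im_end|>\n"
    # Current turn, left open for the model to complete
    prompt += f"<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n"
    return prompt
```

Your `chat` function can then call `build_prompt(user_input, history)` instead of formatting a single-turn prompt.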


Alternative: Use vLLM for Faster Backend

If latency is an issue, deploy Qwen3 on your own GPU or server with vLLM, then point the Gradio UI on Spaces at that backend's API.
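In that setup, the Space only runs the UI: vLLM's server exposes an OpenAI-compatible `/v1/chat/completions` endpoint, and the Gradio `chat` function becomes a thin HTTP client. A sketch, where `VLLM_URL` is a placeholder for your own server's address:

```python
import requests

# Placeholder endpoint for your own vLLM server,
# e.g. started with: vllm serve Qwen/Qwen1.5-7B-Chat
VLLM_URL = "http://your-gpu-host:8000/v1/chat/completions"

def build_messages(message, history):
    """Convert Gradio-style (user, assistant) history into OpenAI-format messages."""
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})
    return messages

def chat(message, history):
    resp = requests.post(
        VLLM_URL,
        json={
            "model": "Qwen/Qwen1.5-7B-Chat",
            "messages": build_messages(message, history),
        },
        timeout=120,
    )
    return resp.json()["choices"][0]["message"]["content"]
```

This `chat` function drops straight into `gr.ChatInterface(chat, ...)` from the bonus section above.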


Conclusion: Qwen3 Live in Minutes

You’ve now deployed a public Qwen3 chatbot:

  • No complex hosting

  • No API keys

  • Just Python + Hugging Face Spaces

From zero to open-source LLM demo in under 15 minutes.

