PC for Qwen 3.6: 27B dense or 35B-A3B MoE, which to choose?

Qwen 3.6 is the most discussed open-source model right now. But before choosing a PC to run it, you need to answer one question: the dense 27B version, or the MoE 35B-A3B version? Everything else follows from there.

Both belong to the same family. Both fit on a consumer graphics card in 4-bit. Both are licensed under Apache 2.0. But they behave very differently in use.

This guide will first settle that question, then recommend the right PC for your choice.


Qwen 3.6: What you need to know

Qwen 3.6 is the latest generation of Alibaba's open models. It's a multimodal model (text, image, video), with hybrid reasoning, and an exceptional native context window of up to 1 million tokens.

Two open variants are relevant for local use:

Qwen3.6-27B

Dense — code quality

  • 27 billion parameters, all active
  • Fits in approximately 16.8 GB in 4-bit
  • Stable and predictable behavior
  • Better at code and instruction following
  • Includes vision
  • Benefits from DFlash acceleration (recent NVIDIA)

Qwen3.6-35B-A3B

MoE — speed

  • 35 billion total, 3 billion active per token
  • Computational cost similar to a 3B model
  • Quality similar to a 35B dense model
  • Very fast: over 100 tokens/s reported on high-end GPUs
  • Approximately 21 GB in 4-bit, requires 24 GB of VRAM
  • Ideal for agents and fast tool-use
How to read "35B-A3B": This is a Mixture-of-Experts model. It contains 35 billion parameters in total, but a router only activates about 3 billion of them for each generated token. The result is a computational cost of a small model, with quality similar to a large one. This is what allows it to achieve over 100 tokens per second on consumer hardware.


Dense or MoE: How to choose

You code a lot and tool-calling reliability is paramount
27B dense
You want maximum speed on a 24 GB card for chat and agents
35B-A3B
You have 16 GB of VRAM and no more
27B dense
You set up long agent loops with many tool calls
27B dense
You prioritize perceived responsiveness in conversation
35B-A3B
An honest point about the MoE variant. Community feedback indicates that 35B-A3B can, in long agent loops, repeat failed tool calls or skip them. The 27B dense variant is more consistent on these tasks. If you're wiring Qwen 3.6 into an agent harness (MCP, OpenCode, etc.), test before committing. Furthermore, DFlash acceleration, which doubles the speed, only works on the dense variant, not the MoE.


VRAM: What you really need to plan for

The figures below are for short to medium context. Beware: with extended context, the KV cache significantly inflates the memory required.

Variant Quantization VRAM (short context) Recommended card
Qwen3.6-27B dense Q4_K_M approx. 16.8 GB 16 GB tight, 24 GB comfortable
Qwen3.6-27B dense Q5_K_M approx. 20 GB 24 GB
Qwen3.6-35B-A3B MoE Q4_K_M approx. 21 GB 24 GB
Qwen3.6-35B-A3B MoE Q5_K_M approx. 26 GB 32 GB
Either, very long context Q4 + quantized cache +20 to 40 GB KV cache 32 GB and more
The 16 GB trap. The 35B-A3B in 4-bit does not comfortably fit on 16 GB of VRAM, despite what is sometimes read. On a 16 GB card, run the 27B dense variant, designed for this envelope. To fully exploit the 35B-A3B, aim for 24 GB. For very long context or Q5, aim for 32 GB.
Good to know: with llama.cpp and a quantized KV cache (q8_0), the memory footprint of the context is almost halved. This allows for accommodating an extended context where the default configuration would exceed the budget. On our machines, these optimizations are preconfigured.


Run Qwen 3.6 in two minutes

The simplest way is via Ollama. Choose the variant according to your card:

# Dense 27B variant (16 GB and up)
ollama run qwen3.6:27b

# MoE 35B-A3B variant (24 GB and up)
ollama run qwen3.6:35b-a3b

# For agent or code use, llama.cpp is often used
# with quantized KV cache for long context:
llama-server -m qwen3.6-35b-a3b-Q4_K_M.gguf \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --ctx-size 65536 --n-gpu-layers 99


Which PC for Qwen 3.6

The choice of machine directly follows from the desired variant. Here are our adapted workstations, assembled in Auriol (13390) and delivered throughout the EU, with Ollama and Open WebUI pre-installed on request.

Radiance CoreAI 16 CoreAI 16 — RTX 5060 Ti 16 GBFor Qwen3.6-27B dense in Q4. The entry point. €1,703 Radiance CoreAI 64 RTX 5090 CoreAI 64 — RTX 5090 32 GBThe benchmark: 35B-A3B in Q5, long context, over 100 tok/s. €6,042 ASUS Ascent GX10 GB10 NVIDIA GB10 AI Mini Server128 GB unified for 1M token context without constraint. €3,999
Important note on RTX 5070 Ti and 16 GB cards. For 35B-A3B specifically, 16 GB is not enough, despite what you sometimes read. If your goal is the 35B-A3B MoE, aim for a 24 or 32 GB card (RTX 5090). If you stick with 16 GB, the 27B dense is the right choice, and it is excellent. We advise you according to the target variant.


Why run Qwen 3.6 locally

Beyond privacy, Qwen 3.6 locally offers concrete advantages for those who want serious AI at home.

  • No recurring cost. No subscription, no per-token billing. Once the machine is acquired, usage is unlimited.
  • Private data. Your prompts, your code, your documents never leave your network.
  • Top-tier quality. Qwen 3.6 competes with the best open models in code, reasoning, and agent tasks.
  • Massive context. Up to 1 million tokens natively, to process entire codebases or long documents.
  • Apache 2.0 license. Free commercial use, without restrictions.


In brief

27B dense or 35B-A3B MoE?
Dense for reliable code and tool-use on 16 GB. MoE for maximum speed on 24 GB and up.

What is the minimum VRAM?
16 GB for the 27B dense. 24 GB for the 35B-A3B. 32 GB for long context or Q5.

Is Qwen 3.6 free?
Yes, open source under Apache 2.0. You only pay for the hardware.

Can it be used for code and agents?
Yes, that's one of its strengths. For long agent loops, prefer the 27B dense, which is more consistent.

Do you need a big machine for 1M token context?
Yes: the KV cache can add 20 to 40 GB. The GB10 mini-server with its 128 GB unified memory is best suited for this.

Back to blog

Your quote for a custom AI solution within 24–48 hours

Every Radiance project begins with a conversation. Fill out this form and an expert will get back to you shortly with a solution tailored to your business and budget.

Response within 24–48 business hours
Delivery throughout Europe (EU)
2-year warranty included
On-site installation available
No commitment on demand
Dedicated support before and after purchase
01 What is your primary use for AI?
Multiple choice.
02 In what context will the system be used?
Single choice.
03 What type of system are you looking for?
Single choice.
04 Which operating system do you prefer?
Single choice.
05 What are your expectations for the software?
Multiple choice.
06 What is your indicative budget?
Single choice.
07 When would you like to receive your system?
Single choice.
08 Would you like help with implementation?
One choice. A Radiance technician can assist you at your home or remotely.
09 Country of delivery (EU only) *
We only deliver within the European Union (EU).
10 Additional information (optional but very useful)
Briefly describe your project, any specific constraints, or any other relevant information.
11 Would you like to be contacted to discuss your project?
If you choose "Quote only", you can reply to our email to ask your questions and refine the quote.
12 Email *
We will send you the quote to this address.

More questions?

Send us an email at contact@radiancesystems.eu or contact us via the contact form. We respond to all inquiries within 3 hours during business hours (Monday to Friday, 9am to 5pm).

📞 +33 4 65 84 48 21