PC for Qwen 3.6: 27B dense or 35B-A3B MoE, which choice?

June 10, 2026

Qwen 3.6 is the most talked-about open source model right now. But before choosing a PC to run it, you must answer one question: the dense 27B version, or the MoE 35B-A3B version? Everything follows from that.

Both belong to the same family. Both fit on a consumer card in 4-bit. Both are under Apache 2.0 license. But they behave very differently in use.

This guide answers the question first, then tells you the PC suited to your choice.

Qwen 3.6: what you need to know

Qwen 3.6 is Alibaba’s latest generation of open models. It’s a multimodal model (text, image, video), with hybrid reasoning, and an exceptional native context window reaching up to 1 million tokens.

Two open variants count for local use:

Qwen3.6-27B

Dense — code quality

27 billion parameters, all active
Fits in about 16.8 GB in 4-bit
Stable and predictable behavior
Better on code and instruction following
Includes vision
Benefits from DFlash acceleration (recent NVIDIA)

Qwen3.6-35B-A3B

MoE — speed

35 billion total, 3 billion active per token
Computational cost close to a 3B model
Quality close to a 35B dense model
Very fast: over 100 tokens/s reported on high-end GPU
About 21 GB in 4-bit, requires 24 GB of VRAM
Ideal for agents and fast tool use

How to read “35B-A3B”: it’s a Mixture-of-Experts model. It contains 35 billion parameters in total, but a router activates only about 3 billion for each generated token. Result: the computational cost of a small model, with quality close to a large one. This allows reaching over 100 tokens per second on consumer hardware.

Dense or MoE: how to choose

You code a lot and tool-calling reliability matters above all

27B dense

You want maximum speed on a 24 GB card for chat and agents

35B-A3B

You have 16 GB of VRAM and no more

27B dense

You build long agent loops with many tool calls

27B dense

You prioritize perceived responsiveness in conversation

35B-A3B

An honest point about the MoE variant. Community feedback indicates that the 35B-A3B can, over long agent loops, repeat failed tool calls or skip them. The dense 27B variant is more consistent on these tasks. If you integrate Qwen 3.6 into an agent harness (MCP, OpenCode, etc.), test before committing. Also, the DFlash acceleration, which doubles speed, only works on the dense variant, not on the MoE.

VRAM: what you really need to plan for

The figures below apply to a short to medium context. Warning: with extended context, the KV cache significantly increases the required memory.

Variant	Quantization	VRAM (short context)	Recommended card
Qwen3.6-27B dense	Q4_K_M	about 16.8 GB	16 GB tight, 24 GB comfortable
Qwen3.6-27B dense	Q5_K_M	about 20 GB	24 GB
Qwen3.6-35B-A3B MoE	Q4_K_M	about 21 GB	24 GB
Qwen3.6-35B-A3B MoE	Q5_K_M	about 26 GB	32 GB
Either one, very long context	Q4 + quantized cache	+20 to 40 GB of KV cache	32 GB and up

The 16 GB trap. The 35B-A3B in 4-bit does not comfortably fit on 16 GB of VRAM, despite what is sometimes read. On a 16 GB card, run the dense 27B variant, designed for this capacity. To fully exploit the 35B-A3B, aim for 24 GB. For very long context or Q5, aim for 32 GB.

Note: with llama.cpp and a quantized KV cache (q8_0), the memory footprint of the context is almost halved. This allows fitting an extended context where the default configuration exceeds the budget. On our machines, these optimizations are preconfigured.

Launch Qwen 3.6 in two minutes

The easiest way is through Ollama. Choose the variant according to your card:

# Dense 27B variant (16 GB and up)
ollama run qwen3.6:27b

# MoE 35B-A3B variant (24 GB and up)
ollama run qwen3.6:35b-a3b

# For agent or code use, llama.cpp is often used
# with quantized KV cache for long context:
llama-server -m qwen3.6-35b-a3b-Q4_K_M.gguf \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --ctx-size 65536 --n-gpu-layers 99

Which PC for Qwen 3.6

The choice of machine directly depends on the targeted variant. Here are our suitable workstations, assembled in Auriol (13390) and delivered throughout the EU, with Ollama and Open WebUI preinstalled on request.

CoreAI 16 — RTX 5060 Ti 16 GBFor Qwen3.6-27B dense in Q4. The entry point. 1 703 €

CoreAI 64 — RTX 5090 32 GBThe reference: 35B-A3B in Q5, long context, over 100 tok/s. 6 042 €

NVIDIA GB10 Mini AI Server128 GB unified for the 1M token context without constraints. 3 999 €

Important for the RTX 5070 Ti and 16 GB cards. For the 35B-A3B specifically, 16 GB is not enough, even in Q4. If your goal is the MoE 35B-A3B, opt for a 24 or 32 GB card (RTX 5090). If you stay with 16 GB, the 27B dense is the right choice, and it is excellent. We advise you according to the targeted variant.

Why run Qwen 3.6 locally

Beyond privacy, Qwen 3.6 locally offers concrete advantages for those who want a serious AI permanently.

No recurring costs. No subscription, no token billing. Once the machine is purchased, usage is unlimited.
Private data. Your prompts, code, and documents never leave your network.
Top-tier quality. Qwen 3.6 rivals the best open models on code, reasoning, and agent tasks.
Massive context. Up to 1 million tokens natively, to handle entire codebases or long documents.
Apache 2.0 License. Free commercial use, no restrictions.

In brief

27B dense or 35B-A3B MoE?
Dense for code and reliable tool use on 16 GB. MoE for maximum speed on 24 GB and above.

What is the minimum VRAM?
16 GB for the 27B dense. 24 GB for the 35B-A3B. 32 GB for long context or Q5.

Is Qwen 3.6 free?
Yes, open source under Apache 2.0. You only pay for the hardware.

Can it be used for code and agents?
Yes, that's one of its strengths. For long agent loops, prefer the 27B dense, which is more consistent.

Do you need a powerful machine for the 1M token context?
Yes: the KV cache can add 20 to 40 GB. The GB10 mini-server with its 128 GB unified memory is the most comfortable in this regard.

Back to the blog