I'm starting a 30-day deep dive into open-weights models.
The goal is to learn the technical details of what it takes to get access to a model, host it on a server, measure its performance, fine-tune it, monitor it, and solve real customer problems with these LLMs.
Before I run any of these locally or pick favorites, I want a map of the territory: who's shipping, what they're shipping, how big it is, and what license it ships under. This post is that map.
It's a versioned post. I'll refresh it as new models drop, and you'll see previous snapshots stack up at /posts/open-weights-models/versions.
The customer scenario I'm anchoring on
For the next 30 days, I am filtering every decision in this series through one customer scenario: contract review for a mid-size law firm. Pulling obligations and risks out of long contracts. Flagging unusual clauses against a playbook. Drafting a first-pass review memo that a partner can sign off on after edits. The scenario is made up, not a real customer, but the constraints are real.
Choosing legal contract review lets me focus on a short list of factors I want to explore in detail.
- Long context, because contracts run into the hundreds of pages and a deal often comes with a stack of related documents.
- Strict instruction-following, because legal language needs the exact words, not a paraphrase.
- Low tolerance for hallucination, because a confident wrong summary of an obligation is worse than no summary at all.
- A strong preference for on-prem deployment, because the documents are confidential and most firms will not let them go through a third-party API.
It also rules out things I want to defer to a later stage. I don't need a coding model. I don't need 5M-token context windows; 256K is enough for almost any contract bundle. Multimodality is nice but not needed for now.
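To sanity-check that 256K claim, here's the back-of-envelope math I'm using. The per-page and tokens-per-word numbers are assumptions I picked for the estimate, not measurements.

```python
# Back-of-envelope: does a contract bundle fit in a 256K-token window?
# Assumptions (mine, not measured): ~500 words per page, ~1.3 tokens per word.
WORDS_PER_PAGE = 500
TOKENS_PER_WORD = 1.3

def tokens_for(pages: int) -> int:
    return int(pages * WORDS_PER_PAGE * TOKENS_PER_WORD)

main_contract = tokens_for(200)   # a long master agreement
related_docs = tokens_for(100)    # schedules, amendments, side letters
playbook = tokens_for(40)         # the firm's clause playbook

total = main_contract + related_docs + playbook
print(f"{total:,} tokens")        # ~221,000 tokens, under a 256K window
```

Even a generous bundle lands around 220K tokens, which is why I'm treating 256K as the floor rather than chasing million-token windows.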
The same map below will look different when the filter is "fix Python bugs" or "summarize a doctor's note." But for a legal-tech reader, this is the lens. Which models survive the filter is a question for next week's post. This one is just the territory.
What "open weights" actually means
A model is open weights if you can download the parameter file and run it on your own machine. This is a smaller claim than "open source." Open source usually means code plus weights plus training data plus recipe, all under a permissive license. Most of the models below ship just the weights.
There is a second, less obvious distinction. Some weights are downloadable, but only after you accept a custom license, agree not to use the model in certain regions, or wait for human approval on Hugging Face. That is open with friction. This post points out which is which.
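To make the friction concrete, here's a minimal download sketch with huggingface_hub. The repo IDs are placeholders, not real models; the point is just that a gated repo needs the license accepted on its model page plus an access token, while an ungated one doesn't.

```python
from huggingface_hub import snapshot_download

# Ungated, permissively licensed weights: no account or token needed.
snapshot_download(repo_id="some-lab/open-model")       # placeholder repo ID

# Gated weights: accept the license on the model page first, then
# authenticate (e.g. `huggingface-cli login` or pass a token directly).
snapshot_download(
    repo_id="some-lab/gated-model",                    # placeholder repo ID
    token="hf_...",                                    # your Hugging Face access token
)
```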
The landscape in one paragraph
Chinese labs are moving fastest. DeepSeek, Moonshot, Zhipu, Alibaba (Qwen), MiniMax, and Tencent are all shipping frontier-scale checkpoints, mostly under MIT or Apache 2.0. Western labs are slower and more careful. Mistral and IBM are fully open. NVIDIA uses its own permissive license. OpenAI's gpt-oss release from August 2025 is still its only open contribution. Meta keeps Llama behind a community license with gated access. On the architecture side, MoE is now standard at scale, dense models are still common at small sizes, and hybrid attention is the new default (Mamba mixed with transformer, or compressed and sparse attention variants). 1M-token context is now common at the top of the lineup.
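Before the tables start throwing total / active numbers around, here's the rough arithmetic for why that split matters. The figures are illustrative round numbers in the spirit of the largest models below, not any single model's exact config: total parameters set the memory bill, active parameters set the per-token compute bill.

```python
# Rough memory-vs-compute arithmetic for a sparse MoE (illustrative numbers).
total_params = 1.6e12     # parameters you have to store
active_params = 49e9      # parameters the router actually uses per token

bytes_per_param = 1       # assume 8-bit weights for the estimate
weight_memory_gb = total_params * bytes_per_param / 1e9
flops_per_token = 2 * active_params   # ~2 FLOPs per active parameter per token

print(f"weights: ~{weight_memory_gb:,.0f} GB")                     # ~1,600 GB: a multi-GPU node
print(f"compute: ~{flops_per_token / 1e9:,.0f} GFLOPs per token")  # ~98 GFLOPs
```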
Chinese providers
DeepSeek
DeepSeek has been the fastest to ship new versions on the open side. V4 Pro, released April 24 2026, is a 1.6T-parameter MoE that activates 49B per token. V4 Flash is the smaller version. Both ship under MIT and download without gating.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| V4 Pro | 1.6T / 49B | MoE, hybrid CSA + HCA attention | 1M | MIT | Yes |
| V4 Flash | 284B / 13B | MoE, hybrid CSA + HCA attention | 1M | MIT | Yes |
Models on huggingface.co/deepseek-ai.
Moonshot
Kimi K2.6 is the latest from Moonshot, released April 21 2026. The pitch is long-horizon coding agents: K2.6 can run agent swarms with hundreds of sub-agents and thousands of coordinated steps.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Kimi K2.6 | 1T / 32B | MoE (384 experts), MLA | 256K | Modified MIT | Yes |
Model on huggingface.co/moonshotai/Kimi-K2.6.
Zhipu (Z.ai)
Heads up on the version numbers. GLM-4.7 doesn't exist. The GLM line went 4.6 (Sep 2025) to 5 (Feb 2026) to 5.1 (Apr 2026). GLM-5 was the first frontier-scale model trained end to end on Huawei Ascend hardware.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| GLM-5 | 744B / ~40B | MoE, DeepSeek Sparse Attention | 128K | MIT | Yes |
| GLM-5.1 | 744B-class | MoE | 128K | MIT | Yes |
Models on huggingface.co/zai-org.
Alibaba (Qwen)
The current Qwen flagship is Qwen3.5-397B-A17B (Feb 2026), with Qwen3.6 mid-tier variants shipping in April 2026. There is no Qwen 4 yet. Qwen3-Coder is the dedicated coding line, with a "Next" variant joining the older 480B model.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Qwen3.5-397B-A17B | 403B / 17B | MoE + Gated DeltaNet | 262K (1M extensible) | Apache 2.0 | Yes |
| Qwen3.6-35B-A3B | 36B / 3B | MoE + DeltaNet | 256K | Apache 2.0 | Yes |
| Qwen3-Coder-Next | 80B / 3B | MoE | 256K | Apache 2.0 | Yes |
| Qwen3-Coder-480B | 480B / 35B | MoE | 256K | Apache 2.0 | Yes |
Models on huggingface.co/Qwen.
MiniMax
MiniMax shipped M2.5 in February 2026 and M2.7 in March 2026. M2.7 ships under a custom MiniMax license, not Apache or MIT. Read the LICENSE file before using it commercially.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| M2.7 | 229B / ~10B | MoE | Not clearly stated on the model card | Custom | Yes (custom terms) |
| M2.5 | 230B / 10B | MoE, MLA-style long context | Long (unspecified) | Custom | Yes (custom terms) |
Models on huggingface.co/MiniMaxAI.
Tencent
Tencent rebranded its open release line in 2026. Hunyuan 2.0 (Dec 2025) was the previous flagship. Hy3-preview (April 23 2026) is the current one. The license is not Apache or MIT. Tencent's Hy Community License has commercial-use clauses you should read before deploying.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Hy3-preview | 295B / 21B | MoE (192 experts) | 256K | Tencent Hy Community | Yes (with terms) |
Model on huggingface.co/tencent/Hy3-preview.
American providers
Meta
Llama 4 Maverick (~400B) and Scout (~109B) shipped April 2025 and are still widely used. Llama 5 was announced April 8 2026, but the open-weights status is the same as Llama 4: gated on Hugging Face behind a custom community license, not Apache or MIT. Treat Llama as open with friction.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Llama 5 | ~600B+ | MoE | Up to 5M | Llama 5 Community | Gated |
| Llama 4 Maverick | ~400B / 17B | MoE (128 experts) | 1M | Llama 4 Community | Gated |
| Llama 4 Scout | ~109B / 17B | MoE (16 experts) | 10M | Llama 4 Community | Gated |
Models on huggingface.co/meta-llama.
IBM
Granite 4.0-H-Small (Oct 2025) is the most interesting of IBM's line: a hybrid Mamba-2 plus transformer MoE that gets roughly a 70% memory reduction on long-context workloads. IBM's whole pitch is on-prem and edge.
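To see why a hybrid helps on long context, here's a rough KV-cache estimate for a plain transformer. The dimensions are made up for illustration, not Granite's actual config; the point is that attention layers carry a cache that grows linearly with context, while Mamba-style layers keep a fixed-size state, so swapping most attention layers out shrinks exactly this term.

```python
# Rough KV-cache size for a plain transformer at long context
# (illustrative dimensions, not any real model's config).
layers, kv_heads, head_dim = 40, 8, 128
context_len = 128_000
bytes_per_elem = 2   # fp16 / bf16

# Each layer caches keys and values: 2 tensors of shape [context, kv_heads, head_dim].
kv_cache_bytes = 2 * layers * context_len * kv_heads * head_dim * bytes_per_elem
print(f"~{kv_cache_bytes / 1e9:.1f} GB per sequence")   # ~21 GB just for the cache
```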
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Granite 4.0-H-Small | 32B / 9B | Hybrid Mamba-2 + Transformer MoE | 128K | Apache 2.0 | Yes |
Models on huggingface.co/ibm-granite.
Google
Gemma 4 (April 2 2026) is the current Gemma generation, with E2B/E4B variants for mobile and a 31B dense flagship. Gemma 3n is still maintained for on-device use cases (Pi-class hardware).
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Gemma 4 (31B) | 31B | Dense | 256K | Apache 2.0 (verify) | Yes |
| Gemma 4 (26B MoE) | 26B | MoE | 256K | Apache 2.0 (verify) | Yes |
| Gemma 3n E4B | 8B / 4B effective | MatFormer, Per-Layer Embeddings | 32K | Gemma terms | Yes |
Models on huggingface.co/google.
NVIDIA
Nemotron 3 Super (GA March 11 2026) is a hybrid Mamba-Transformer MoE with native NVFP4. Nemotron 3 Ultra has a base checkpoint open but no instruction-tuned variant yet. Both ship under NVIDIA's own permissive Open Model License.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Nemotron 3 Super | 120.6B / 12.7B | Hybrid Mamba-Transformer MoE, NVFP4 | 1M | NVIDIA OML | Yes |
| Nemotron 3 Ultra (base) | 550B / ~55B | Hybrid Mamba-Transformer MoE | 1M | NVIDIA OML | Base only |
Models on research.nvidia.com/labs/nemotron.
OpenAI
OpenAI's only open-weights release is gpt-oss from August 2025, and there has been no refresh since. The 120B fits on a single H100 (80 GB); the 20B fits on 16 GB consumer hardware.
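A quick sanity check on those hardware claims, assuming roughly 4-bit weights on average (an assumption on my part for the estimate):

```python
# Rough weight-memory estimate for the two gpt-oss checkpoints,
# assuming ~4-bit weights on average (my assumption, not OpenAI's figure).
def weight_gb(params: float, bits_per_param: float = 4.0) -> float:
    return params * bits_per_param / 8 / 1e9

print(f"gpt-oss-120B: ~{weight_gb(117e9):.0f} GB")   # ~59 GB, fits an 80 GB H100
print(f"gpt-oss-20B:  ~{weight_gb(21e9):.0f} GB")    # ~11 GB, fits 16 GB cards
```

Either way there's headroom left over for the KV cache and activations, which are the parts that grow with context.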
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| gpt-oss-120B | 117B / 5.1B | MoE (128 experts) | 128K | Apache 2.0 | Yes |
| gpt-oss-20B | 21B / 3.6B | MoE (32 experts) | 128K | Apache 2.0 | Yes |
Models on huggingface.co/openai.
European providers
Mistral
Mistral Large 3 (Dec 2025) is Mistral's first MoE since Mixtral, and the only frontier-scale open MoE in the list outside the Chinese labs. Mistral Small 4 (Mar 2026) combines Magistral, Pixtral, and Devstral into one line, with both a 24B dense and a 119B MoE variant.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Mistral Large 3 | 675B / 41B | Granular sparse MoE | 256K | Apache 2.0 | Yes |
| Mistral Small 4 (24B) | 24B dense | Dense, GQA | 128K | Apache 2.0 | Yes |
| Mistral Small 4 (119B) | 119B | MoE + MLA | 256K | Apache 2.0 | Yes |
Models on huggingface.co/mistralai.
What's next
I'm at the start of this and I don't know much yet. I don't know which of these I'd actually want to run for contract review, or which architectural choices matter most for that kind of work, or how big the gap is between what a model claims on a leaderboard and what it does on real legal text. The plan for the rest of this week is to read some of the foundational papers (the original Transformer paper, the DeepSeek V3 paper for MoE and MLA, the Llama 3 paper for a modern dense reference), look at what current benchmarks actually measure and where they fall short, and then run a small head-to-head of a few candidates against a contract-review eval set just to see what comes out. I'll write up what I learn.
This page is versioned. The next version will replace whatever's stale here without losing this snapshot. Previous versions stay accessible at /posts/open-weights-models/versions.
