I'm starting a 30-day deep dive into open-weights models.
The goal is to learn the technical details of what it takes to get access to a model, host it on a server, measure its performance, fine-tune it, monitor it, and solve real customer problems with these LLMs.
Before I run any of these locally or pick favorites, I want a map of the territory: who's shipping, what they're shipping, how big it is, and what license it ships under. This post is that map.
It's a versioned post. I'll refresh it as new models drop, and you'll see previous snapshots stack up at /posts/open-weights-models/versions.
The customer scenario I'm anchoring on
For the next 30 days, I am filtering every decision in this series through one customer scenario: contract review for a mid-size law firm. Pulling obligations and risks out of long contracts. Flagging unusual clauses against a playbook. Drafting a first-pass review memo that a partner can sign off on after edits. The scenario is made up, not a real customer, but the constraints are real.
Choosing legal contract review lets me focus on a short list of factors I want to explore in detail.
- Long context, because contracts run into the hundreds of pages and a deal often comes with a stack of related documents.
- Strict instruction-following, because legal language needs the exact words, not a paraphrase.
- Low tolerance for hallucination, because a confident wrong summary of an obligation is worse than no summary at all.
- A strong preference for on-prem deployment, because the documents are confidential and most firms will not let them go through a third-party API.
It also rules out things I want to defer to a later stage. I don't need a coding model. I don't need 5M-token context windows; 256K is enough for almost any contract bundle. Multimodality is nice but not needed for now.
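To sanity-check that 256K claim, here's the back-of-envelope math I'm using. The per-page and tokens-per-word numbers are assumptions I picked for the estimate, not measurements.

```python
# Back-of-envelope: does a contract bundle fit in a 256K-token window?
# Assumptions (mine, not measured): ~500 words per page, ~1.3 tokens per word.
WORDS_PER_PAGE = 500
TOKENS_PER_WORD = 1.3

def tokens_for(pages: int) -> int:
    return int(pages * WORDS_PER_PAGE * TOKENS_PER_WORD)

main_contract = tokens_for(200)   # a long master agreement
related_docs = tokens_for(100)    # schedules, amendments, side letters
playbook = tokens_for(40)         # the firm's clause playbook

total = main_contract + related_docs + playbook
print(f"{total:,} tokens")        # ~221,000 tokens, under a 256K window
```

Even a generous bundle lands around 220K tokens, which is why I'm treating 256K as the floor rather than chasing million-token windows.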
The same map below will look different when the filter is "fix Python bugs" or "summarize a doctor's note." But for a legal-tech reader, this is the lens. Which models survive the filter is a question for next week's post. This one is just the territory.
What "open weights" actually means
A model is open weights if you can download the parameter file and run it on your own machine. This is a smaller claim than "open source." Open source usually means code plus weights plus training data plus recipe, all under a permissive license. Most of the models below ship just the weights.
There is a second, less obvious distinction. Some weights are downloadable, but only after you accept a custom license, agree not to use the model in certain regions, or wait for human approval on Hugging Face. That is open with friction. This post points out which is which.
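To make the friction concrete, here's a minimal download sketch with huggingface_hub. The repo IDs are placeholders, not real models; the point is just that a gated repo needs the license accepted on its model page plus an access token, while an ungated one doesn't.

```python
from huggingface_hub import snapshot_download

# Ungated, permissively licensed weights: no account or token needed.
snapshot_download(repo_id="some-lab/open-model")       # placeholder repo ID

# Gated weights: accept the license on the model page first, then
# authenticate (e.g. `huggingface-cli login` or pass a token directly).
snapshot_download(
    repo_id="some-lab/gated-model",                    # placeholder repo ID
    token="hf_...",                                    # your Hugging Face access token
)
```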
The landscape in one paragraph
Chinese labs are moving fastest. DeepSeek, Moonshot, Zhipu, Alibaba (Qwen), MiniMax, and Tencent are all shipping frontier-scale checkpoints, mostly under MIT or Apache 2.0. Western labs are slower and more careful. Mistral and IBM are fully open. NVIDIA uses its own permissive license. OpenAI's gpt-oss release from August 2025 is still its only open contribution. Meta keeps Llama behind a community license with gated access. On the architecture side, MoE is now standard at scale, dense models are still common at small sizes, and hybrid attention is the new default (Mamba mixed with transformer, or compressed and sparse attention variants). 1M-token context is now common at the top of the lineup.
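Before the tables start throwing total / active numbers around, here's the rough arithmetic for why that split matters. The figures are illustrative round numbers in the spirit of the largest models below, not any single model's exact config: total parameters set the memory bill, active parameters set the per-token compute bill.

```python
# Rough memory-vs-compute arithmetic for a sparse MoE (illustrative numbers).
total_params = 1.6e12     # parameters you have to store
active_params = 49e9      # parameters the router actually uses per token

bytes_per_param = 1       # assume 8-bit weights for the estimate
weight_memory_gb = total_params * bytes_per_param / 1e9
flops_per_token = 2 * active_params   # ~2 FLOPs per active parameter per token

print(f"weights: ~{weight_memory_gb:,.0f} GB")                     # ~1,600 GB: a multi-GPU node
print(f"compute: ~{flops_per_token / 1e9:,.0f} GFLOPs per token")  # ~98 GFLOPs
```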
Chinese providers
DeepSeek
DeepSeek has been the fastest to ship new versions on the open side. V4 Pro, released April 24 2026, is a 1.6T-parameter MoE that activates 49B per token. V4 Flash is the smaller version. Both ship under MIT and download without gating.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| V4 Pro | 1.6T / 49B | MoE, hybrid CSA + HCA attention | 1M | MIT | Yes |
| V4 Flash | 284B / 13B | MoE, hybrid CSA + HCA attention | 1M | MIT | Yes |
Models on huggingface.co/deepseek-ai.
Moonshot
Kimi K2.6 is the latest from Moonshot, released April 21 2026. The pitch is long-horizon coding agents: K2.6 can run agent swarms with hundreds of sub-agents and thousands of coordinated steps.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Kimi K2.6 | 1T / 32B | MoE (384 experts), MLA | 256K | Modified MIT | Yes |
Model on huggingface.co/moonshotai/Kimi-K2.6.
Zhipu (Z.ai)
Heads up on the version numbers. GLM-4.7 doesn't exist. The GLM line went 4.6 (Sep 2025) to 5 (Feb 2026) to 5.1 (Apr 2026). GLM-5 was the first frontier-scale model trained end to end on Huawei Ascend hardware.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| GLM-5 | 744B / ~40B | MoE, DeepSeek Sparse Attention | 128K | MIT | Yes |
| GLM-5.1 | 744B-class | MoE | 128K | MIT | Yes |
Models on huggingface.co/zai-org.
Alibaba (Qwen)
The current Qwen flagship is Qwen3.5-397B-A17B (Feb 2026), with Qwen3.6 mid-tier variants shipping in April 2026. There is no Qwen 4 yet. Qwen3-Coder is the dedicated coding line, with a "Next" variant joining the older 480B model.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Qwen3.5-397B-A17B | 403B / 17B | MoE + Gated DeltaNet | 262K (1M extensible) | Apache 2.0 | Yes |
| Qwen3.6-35B-A3B | 36B / 3B | MoE + DeltaNet | 256K | Apache 2.0 | Yes |
| Qwen3-Coder-Next | 80B / 3B | MoE | 256K | Apache 2.0 | Yes |
| Qwen3-Coder-480B | 480B / 35B | MoE | 256K | Apache 2.0 | Yes |
Models on huggingface.co/Qwen.
MiniMax
MiniMax shipped M2.5 in February 2026 and M2.7 in March 2026. M2.7 ships under a custom MiniMax license, not Apache or MIT. Read the LICENSE file before using it commercially.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| M2.7 | 229B / ~10B | MoE | Not clearly stated on the model card | Custom | Yes (custom terms) |
| M2.5 | 230B / 10B | MoE, MLA-style long context | Long (unspecified) | Custom | Yes (custom terms) |
Models on huggingface.co/MiniMaxAI.
Tencent
Tencent rebranded its open release line in 2026. Hunyuan 2.0 (Dec 2025) was the previous flagship. Hy3-preview (April 23 2026) is the current one. The license is not Apache or MIT. Tencent's Hy Community License has commercial-use clauses you should read before deploying.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Hy3-preview | 295B / 21B | MoE (192 experts) | 256K | Tencent Hy Community | Yes (with terms) |
Model on huggingface.co/tencent/Hy3-preview.
American providers
Meta
Llama 4 Maverick (~400B) and Scout (~109B) shipped April 2025 and are still widely used. Llama 5 was announced April 8 2026, but the open-weights status is the same as Llama 4: gated on Hugging Face behind a custom community license, not Apache or MIT. Treat Llama as open with friction.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Llama 5 | ~600B+ | MoE | Up to 5M | Llama 5 Community | Gated |
| Llama 4 Maverick | ~400B / 17B | MoE (128 experts) | 1M | Llama 4 Community | Gated |
| Llama 4 Scout | ~109B / 17B | MoE (16 experts) | 10M | Llama 4 Community | Gated |
Models on huggingface.co/meta-llama.
IBM
Granite 4.0-H-Small (Oct 2025) is the most interesting of IBM's line: a hybrid Mamba-2 plus transformer MoE that gets roughly a 70% memory reduction on long-context workloads. IBM's whole pitch is on-prem and edge.
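To see why a hybrid helps on long context, here's a rough KV-cache estimate for a plain transformer. The dimensions are made up for illustration, not Granite's actual config; the point is that attention layers carry a cache that grows linearly with context, while Mamba-style layers keep a fixed-size state, so swapping most attention layers out shrinks exactly this term.

```python
# Rough KV-cache size for a plain transformer at long context
# (illustrative dimensions, not any real model's config).
layers, kv_heads, head_dim = 40, 8, 128
context_len = 128_000
bytes_per_elem = 2   # fp16 / bf16

# Each layer caches keys and values: 2 tensors of shape [context, kv_heads, head_dim].
kv_cache_bytes = 2 * layers * context_len * kv_heads * head_dim * bytes_per_elem
print(f"~{kv_cache_bytes / 1e9:.1f} GB per sequence")   # ~21 GB just for the cache
```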
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Granite 4.0-H-Small | 32B / 9B | Hybrid Mamba-2 + Transformer MoE | 128K | Apache 2.0 | Yes |
Models on huggingface.co/ibm-granite.
Google
Gemma 4 (April 2 2026) is the current Gemma generation, with E2B/E4B variants for mobile and a 31B dense flagship. Gemma 3n is still maintained for on-device use cases (Pi-class hardware).
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Gemma 4 (31B) | 31B | Dense | 256K | Apache 2.0 (verify) | Yes |
| Gemma 4 (26B MoE) | 26B | MoE | 256K | Apache 2.0 (verify) | Yes |
| Gemma 3n E4B | 8B / 4B effective | MatFormer, Per-Layer Embeddings | 32K | Gemma terms | Yes |
Models on huggingface.co/google.
NVIDIA
Nemotron 3 Super (GA March 11 2026) is a hybrid Mamba-Transformer MoE with native NVFP4. Nemotron 3 Ultra has a base checkpoint open but no instruction-tuned variant yet. Both ship under NVIDIA's own permissive Open Model License.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Nemotron 3 Super | 120.6B / 12.7B | Hybrid Mamba-Transformer MoE, NVFP4 | 1M | NVIDIA OML | Yes |
| Nemotron 3 Ultra (base) | 550B / ~55B | Hybrid Mamba-Transformer MoE | 1M | NVIDIA OML | Base only |
Models on research.nvidia.com/labs/nemotron.
OpenAI
OpenAI's only open-weights release is gpt-oss from August 2025, and there has been no refresh since. The 120B fits on a single H100 (80 GB); the 20B fits on 16 GB consumer hardware.
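A quick sanity check on those hardware claims, assuming roughly 4-bit weights on average (an assumption on my part for the estimate):

```python
# Rough weight-memory estimate for the two gpt-oss checkpoints,
# assuming ~4-bit weights on average (my assumption, not OpenAI's figure).
def weight_gb(params: float, bits_per_param: float = 4.0) -> float:
    return params * bits_per_param / 8 / 1e9

print(f"gpt-oss-120B: ~{weight_gb(117e9):.0f} GB")   # ~59 GB, fits an 80 GB H100
print(f"gpt-oss-20B:  ~{weight_gb(21e9):.0f} GB")    # ~11 GB, fits 16 GB cards
```

Either way there's headroom left over for the KV cache and activations, which are the parts that grow with context.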
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| gpt-oss-120B | 117B / 5.1B | MoE (128 experts) | 128K | Apache 2.0 | Yes |
| gpt-oss-20B | 21B / 3.6B | MoE (32 experts) | 128K | Apache 2.0 | Yes |
Models on huggingface.co/openai.
European providers
Mistral
Mistral Large 3 (Dec 2025) is Mistral's first MoE since Mixtral, and the only frontier-scale open MoE in the list outside the Chinese labs. Mistral Small 4 (Mar 2026) combines Magistral, Pixtral, and Devstral into one line, with both a 24B dense and a 119B MoE variant.
| Model | Total / Active | Architecture | Context | License | Open? |
|---|---|---|---|---|---|
| Mistral Large 3 | 675B / 41B | Granular sparse MoE | 256K | Apache 2.0 | Yes |
| Mistral Small 4 (24B) | 24B dense | Dense, GQA | 128K | Apache 2.0 | Yes |
| Mistral Small 4 (119B) | 119B | MoE + MLA | 256K | Apache 2.0 | Yes |
Models on huggingface.co/mistralai.
What's next
I'm at the start of this and I don't know much yet. I don't know which of these I'd actually want to run for contract review, or which architectural choices matter most for that kind of work, or how big the gap is between what a model claims on a leaderboard and what it does on real legal text. The plan for the rest of this week is to read some of the foundational papers (the original Transformer paper, the DeepSeek V3 paper for MoE and MLA, the Llama 3 paper for a modern dense reference), look at what current benchmarks actually measure and where they fall short, and then run a small head-to-head of a few candidates against a contract-review eval set just to see what comes out. I'll write up what I learn.
This page is versioned. The next version will replace whatever's stale here without losing this snapshot. Previous versions stay accessible at /posts/open-weights-models/versions.
