
The Open-Weights Models You Can Actually Download

A snapshot of who's shipping what in open-weights LLMs right now: parameter counts, architecture, context length, license, and whether the weights are really open or just open with friction.

Ram Bakthavachalam

I'm starting a 30-day deep dive into open-weights models.

The goal is to learn the technical details end to end: getting access to a model, hosting it on a server, measuring performance, fine-tuning, monitoring, and solving real customer problems with these LLMs.

Before I run any of these locally or pick favorites, I want a map of the territory: who's shipping, what they're shipping, how big it is, and what license it ships under. This post is that map.

It's a versioned post. I'll refresh it as new models drop, and you'll see previous snapshots stack up at /posts/open-weights-models/versions.

The customer scenario I'm anchoring on

For the next 30 days, I am filtering every decision in this series through one customer scenario: contract review for a mid-size law firm. Pulling obligations and risks out of long contracts. Flagging unusual clauses against a playbook. Drafting a first-pass review memo that a partner can sign off on after edits. The scenario is made up, not a real customer, but the constraints are real.

Legal contract review lets me focus on a short list of factors I want to explore in detail:

  • Long context, because contracts run into the hundreds of pages and a deal often comes with a stack of related documents.
  • Strict instruction-following, because legal language needs the exact words, not a paraphrase.
  • Low tolerance for hallucination, because a confident wrong summary of an obligation is worse than no summary at all.
  • A strong preference for on-prem deployment, because the documents are confidential and most firms will not let them go through a third-party API.

It also rules out things I can defer to a later stage. I don't need a coding model. I don't need 5M-token context windows; 256K is enough for almost any contract bundle. Multimodality is nice but not needed for now.
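The "256K is enough" claim is worth a back-of-envelope check. A rough sketch, where the 500 words per page and 1.3 tokens per word figures are my assumptions, not measurements:

```python
# Does a contract bundle fit in a 256K-token context window?
# Assumed averages (not measured): ~500 words per page, ~1.3 tokens per word.

WORDS_PER_PAGE = 500
TOKENS_PER_WORD = 1.3

def bundle_tokens(pages: int) -> int:
    """Rough token count for a stack of contract pages."""
    return int(pages * WORDS_PER_PAGE * TOKENS_PER_WORD)

# A 300-page bundle: 300 * 500 * 1.3 = 195,000 tokens -- under 256K.
print(bundle_tokens(300))  # 195000
```

By this estimate a bundle has to pass roughly 400 pages before it outgrows a 256K window, which is why 1M+ windows are a nice-to-have rather than a requirement here.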

The same map below will look different when the filter is "fix Python bugs" or "summarize a doctor's note." But for a legal-tech reader, this is the lens. Which models survive the filter is a question for next week's post. This one is just the territory.

What "open weights" actually means

A model is open weights if you can download the parameter file and run it on your own machine. This is a smaller claim than "open source." Open source usually means code plus weights plus training data plus recipe, all under a permissive license. Most of the models below ship just the weights.

There is a second, less obvious distinction. Some weights are downloadable, but only after you accept a custom license, agree not to use the model in certain regions, or wait for human approval on Hugging Face. This is open with friction. This post points out which is which.

The landscape in one paragraph

Chinese labs are moving fastest. DeepSeek, Moonshot, Zhipu, Alibaba (Qwen), MiniMax, and Tencent are all shipping frontier-scale checkpoints, mostly under MIT or Apache 2.0. Western labs are slower and more careful. Mistral and IBM are fully open. NVIDIA uses its own permissive license. OpenAI's gpt-oss release from August 2025 is still its only open contribution. Meta keeps Llama behind a community license with gated access. On the architecture side, MoE is now standard at scale, dense models are still common at small sizes, and hybrid attention is the new default (Mamba mixed with transformer, or compressed and sparse attention variants). 1M-token context is now common at the top of the lineup.

Chinese providers

DeepSeek

DeepSeek has been the fastest to ship new versions on the open side. V4 Pro, released April 24 2026, is a 1.6T-parameter MoE that activates 49B per token. V4 Flash is the smaller version. Both ship under MIT and download without gating.

| Model | Total / Active | Architecture | Context | License | Open? |
| --- | --- | --- | --- | --- | --- |
| V4 Pro | 1.6T / 49B | MoE, hybrid CSA + HCA attention | 1M | MIT | Yes |
| V4 Flash | 284B / 13B | MoE, hybrid CSA + HCA attention | 1M | MIT | Yes |

Models on huggingface.co/deepseek-ai.
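The total/active split in the table above is the key MoE trade-off: total parameters drive memory and storage, while active parameters drive per-token compute. A minimal sketch, using the common approximation of ~2 FLOPs per active parameter per generated token:

```python
# MoE sizing: total params set memory needs; active params set per-token
# compute. Rough rule of thumb: ~2 FLOPs per active parameter per token.

def active_fraction(total: float, active: float) -> float:
    """Share of the weights touched on each token."""
    return active / total

def flops_per_token(active: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * active

# DeepSeek V4 Pro from the table above: 1.6T total, 49B active.
print(f"{active_fraction(1.6e12, 49e9):.1%}")  # 3.1%
```

So V4 Pro runs each token through about 3% of its weights, with per-token compute closer to a ~50B dense model than a 1.6T one; the full 1.6T still has to live somewhere in memory.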

Moonshot

Kimi K2.6 is the latest from Moonshot, released April 21 2026. The pitch is long-horizon coding agents: K2.6 can run agent swarms with hundreds of sub-agents and thousands of coordinated steps.

| Model | Total / Active | Architecture | Context | License | Open? |
| --- | --- | --- | --- | --- | --- |
| Kimi K2.6 | 1T / 32B | MoE (384 experts), MLA | 256K | Modified MIT | Yes |

Model on huggingface.co/moonshotai/Kimi-K2.6.

Zhipu (Z.ai)

Heads up on the version numbers. GLM-4.7 doesn't exist. The GLM line went from 4.6 (Sep 2025) to 5 (Feb 2026) to 5.1 (Apr 2026). GLM-5 was the first frontier-scale model trained end to end on Huawei Ascend hardware.

| Model | Total / Active | Architecture | Context | License | Open? |
| --- | --- | --- | --- | --- | --- |
| GLM-5 | 744B / ~40B | MoE, DeepSeek Sparse Attention | 128K | MIT | Yes |
| GLM-5.1 | 744B-class | MoE | 128K | MIT | Yes |

Models on huggingface.co/zai-org.

Alibaba (Qwen)

The current Qwen flagship is Qwen3.5-397B-A17B (Feb 2026), with Qwen3.6 mid-tier variants shipping in April 2026. There is no Qwen 4 yet. Qwen3-Coder is the dedicated coding line, with a "Next" variant joining the older 480B model.

| Model | Total / Active | Architecture | Context | License | Open? |
| --- | --- | --- | --- | --- | --- |
| Qwen3.5-397B-A17B | 403B / 17B | MoE + Gated DeltaNet | 262K (1M extensible) | Apache 2.0 | Yes |
| Qwen3.6-35B-A3B | 36B / 3B | MoE + DeltaNet | 256K | Apache 2.0 | Yes |
| Qwen3-Coder-Next | 80B / 3B | MoE | 256K | Apache 2.0 | Yes |
| Qwen3-Coder-480B | 480B / 35B | MoE | 256K | Apache 2.0 | Yes |

Models on huggingface.co/Qwen.

MiniMax

MiniMax shipped M2.5 in February 2026 and M2.7 in March 2026. M2.7 ships under a custom MiniMax license, not Apache or MIT. Read the LICENSE file before using it commercially.

| Model | Total / Active | Architecture | Context | License | Open? |
| --- | --- | --- | --- | --- | --- |
| M2.7 | 229B / ~10B | MoE | (sparse on card) | Custom | Yes (custom terms) |
| M2.5 | 230B / 10B | MoE, MLA-style long context | Long-context | Custom | Yes (custom terms) |

Models on huggingface.co/MiniMaxAI.

Tencent

Tencent rebranded its open release line in 2026. Hunyuan 2.0 (Dec 2025) was the previous flagship. Hy3-preview (April 23 2026) is the current one. The license is not Apache or MIT. Tencent's Hy Community License has commercial-use clauses you should read before deploying.

| Model | Total / Active | Architecture | Context | License | Open? |
| --- | --- | --- | --- | --- | --- |
| Hy3-preview | 295B / 21B | MoE (192 experts) | 256K | Tencent Hy Community | Yes (with terms) |

Model on huggingface.co/tencent/Hy3-preview.

American providers

Meta

Llama 4 Maverick (~400B) and Scout (~109B) shipped April 2025 and are still widely used. Llama 5 was announced April 8 2026, but the open-weights status is the same as Llama 4: gated on Hugging Face behind a custom community license, not Apache or MIT. Treat Llama as open with friction.

| Model | Total / Active | Architecture | Context | License | Open? |
| --- | --- | --- | --- | --- | --- |
| Llama 5 | ~600B+ | MoE | Up to 5M | Llama 5 Community | Gated |
| Llama 4 Maverick | ~400B / 17B | MoE (128 experts) | 1M | Llama 4 Community | Gated |
| Llama 4 Scout | ~109B / 17B | MoE (16 experts) | 10M | Llama 4 Community | Gated |

Models on huggingface.co/meta-llama.

IBM

Granite 4.0-H-Small (Oct 2025) is the most interesting one: a hybrid Mamba-2 plus transformer MoE that cuts memory by about 70% on long-context workloads. IBM's whole pitch is on-prem and edge.

| Model | Total / Active | Architecture | Context | License | Open? |
| --- | --- | --- | --- | --- | --- |
| Granite 4.0-H-Small | 32B / 9B | Hybrid Mamba-2 + Transformer MoE | 128K | Apache 2.0 | Yes |

Models on huggingface.co/ibm-granite.
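The memory reduction from hybrid architectures comes mostly from the KV cache: transformer attention layers keep a cache that grows linearly with sequence length, while Mamba layers keep a fixed-size state. A sketch with entirely hypothetical dimensions (not Granite's actual config), just to show the shape of the arithmetic:

```python
# KV cache for a plain transformer grows linearly with sequence length:
# 2 (keys + values) * layers * kv_heads * head_dim * seq_len * bytes/elem.

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB, assuming fp16/bf16 (2 bytes per element)."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

# Illustrative 32-layer model with 8 KV heads of dim 128, at 128K context:
full = kv_cache_gb(32, 8, 128, 128_000)
# If ~70% of the layers were Mamba (near-zero cache), roughly 30% remains:
hybrid = full * 0.3
print(round(full, 1), round(hybrid, 1))  # 16.8 5.0
```

That linear-in-sequence-length term is the reason long-context serving is memory-bound, and why a hybrid stack can claim large savings without touching the weights at all.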

Google

Gemma 4 (April 2 2026) is the current Gemma generation, with E2B/E4B variants for mobile and a 31B dense flagship. Gemma 3n is still maintained for on-device use cases (Pi-class hardware).

| Model | Total / Active | Architecture | Context | License | Open? |
| --- | --- | --- | --- | --- | --- |
| Gemma 4 (31B) | 31B | Dense | 256K | Apache 2.0 (verify) | Yes |
| Gemma 4 (26B MoE) | 26B | MoE | 256K | Apache 2.0 (verify) | Yes |
| Gemma 3n E4B | 8B / 4B effective | MatFormer, Per-Layer Embeddings | 32K | Gemma terms | Yes |

Models on huggingface.co/google.

NVIDIA

Nemotron 3 Super (GA March 11 2026) is a hybrid Mamba-Transformer MoE with native NVFP4. Nemotron 3 Ultra has a base checkpoint open but no instruction-tuned variant yet. Both ship under NVIDIA's own permissive Open Model License.

| Model | Total / Active | Architecture | Context | License | Open? |
| --- | --- | --- | --- | --- | --- |
| Nemotron 3 Super | 120.6B / 12.7B | Hybrid Mamba-Transformer MoE, NVFP4 | 1M | NVIDIA OML | Yes |
| Nemotron 3 Ultra (base) | 550B / ~55B | Hybrid Mamba-Transformer MoE | 1M | NVIDIA OML | Base only |

Models on research.nvidia.com/labs/nemotron.

OpenAI

OpenAI's only open-weights release is gpt-oss from August 2025. There has been no refresh since. The 120B fits on a single H100 (80 GB); the 20B fits on 16 GB consumer hardware.

| Model | Total / Active | Architecture | Context | License | Open? |
| --- | --- | --- | --- | --- | --- |
| gpt-oss-120B | 117B / 5.1B | MoE (128 experts) | 128K | Apache 2.0 | Yes |
| gpt-oss-20B | 21B / 3.6B | MoE (32 experts) | 128K | Apache 2.0 | Yes |

Models on huggingface.co/openai.
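The single-GPU claims are easy to sanity-check: weight memory is roughly parameters times bytes per parameter. A rough sketch, assuming 4-bit weights throughout (gpt-oss ships MXFP4 MoE weights) and ignoring activations, KV cache, and runtime overhead:

```python
# Weight memory estimate: params * bits / 8 bytes, in GB.
# Ignores activations, KV cache, and framework overhead.

def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate on-device weight footprint in GB."""
    return params * bits_per_param / 8 / 1e9

print(round(weight_gb(117e9, 4), 1))  # 58.5 -- under an 80GB H100
print(round(weight_gb(21e9, 4), 1))   # 10.5 -- under 16GB consumer cards
```

At fp16 (16 bits) the 120B would need ~234 GB for weights alone, which is why the native low-bit format is what makes the single-GPU story work.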

European providers

Mistral

Mistral Large 3 (Dec 2025) is Mistral's first MoE since Mixtral, and the only frontier-scale open MoE in the list outside the Chinese labs. Mistral Small 4 (Mar 2026) combines Magistral, Pixtral, and Devstral into one line, with both a 24B dense and a 119B MoE variant.

| Model | Total / Active | Architecture | Context | License | Open? |
| --- | --- | --- | --- | --- | --- |
| Mistral Large 3 | 675B / 41B | Granular sparse MoE | 256K | Apache 2.0 | Yes |
| Mistral Small 4 (24B) | 24B | Dense, GQA | 128K | Apache 2.0 | Yes |
| Mistral Small 4 (119B) | 119B | MoE + MLA | 256K | Apache 2.0 | Yes |

Models on huggingface.co/mistralai.

What's next

I'm at the start of this and I don't know much yet. I don't know which of these I'd actually want to run for contract review, or which architectural choices matter most for that kind of work, or how big the gap is between what a model claims on a leaderboard and what it does on real legal text. The plan for the rest of this week is to read some of the foundational papers (the original Transformer paper, the DeepSeek V3 paper for MoE and MLA, the Llama 3 paper for a modern dense reference), look at what current benchmarks actually measure and where they fall short, and then run a small head-to-head of a few candidates against a contract-review eval set just to see what comes out. I'll write up what I learn.

This page is versioned. The next version will replace whatever's stale here without losing this snapshot. Previous versions stay accessible at /posts/open-weights-models/versions.