DeepSeek V4
Open-weights reasoning model with Mixture-of-Experts architecture
Overall Highlight
Open-weights MoE with 671B params — self-hostable frontier reasoning
Overview
DeepSeek V4 is an open-weights frontier model using a Mixture-of-Experts (MoE) architecture, activating only a fraction of parameters per token for efficient inference. Known for exceptional reasoning capabilities, strong math performance, and the ability to run locally with sufficient hardware. The open-weights approach enables fine-tuning and self-hosting.
Capabilities
- ▸Mixture-of-Experts for efficient inference
- ▸Open-weights for self-hosting
- ▸Strong mathematical reasoning
- ▸Code generation and analysis
- ▸Fine-tunable for domain adaptation
- ▸Chain-of-thought reasoning
Use Cases
- →Local/self-hosted AI inference
- →Mathematical research and proofs
- →Code generation with privacy
- →Domain-specific fine-tuning
- →Cost-effective batch processing
Version Breakdown
DeepSeek V4
2026 Q1Context Window
128K tokens
Parameters
671B (37B active per token)
Release
2026 Q1
Highlights
- ▸MoE architecture — 37B active params per token
- ▸Open-weights with permissive license
- ▸Best-in-class math reasoning
- ▸Self-hostable on multi-GPU setups
Benchmarks
MMLU
87.5
HumanEval
89.8
GSM8K
96.2
MATH
82.1
DeepSeek V4 Lite
2026 Q1Context Window
64K tokens
Parameters
236B (21B active per token)
Release
2026 Q1
Highlights
- ▸Compact MoE for single-GPU inference
- ▸Retains strong reasoning capabilities
- ▸Lower VRAM requirements
- ▸Good for edge deployment
Benchmarks
MMLU
82.1
HumanEval
85.3
GSM8K
91.0
MATH
72.4
DeepSeek V3
2025 Q4Context Window
64K tokens
Parameters
671B (37B active per token)
Release
2025 Q4
Highlights
- ▸Previous generation — proven in production
- ▸Excellent open-weights ecosystem support
- ▸Strong coding and math performance
- ▸Widely fine-tuned by the community
Benchmarks
MMLU
84.0
HumanEval
86.5
GSM8K
93.2
MATH
75.8