DeepSeek

DeepSeek V4

Open-weights reasoning model with Mixture-of-Experts architecture

Overall Highlight

Open-weights MoE model with 671B parameters (37B active per token), self-hostable for frontier-level reasoning

Overview

DeepSeek V4 is an open-weights frontier model built on a Mixture-of-Experts (MoE) architecture: of its 671B parameters, only 37B are active for any given token, which keeps per-token compute well below that of a dense model of the same size. It is known for strong reasoning and math performance, and it can run locally given sufficient hardware. Because the weights are open, the model can be fine-tuned and self-hosted.
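To make "activating only a fraction of parameters per token" concrete, here is a toy sketch of top-k expert routing in plain Python. It illustrates the general MoE idea only. The expert count, the value of k, and the function name `top_k_route` are illustrative assumptions, not DeepSeek V4's actual router or configuration.

```python
import math
import random


def top_k_route(scores, k):
    """Pick the k highest-scoring experts and softmax-normalise their gate weights."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


random.seed(0)
NUM_EXPERTS, TOP_K = 8, 2  # illustrative values, not V4's real configuration

# Stand-in for the router's per-expert scores for a single token.
router_scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
gates = top_k_route(router_scores, TOP_K)

print(f"experts used for this token: {sorted(gates)} of {NUM_EXPERTS}")
print(f"gate weights sum to {sum(gates.values()):.3f}")
```

Only the selected experts run for the token, and their outputs are combined using the gate weights. The remaining experts contribute no compute for that token, which is where the inference savings come from.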

Capabilities

▸Mixture-of-Experts for efficient inference
▸Open-weights for self-hosting
▸Strong mathematical reasoning
▸Code generation and analysis
▸Fine-tunable for domain adaptation
▸Chain-of-thought reasoning

Use Cases

→Local/self-hosted AI inference
→Mathematical research and proofs
→Code generation with privacy
→Domain-specific fine-tuning
→Cost-effective batch processing

Version Breakdown

DeepSeek V4

Context Window: 128K tokens
Parameters: 671B total (37B active per token)
Release: 2026 Q1

Highlights

▸MoE architecture with 37B of 671B parameters active per token
▸Open-weights with permissive license
▸Highest MATH and GSM8K scores of the versions listed here
▸Self-hostable on multi-GPU setups
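For a rough sense of what "multi-GPU" means here, the sketch below estimates the memory needed for the weights alone at a few precisions. It assumes all 671B parameters stay resident in GPU memory (an MoE model still has to store every expert, even though only 37B parameters are active per token), it ignores KV cache and activation memory, and the 80 GB per-GPU figure is an assumed example value rather than a stated requirement.

```python
import math


def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def min_gpus(params_billion: float, bits_per_param: int, gpu_memory_gb: float) -> int:
    """Lower bound on GPU count: weights only, no KV cache or activations."""
    return math.ceil(weight_memory_gb(params_billion, bits_per_param) / gpu_memory_gb)


TOTAL_PARAMS_B = 671   # total parameters, from the spec above
GPU_MEMORY_GB = 80     # assumed per-GPU memory, for illustration only

for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS_B, bits)
    gpus = min_gpus(TOTAL_PARAMS_B, bits, GPU_MEMORY_GB)
    print(f"{bits}-bit weights: about {gb:.0f} GB, at least {gpus} GPUs of {GPU_MEMORY_GB} GB")
```

These are lower bounds. A real deployment also needs headroom for the KV cache, which grows with context length and batch size.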

Benchmarks

MMLU: 87.5
HumanEval: 89.8
GSM8K: 96.2
MATH: 82.1

DeepSeek V4 Lite

Context Window: 64K tokens
Parameters: 236B total (21B active per token)
Release: 2026 Q1

Highlights

▸Compact MoE for single-GPU inference
▸Retains strong reasoning capabilities
▸Lower VRAM requirements
▸Good for edge deployment
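Whether a 236B-parameter model fits on a single device depends mostly on how aggressively the weights are quantized. As a back-of-the-envelope check, the sketch below computes the average bits per parameter that a given memory budget allows. The 10% overhead allowance, the listed memory sizes, and the helper name `bit_budget` are assumptions for illustration, not published requirements.

```python
def bit_budget(params_billion: float, vram_gb: float, overhead_fraction: float = 0.1) -> float:
    """Max average bits per parameter such that weights plus overhead fit in vram_gb."""
    usable_gb = vram_gb / (1 + overhead_fraction)  # reserve room for cache and buffers
    return usable_gb * 8 / params_billion


LITE_PARAMS_B = 236  # total parameters for V4 Lite, from the spec above

for vram_gb in (24, 80, 141, 192):  # hypothetical device memory sizes
    bits = bit_budget(LITE_PARAMS_B, vram_gb)
    print(f"{vram_gb} GB device: up to about {bits:.1f} bits per parameter")
```

A budget below roughly 4 bits per parameter means the device would need quantization more aggressive than common 4-bit schemes, or would have to offload some experts to host memory.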

Benchmarks

MMLU: 82.1
HumanEval: 85.3
GSM8K: 91.0
MATH: 72.4

DeepSeek V3

Context Window: 64K tokens
Parameters: 671B total (37B active per token)
Release: 2025 Q4

Highlights

▸Previous generation, proven in production
▸Excellent open-weights ecosystem support
▸Strong coding and math performance
▸Widely fine-tuned by the community

Benchmarks

MMLU: 84.0
HumanEval: 86.5
GSM8K: 93.2
MATH: 75.8
