Gate News message, April 24 — DeepSeek has released the V4 series of open-source models under the MIT License, with weights now available on Hugging Face and ModelScope. The series includes two mixture-of-experts (MoE) models: V4-Pro with 1.6 trillion total parameters and 49 billion activated per token, and V4-Flash with 284 billion total parameters and 13 billion activated per token. Both support a 1 million token context window.
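To put these parameter counts in perspective, the short Python sketch below computes what fraction of each model's weights is activated per token. It uses only the figures from the announcement. The names `MODELS` and `active_fraction` are illustrative. The calculation covers per-token activation only, not memory: all experts must still be stored even though only a subset runs for each token.

```python
# Parameter counts (in billions) as reported in the V4 announcement.
MODELS = {
    "V4-Pro": {"total_b": 1600, "active_b": 49},
    "V4-Flash": {"total_b": 284, "active_b": 13},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of a MoE model's parameters used to process a single token."""
    return active_b / total_b


for name, p in MODELS.items():
    frac = active_fraction(p["total_b"], p["active_b"])
    print(f"{name}: {p['active_b']}B of {p['total_b']}B active per token ({frac:.1%})")

# Per-token activated-parameter ratio between the two models.
pro_vs_flash = MODELS["V4-Pro"]["active_b"] / MODELS["V4-Flash"]["active_b"]
print(f"V4-Pro activates {pro_vs_flash:.1f}x as many parameters per token as V4-Flash")
```

Roughly 3% of V4-Pro's parameters and under 5% of V4-Flash's are active for any given token. This sparsity is what lets a very large MoE model keep per-token compute closer to that of a much smaller dense model.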
The architecture features three key upgrades. The first is a hybrid attention mechanism that combines compressed sparse attention (CSA) with heavily compressed attention (HCA) and significantly reduces long-context overhead. At a 1M-token context, V4-Pro’s inference FLOPs are just 27% of V3.2’s, and its KV cache (the GPU memory that stores attention keys and values for earlier tokens during inference) is only 10% of V3.2’s. The second is manifold-constrained hyper-connections (mHC), which replace traditional residual connections to make signal propagation across layers more stable. The third is the Muon optimizer, used for faster training convergence. Pre-training used more than 32 trillion tokens of data.
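The two ratios reported for 1M-token inference can be turned into reduction factors with simple arithmetic. This is a sketch, not a benchmark: the helper names are illustrative, and the 100 GB baseline is a hypothetical number chosen only to show the scaling, not a measured V3.2 figure.

```python
# Ratios reported for V4-Pro versus V3.2 at a 1M-token context.
FLOPS_RATIO = 0.27     # V4-Pro inference FLOPs / V3.2 inference FLOPs
KV_CACHE_RATIO = 0.10  # V4-Pro KV cache size / V3.2 KV cache size


def reduction_factor(ratio: float) -> float:
    """How many times cheaper V4-Pro is for a cost expressed as a ratio."""
    return 1.0 / ratio


def scaled_cost(baseline: float, ratio: float) -> float:
    """Apply a reported ratio to a baseline cost measured on V3.2."""
    return baseline * ratio


print(f"Inference FLOPs: {reduction_factor(FLOPS_RATIO):.1f}x fewer")
print(f"KV cache: {reduction_factor(KV_CACHE_RATIO):.1f}x smaller")

# Hypothetical baseline: an illustrative V3.2 KV cache size, not a measured one.
hypothetical_v32_kv_gb = 100.0
v4_kv_gb = scaled_cost(hypothetical_v32_kv_gb, KV_CACHE_RATIO)
print(f"{hypothetical_v32_kv_gb:.0f} GB on V3.2 -> {v4_kv_gb:.0f} GB on V4-Pro")
```

Read this way, 27% corresponds to roughly 3.7x fewer FLOPs and 10% to a 10x smaller KV cache for the same 1M-token context.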
Post-training follows a two-stage approach: domain-specific experts are first trained with supervised fine-tuning (SFT) and GRPO reinforcement learning, then merged into a single model through online distillation. DeepSeek claims that V4-Pro-Max (the highest-effort inference mode) is the strongest open-source model, with top-tier coding benchmark results and a significantly narrower gap to closed-source frontier models on reasoning and agent tasks. V4-Flash-Max reaches Pro-level reasoning performance when given a sufficient compute budget, but its smaller parameter scale limits it on pure-knowledge and complex agent tasks. Weights are stored in mixed FP4+FP8 precision.
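GRPO (Group Relative Policy Optimization, introduced in DeepSeekMath) samples several responses to the same prompt and uses each response's reward relative to its group as the advantage, instead of a learned value baseline. The minimal sketch below shows only that normalization step under the assumption of binary rewards. It is not DeepSeek's V4 training code: the reward values are hypothetical, and the clipped policy-gradient objective and KL penalty that consume these advantages are omitted.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each sampled response's reward against its own group.

    A_i = (r_i - mean(r)) / (std(r) + eps); eps avoids division by zero
    when every response in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    # Population std; some implementations use the sample std instead.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group: four responses to one prompt, scored 1 (correct) or 0 (wrong).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Responses that beat the group average receive positive advantages and are reinforced, while those below it are discouraged. If every response gets the same reward, all advantages are zero and that group contributes no learning signal.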