DeepSeek V4 Architecture Verified: 3 of 4 Predictions Hit, Engram Module Absent

Gate News, April 24 — DeepSeek released the V4 model card today, validating architectural predictions made a day earlier through analysis of the TileKernels kernel library (released April 23). According to monitoring by Beating, three of the four predicted core components were confirmed: mHC (Manifold-Constrained Hyper-Connections), replacing ByteDance's original HyperConnection design; an MoE architecture with Top-k expert routing; and FP4+FP8 mixed-precision weight storage. The fourth prediction, an Engram conditional memory module, did not appear in the model card.
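For readers unfamiliar with Top-k expert routing, the mechanism can be sketched in a few lines. This is a generic illustration of the technique, not DeepSeek's actual implementation; all names, shapes, and the gating formulation are illustrative assumptions.

```python
import numpy as np

def topk_route(hidden, gate_w, k=2):
    """Route each token to its k highest-scoring experts.

    hidden: (tokens, d_model) activations; gate_w: (d_model, num_experts)
    learned gating matrix. Returns expert indices and normalized weights.
    """
    logits = hidden @ gate_w                          # (tokens, num_experts)
    topk_idx = np.argsort(logits, axis=-1)[:, -k:]    # indices of k largest
    topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
    # Softmax over only the selected experts, so weights sum to 1 per token.
    weights = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return topk_idx, weights

rng = np.random.default_rng(0)
idx, w = topk_route(rng.standard_normal((4, 16)),
                    rng.standard_normal((16, 8)), k=2)
print(idx.shape, w.shape)  # (4, 2) (4, 2)
```

Each token's output is then the weighted sum of its k selected experts' outputs, which is what makes MoE compute sparse: only k of the experts run per token.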

The model card also revealed components not covered in TileKernels: hybrid attention mechanisms (CSA + HCA) drive V4's long-context efficiency gains; at a 1M-token context window, inference FLOPs drop to 27% of V3.2's level and the KV cache shrinks to 10%. Training now uses the Muon optimizer.

The verification demonstrates how production-level kernel implementations can reveal underlying model architecture before official specifications are published.

