Backing China's open-source memory revolution: AI finally gains human-level long-term memory!
With roughly 100 million tokens of context, a 4-billion-parameter small model outperforms 235-billion-parameter RAG! EverMind's open-source MSA has caused a stir.
Have you ever wondered: a human accumulates roughly 200-300 million tokens of memory over a lifetime, yet today's GPT and Claude can barely handle 200K-1M tokens of context and break down beyond that? No matter how many vector databases you stack on top, they can't fix it: retrieval is always an external plugin, multi-hop reasoning forgets everything once interrupted, training long-context models devours GPU memory, and inference is painfully slow.
EverMind-AI hits hard, directly smashing through the ceiling. They open-sourced MSA (Memory Sparse Attention), a truly native, built-in, end-to-end trainable long-term memory architecture, pushing LLM’s memory capacity directly to 100 million tokens, with less than 9% accuracy decay!
This isn’t just another long-context trick; it’s a revolutionary design that directly welds the hippocampus into the Transformer.
What makes MSA so powerful? Three tricks to beat all predecessors instantly
1. Sparse Attention + Document-wise RoPE
Traditional RoPE suffers positional drift on ultra-long sequences. MSA instead resets the position counter independently for each document, enabling seamless extrapolation from the 64K tokens seen in training to 100M tokens. Attention complexity drops from O(n²) to approximately O(n), so training and inference scale linearly.
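The per-document position reset is easy to illustrate. Below is a minimal sketch of the idea (not EverMind's implementation): instead of one global position counter across the whole concatenated corpus, each document's tokens are numbered from zero, so no document ever exceeds the position range the model was trained on.

```python
# Illustrative sketch of document-wise position ids (function name is ours,
# not EverMind's). Standard RoPE numbers all tokens with one global counter,
# which extrapolates poorly past the trained context length; restarting the
# counter at every document boundary keeps each document in-range.

def document_wise_positions(doc_lengths):
    """Return per-token position ids that restart at 0 for each document."""
    positions = []
    for length in doc_lengths:
        positions.extend(range(length))  # positions 0..length-1 within this doc
    return positions

# Three documents of lengths 4, 3, 5 -> the counter restarts at each boundary
print(document_wise_positions([4, 3, 5]))
# [0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3, 4]
```

Because every document looks like a fresh short sequence to the rotary embedding, adding more documents grows the corpus without ever producing positions the model has not seen.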
2. Hierarchical KV Caching + Memory Parallelism
Routing keys (a highly compressed representation) reside permanently on the GPU, while the complete KV pairs are stored in CPU memory. During inference, only the KV pairs of the top-k most relevant documents are fetched, so just two A800 GPUs can handle 100M tokens! Official tests show a dramatic throughput gain.
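The two-tier cache can be sketched as follows. This is a toy illustration of the routing idea only (class and method names are ours, not EverMind's API), with NumPy arrays standing in for GPU- and CPU-resident tensors:

```python
import numpy as np

class TieredKVCache:
    """Toy two-tier KV cache: one small routing vector per document stays
    resident (the GPU tier in the real system), while the full per-token
    (keys, values) pairs are offloaded (the CPU tier) and fetched only for
    the top-k best-matching documents at inference time."""

    def __init__(self, top_k=2):
        self.top_k = top_k
        self.routing_keys = []   # tier 1: one compressed vector per document
        self.full_kv = []        # tier 2: complete (keys, values) per document

    def add_document(self, keys, values):
        # Routing key: mean-pooled key vectors as a cheap document summary
        # (one plausible compression; the real scheme is not specified here).
        self.routing_keys.append(keys.mean(axis=0))
        self.full_kv.append((keys, values))

    def fetch(self, query):
        # Score every document against its routing key, then load full KV
        # only for the top-k winners -- everything else stays off-device.
        scores = np.array([rk @ query for rk in self.routing_keys])
        top = np.argsort(scores)[::-1][: self.top_k]
        return [self.full_kv[i] for i in top], top
```

Only `fetch` touches the offloaded tier, which is why resident memory stays roughly constant as the corpus grows into the tens of millions of tokens.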
3. Memory Interleave Mechanism
Retrieval is no longer one-shot; the model iterates: generate → retrieve → generate again → retrieve again, dynamically deciding how many documents to consult. Multi-hop reasoning (HotpotQA, 2Wiki, etc.) comes back to life, and ablation experiments show that removing this mechanism costs over 19% in accuracy.
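The control flow of that loop can be sketched with toy stand-ins (the `retrieve`/`generate` stubs and memory contents below are hypothetical, not EverMind's API): the model alternates between pulling memory and producing either an answer or a follow-up query, which is exactly what multi-hop questions need.

```python
# Toy two-hop demo of the interleaved generate -> retrieve loop.
MEMORY = {
    "capital of France": "Paris",
    "population of Paris": "about 2.1 million",
}

def retrieve(query):
    # Toy retriever: substring match against memory keys.
    return [MEMORY[k] for k in MEMORY if k in query]

def generate(question, context):
    # Toy "model": asks a follow-up hop once it knows the city, then answers.
    joined = " ".join(context)
    if "2.1 million" in joined:
        return {"done": True, "answer": "about 2.1 million"}
    if "Paris" in joined:
        return {"done": False, "next_query": "population of Paris"}
    return {"done": False, "next_query": "capital of France"}

def interleaved_answer(question, retrieve, generate, max_hops=4):
    """Alternate retrieval and generation until the model has enough evidence."""
    context = []
    query = question
    for _ in range(max_hops):
        context.extend(retrieve(query))   # pull memory for the current query
        step = generate(question, context)
        if step["done"]:                  # model decided the evidence suffices
            return step["answer"]
        query = step["next_query"]        # next hop of multi-hop reasoning
    return None

print(interleaved_answer("What is the population of the capital of France?",
                         retrieve, generate))
# prints "about 2.1 million" -- hop 1 resolves the city, hop 2 its population
```

A single-shot retriever would miss here, since "population of Paris" never appears in the original question; the second retrieval is driven by the model's own intermediate output.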
In one sentence: MSA folds memory and reasoning into a single differentiable closed loop, turning "look up information, then answer" into "think while recalling." This is the memory architecture AGI should have.
Data doesn't lie: the 4B model blows away everything else.
The official backbone is Qwen3-4B-Instruct. Compared to similarly scaled RAG, top RAG stacks, HippoRAG2, etc.:
• Average long-context QA score: MSA leads the same backbone RAG by 16%, top RAG stacks by 11.5%.
• MS MARCO (over 70 million tokens): MSA scores 4.141, far surpassing RAG series.
• Multi-hop datasets (HotpotQA, 2Wiki): even more impressive advantage.
• NIAH (needle in a haystack) 1M token: traditional models drop below 25%, MSA maintains over 94% accuracy.
• From 16K to 100M tokens: accuracy decay is less than 9%, while other methods have long since plummeted.
Even more astonishing: a 4B MSA model outperforms RAG systems with 60 times more parameters. This means future agents won’t need 200B+ monster models; just add MSA, and they’ll have memory close to a human lifetime.
The EverMind team clearly regards enabling agents to have personal memory as their core mission, and MSA is their first gift to the world.
GitHub open-source: