MiniMax unveils open source LLM with surprising 4M token context




MiniMax is perhaps best known here in the US as the Singaporean company behind Hailuo, a realistic, high-resolution generative AI video model that competes with the likes of OpenAI’s Sora and Luma AI’s Dream Machine.

But the company has even more tricks up its sleeve: today, for example, it announced the release and open-sourcing of the MiniMax-01 series, a new family of models built to handle ultra-long context and enhance AI agent development.

The series includes MiniMax-Text-01, a foundation large language model (LLM), and MiniMax-VL-01, a visual multi-modal model.

A large context window

MiniMax-Text-01 is of particular note for enabling up to 4 million tokens in its context window, equivalent to a small library’s worth of books. The context window is how much information the LLM can handle in a single input/output exchange, with words and concepts represented as numerical “tokens,” the LLM’s own internal mathematical abstraction of the data it was trained on.
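For a rough sense of scale, the back-of-the-envelope arithmetic below converts 4 million tokens into words and books. The conversion factors (about 0.75 words per token and 80,000 words per book) are common rules of thumb, not figures from MiniMax:

```python
# Rough scale of a 4-million-token context window.
# Assumptions (not from MiniMax): ~0.75 words per token, ~80,000 words per book.
context_tokens = 4_000_000
words_per_token = 0.75          # common rule of thumb for English text
words_per_book = 80_000         # a typical novel-length book

approx_words = context_tokens * words_per_token
approx_books = approx_words / words_per_book

print(f"~{approx_words:,.0f} words, or roughly {approx_books:.0f} books")
# -> ~3,000,000 words, or roughly 38 books
```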

And, while Google previously led the pack with its Gemini 1.5 Pro model and its 2-million-token context window, MiniMax now doubles that.

As MiniMax posted on its official X account today: “MiniMax-01 efficiently processes up to 4M tokens — 20 to 32 times the capacity of other leading models. We believe that MiniMax-01 is ready to support the expected influx of agent-related applications in the coming year, as agents increasingly require additional capabilities to manage context and persistent memory.”

The models are now available for download on Hugging Face and GitHub under a custom MiniMax license, for users to test directly on Hailuo AI Chat (a ChatGPT/Gemini/Claude competitor), and through MiniMax’s application programming interface (API), where third-party developers can link their own unique apps to them.

MiniMax offers APIs for text and multi-modal processing at competitive prices:

  • $0.2 per 1 million input tokens
  • $1.1 per 1 million output tokens

For comparison, OpenAI’s GPT-4o costs $2.50 per 1 million input tokens via its API, a staggering 12.5X more.
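As a quick sanity check, the arithmetic behind that 12.5X figure uses only the per-million-token input prices quoted above (a minimal sketch; a real bill would also include output-token charges):

```python
# Compare input-token pricing using the figures quoted above.
minimax_input_per_m = 0.20   # USD per 1M input tokens (MiniMax)
gpt4o_input_per_m = 2.50     # USD per 1M input tokens (GPT-4o)

ratio = gpt4o_input_per_m / minimax_input_per_m
print(f"GPT-4o input tokens cost {ratio:.1f}x more")          # 12.5x more

# Example: cost of feeding one full 4M-token context to MiniMax (input side only)
tokens = 4_000_000
print(f"MiniMax input cost: ${tokens / 1_000_000 * minimax_input_per_m:.2f}")  # $0.80
```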

MiniMax also integrates a mixture-of-experts (MoE) framework with 32 experts to optimize scalability. This design balances computational and memory efficiency while maintaining competitive performance on key benchmarks.
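For readers unfamiliar with MoE, the sketch below shows the general idea of expert routing: a learned router sends each token to a small subset of expert sub-networks, so only a fraction of the total parameters is active on any given forward pass. This is a generic, hypothetical PyTorch illustration, not MiniMax’s implementation; the expert count (32) matches the article, but the hidden sizes and top-k choice are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Generic top-k mixture-of-experts layer (illustrative, not MiniMax's code)."""
    def __init__(self, d_model=512, d_ff=2048, num_experts=32, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)   # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                               # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

x = torch.randn(8, 512)            # 8 tokens
print(ToyMoELayer()(x).shape)      # torch.Size([8, 512])
```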

Breaking new ground with Lightning Attention architecture

At the heart of MiniMax-01 is its Lightning Attention mechanism, a novel alternative to traditional transformer attention.

This design greatly reduces the computational complexity. The models contain 456 billion parameters, with 45.9 billion activated per inference.

Unlike previous architectures, Lightning Attention uses a mix of linear and traditional SoftMax layers, achieving near-linear complexity for long inputs. SoftMax, for those like myself who are new to the concept, is an operation that converts input numbers into probabilities that sum to 1, so that the LLM can estimate which interpretation of the input is most likely.
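To make that concrete, here is a minimal sketch of the softmax operation and of the generic linearized-attention trick (multiply keys and values first, so cost grows roughly linearly with sequence length rather than quadratically). This is a textbook-style illustration, not MiniMax’s actual Lightning Attention kernel:

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(scores - scores.max())      # subtract max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # e.g. [0.66, 0.24, 0.10]

# Generic linearized-attention idea (not MiniMax's exact formulation):
# standard attention forms an (n x n) score matrix -> O(n^2) in sequence length n,
# while K^T @ V is only (d x d), so Q @ (K^T @ V) stays roughly linear in n.
n, d = 1000, 64
Q, K, V = (np.random.rand(n, d) for _ in range(3))
out_linear = Q @ (K.T @ V)                 # O(n * d^2) instead of O(n^2 * d)
print(out_linear.shape)                    # (1000, 64)
```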

MiniMax has rebuilt its training and inference frameworks to support the Lightning Attention architecture. The main improvements include:

  • MoE all-to-all communication optimization: Reduced inter-GPU communication overhead.
  • Varlen ring attention: Reduced computational waste for long-sequence processing (see the sketch after this list).
  • Efficient kernel implementations: Optimized CUDA kernels improve Lightning Attention performance.
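The “varlen” (variable-length) idea is worth unpacking: instead of padding every sequence in a batch to the length of the longest one, sequences are packed back-to-back and delimited by cumulative-length offsets, so no compute is wasted on padding tokens. The sketch below shows that packing step in the generic style of common variable-length attention kernels; the offset convention is an assumption, not MiniMax’s specific code:

```python
import numpy as np

# Three documents of different lengths (token IDs), as in a mixed batch.
docs = [np.arange(5), np.arange(3), np.arange(7)]

# Padded batching spends compute on pad positions:
max_len = max(len(d) for d in docs)
padded_positions = len(docs) * max_len                   # 3 * 7 = 21 positions

# Varlen packing concatenates sequences and records cumulative offsets:
packed = np.concatenate(docs)                            # 5 + 3 + 7 = 15 real tokens
cu_seqlens = np.cumsum([0] + [len(d) for d in docs])     # [0, 5, 8, 15]

print(f"padded positions: {padded_positions}, packed positions: {len(packed)}")
print("offsets:", cu_seqlens)    # each attention call is bounded by these offsets
```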

These improvements make the MiniMax-01 models accessible for real-world applications, while maintaining affordability.

Performance and Benchmarks

In mainstream text and multi-modal benchmarks, MiniMax-01 rivals top-tier models such as GPT-4 and Claude-3.5, with especially strong results on long-context evaluations. Notably, MiniMax-Text-01 achieves 100% accuracy on the Needle-In-A-Haystack task with a 4-million-token context.

The models also show a slight decrease in performance as input length increases.

MiniMax plans regular updates to expand the models’ capabilities, including code and multi-modal improvements.

The company sees open-sourcing as a step toward building foundational AI capabilities for the evolving AI agent landscape.

With 2025 predicted to be a transformative year for AI agents, the need for persistent memory and efficient inter-agent communication is growing. MiniMax’s innovations are designed to meet these challenges.

Open to collaboration

MiniMax invites developers and researchers to explore the capabilities of MiniMax-01. Beyond open-sourcing, the team welcomes technical suggestions and collaboration inquiries at [email protected].

With its commitment to cost-effective and scalable AI, MiniMax has positioned itself as a key player in shaping the AI agent era. The MiniMax-01 series offers an exciting opportunity for developers to push the boundaries of what long-context AI can achieve.


