Yuan 3.0: Multimodal Foundation Model

Yuan 3.0 Flash
40B parameters · 3.7B activated
MoE Architecture · Dual-Mode CoT · Compute-Efficient · Enterprise-Ready

Yuan 3.0 is a multimodal large model built on a MoE architecture. It supports multimodal inputs including text, images, tables, and documents, and demonstrates leading performance in key enterprise scenarios such as RAG, complex table understanding, and long-document analysis and summarization.

75%↓ inference tokens
It improves accuracy in math, science, and complex reasoning tasks while cutting inference token count by up to 75%, significantly reducing inference costs.

Model Capabilities

Designed for real enterprise needs, the model excels across multiple enterprise evaluations.

Core Advantages

Innovative architecture delivering superior performance and efficiency.

Compute-efficient

Leveraging the advantages of the MoE architecture and the RAPO reinforcement learning algorithm, it achieves an optimal balance between high performance and low compute cost.

SOTA Performance

It delivers outstanding performance in core tasks such as enterprise-level multimodal understanding, and leads mainstream large models in overall capabilities.

Business-ready

Deeply aligned with diverse enterprise scenarios, seamlessly supporting text processing, multimodal understanding, and knowledge-based Q&A workflows.

Precise RAG

Precisely aligns business documents with user intent, significantly improving accuracy and reliability in RAG-powered knowledge base applications.

Deep Multimodal Insight

Supports deep analysis across text, images, and documents, enabling enterprises to efficiently unlock hidden value from complex data.

100% Open-source

Open-source and free to use, with support for self-hosted deployment, secondary development, and deep customization—significantly reducing enterprise adoption costs.

Performance Benchmarks

The model excels in enterprise-level long-document analysis, cross-page information retrieval, and multi-source knowledge fusion tasks.

In multiple enterprise-level evaluations, the model demonstrated strong performance in RAG, complex table and document understanding, high-quality summary generation, and multimodal reasoning efficiency.

Performance Comparison

Technical Architecture

Model algorithms and training optimizations are fused to deliver high-precision, high-efficiency, enterprise-ready multimodal performance.

Efficient MoE Architecture Design

Introduces an Attention Router that models inter-expert collaboration using multi-vector representations, enabling better expert selection and improved collaborative reasoning accuracy.
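The routing idea above can be illustrated with a minimal sketch. This is not the Yuan 3.0 implementation: it assumes a simple dot-product attention score between a token's query vector and one key vector per expert, with top-k selection and renormalized gate weights; all names and values are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_route(token_query, expert_keys, top_k=2):
    """Score each expert by dot-product attention between the token's
    query and that expert's key vector, then keep the top_k experts
    with gate weights renormalized over the selected set."""
    scores = [sum(q * k for q, k in zip(token_query, key)) for key in expert_keys]
    probs = softmax(scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

# Four experts with 3-dim key vectors; route one token to its top-2 experts.
expert_keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
routes = attention_route([0.9, 0.8, 0.1], expert_keys)
```

Because routing attends over expert keys rather than applying an independent linear gate, experts whose keys overlap (like the fourth expert above) can be selected together, which is one way "inter-expert collaboration" can be modeled.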

Localized Filtering-based Attention (LFA)

Replaces uniform token attention with a “local-first, global-complement” design, enhancing local dependencies before global context modeling to improve semantic accuracy while reducing parameters and memory overhead—well suited to Chinese word order and relational semantics.

Unified Multimodal Fusion Mechanism

Adopts a unified multimodal encoder–decoder architecture. Images of arbitrary resolution are encoded and tokenized, then concatenated with text tokens at the sequence level, enabling efficient end-to-end multimodal understanding and generation.
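Sequence-level fusion can be shown with a toy example: split an "image" into patches, turn each patch into a token vector, and concatenate those with the text tokens. A real encoder would project patches through a learned network; here the flattened patch itself stands in for the embedding, and all shapes are illustrative.

```python
def patchify(image, patch=2):
    """Split a toy 2-D 'image' into non-overlapping patch x patch tiles
    and flatten each tile into one vector (a stand-in for the image
    encoder's tokenization step)."""
    tokens = []
    for r in range(0, len(image), patch):
        for c in range(0, len(image[0]), patch):
            tile = [image[r + dr][c + dc] for dr in range(patch) for dc in range(patch)]
            tokens.append(tile)
    return tokens

def fuse(text_tokens, image):
    # Sequence-level concatenation: image tokens simply precede text tokens.
    return patchify(image) + text_tokens

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
text = [[0.1, 0.2, 0.3, 0.4]]
seq = fuse(text, image)  # 4 image patch tokens followed by 1 text token
```

Because both modalities end up as tokens in one sequence, the same decoder processes them uniformly, which is what makes end-to-end multimodal understanding possible.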

Algorithm

Pre-training: Strong Specialization, Expanded Coverage

General-domain content is reduced, while professional corpora such as textbooks, papers, and code are emphasized. Paired image–text data is introduced to jointly strengthen language and visual foundations.

Fine-tuning: Reasoning and Practical Use

Building on CoT data, reflection and verification samples are added to enhance self-checking ability. Multimodal tasks cover real-world scenarios including research, programming, and office workflows, further reinforcing domain expertise.

Reinforcement Learning: Order & Standard

Under the RAPO framework, the model enables unified training of deep and fast reasoning tasks with stable, non-collapsing performance. Each data sample is accompanied by verifiable reference answers or scoring scripts that reward both results and reasoning processes, continuously improving the model’s reliability.

Training Data

Post-training Reinforcement Learning

Yuan 3.0 Flash introduces RAPO, which addresses DAPO's over-sampling problem through adaptive dynamic sampling (ADS) and an 80/20 high-entropy token selection strategy, cutting training time by 52.91% while maintaining efficiency and accuracy.
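The 80/20 idea, selecting the minority of high-entropy "forking" tokens for the policy update, can be sketched as follows. This is an illustration of entropy-based token selection in general, not RAPO's actual implementation; the distributions and the `keep_frac` default are made up for the example.

```python
import math

def entropy(probs):
    """Shannon entropy of one token's predicted next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_high_entropy(token_dists, keep_frac=0.2):
    """Keep only the top `keep_frac` of tokens by policy entropy; the
    RL gradient would then be applied at these uncertain positions,
    skipping near-deterministic ones."""
    ents = [entropy(d) for d in token_dists]
    k = max(1, int(len(ents) * keep_frac))
    ranked = sorted(range(len(ents)), key=lambda i: ents[i], reverse=True)
    return sorted(ranked[:k])

dists = [
    [0.97, 0.01, 0.01, 0.01],    # near-deterministic: low entropy
    [0.25, 0.25, 0.25, 0.25],    # maximally uncertain: high entropy
    [0.90, 0.05, 0.03, 0.02],
    [0.40, 0.30, 0.20, 0.10],
    [0.99, 0.005, 0.003, 0.002],
]
selected = select_high_entropy(dists)  # keeps only the uncertain token(s)
```

Skipping confident tokens shrinks the set of positions that contribute gradients, which is one way such a strategy can reduce training time without changing what the model learns at the decision points that matter.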

Post-training Process Supervision

The Reflective Inhibition Reward Mechanism (RIRM) identifies critical nodes for correct answers and suppresses redundant reasoning, significantly shortening output length while ensuring accuracy.
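A reward of this shape can be sketched as correctness credit minus penalties for length and for repeated reflection passages. This is a hypothetical illustration of the general idea, not RIRM itself: the function name, coefficients, and the way reflection spans are counted are all invented for the example.

```python
def reflective_reward(answer_correct, n_tokens, n_reflection_spans,
                      base=1.0, len_coeff=0.001, reflect_coeff=0.05):
    """Illustrative reward shaping: full credit only for a correct answer,
    minus penalties for output length and for redundant reflection passages
    beyond the first, discouraging 'wait, let me re-check' loops."""
    if not answer_correct:
        return 0.0
    return base - len_coeff * n_tokens - reflect_coeff * max(0, n_reflection_spans - 1)

# A concise correct answer outranks a rambling correct one.
concise = reflective_reward(True, n_tokens=200, n_reflection_spans=1)
verbose = reflective_reward(True, n_tokens=800, n_reflection_spans=4)
```

Under this shaping, the policy is pushed toward the shortest reasoning path that still reaches the verified answer, which matches the stated effect of shortening outputs while preserving accuracy.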

Methodology

Open Source & Commercial Use

Fully open-source and free for commercial use—no licensing required.

Fully Open Source

Open-source model weights, docs, and training methods for community-driven development.

Free Commercial Use

Free to use for commercial projects and product deployment.

No Licensing Required

Download and deploy instantly, drastically reducing deployment cycles.

Community Support

The developer community offers technical Q&A, experience sharing, and best practices.