November 26, 2025

INTELLECT-3: A 100B+ MoE trained with large-scale RL

Today, we release INTELLECT-3, a 100B+ parameter Mixture-of-Experts model trained on our RL stack, achieving state-of-the-art performance for its size across math, code, science and reasoning benchmarks, outperforming many larger frontier models.

Our complete recipe — from the model weights and training frameworks, to our datasets, RL environments, and evaluations — has been open-sourced, with the goal of encouraging more open research on large scale reinforcement learning.

INTELLECT-3 is trained on the same software and infrastructure that we’re open-sourcing and making available on our platform at Prime Intellect, giving everyone the tools to post-train their own state-of-the-art models, and moving us towards a future where every company can be an AI company.

Chat with INTELLECT-3 at chat.primeintellect.ai

Benchmarks

INTELLECT-3 is a 106B parameter Mixture-of-Experts model trained with both SFT and RL on top of the GLM-4.5-Air base model. It achieves state-of-the-art performance for its size across math, code, science and reasoning benchmarks.

Training Infrastructure

We leverage the following infrastructure components for training:

PRIME-RL: Our custom asynchronous RL framework powering both supervised fine-tuning and large-scale reinforcement learning of Mixture-of-Experts models.
Verifiers and the Environments Hub: A unified environment interface and ecosystem for agentic RL training environments and evaluations.
Prime Sandboxes: High-throughput, secure code execution for agentic coding environments.
Compute Orchestration: Orchestrated and managed 512 NVIDIA H200 GPUs across 64 interconnected nodes.

PRIME-RL

INTELLECT-3 was trained end-to-end with prime-rl, our production-scale post-training framework. prime-rl provides native integration with verifiers environments, which power our entire post-training stack, from synthetic data generation and supervised fine-tuning to reinforcement learning and evaluations. Through its tight connection to the Environments Hub, the entire training stack can seamlessly access a rapidly expanding ecosystem of training and eval environments.

The sharpest distinction between prime-rl and many other RL trainers is that it is async-only — we recognized fairly early (for our previous INTELLECT-2 model) that the future of RL is async; i.e. always a few steps off-policy. Async training is simply the only practical way to efficiently scale RL to long-horizon agentic rollouts without incurring bottlenecks based on the slowest rollouts per step.

Asynchronous Off-Policy Training. We show the execution graph of one-step off-policy training in an idealized setting where the trainer step time equals the inference step time.
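The execution graph described in this caption can be modelled in a few lines of Python. This is only an illustrative sketch of one-step off-policy pipelining under the idealized equal-step-time assumption, not code from prime-rl: the function names, the slot dictionary layout, and the policy-version numbering are all assumptions made for this example.

```python
def one_step_off_policy_schedule(num_batches: int) -> list:
    """Return the work done in each time slot (hypothetical layout).

    Policy version v means "weights after v trainer updates". One slot is one
    inference step, which in the idealized setting equals one trainer step.
    """
    slots = []
    for t in range(num_batches + 1):
        slot = {"slot": t, "generate": None, "train": None}
        if t < num_batches:
            # Inference uses the newest weights available when the slot starts.
            slot["generate"] = {"batch": t, "policy_version": max(0, t - 1)}
        if t >= 1:
            batch = t - 1  # the trainer consumes the batch finished one slot earlier
            slot["train"] = {
                "batch": batch,
                "trainer_version": t - 1,
                "rollout_version": max(0, batch - 1),
            }
        slots.append(slot)
    return slots


def max_policy_lag(slots: list) -> int:
    """Largest gap between the trainer's weights and the weights behind its batch."""
    return max(
        s["train"]["trainer_version"] - s["train"]["rollout_version"]
        for s in slots
        if s["train"] is not None
    )


if __name__ == "__main__":
    schedule = one_step_off_policy_schedule(4)
    for s in schedule:
        print(s)
    print("max policy lag:", max_policy_lag(schedule))
    # A strictly alternating generate/train loop would need 2 * 4 slots.
    print("pipelined slots:", len(schedule), "sequential slots:", 2 * 4)
```

In this toy model the trainer never waits for inference after the first slot, and every batch it consumes is at most one policy version old, which is the trade-off the paragraph above describes.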

Over the past six months we’ve focused heavily on ablations for performance, stability, and efficiency at scale. INTELLECT-3 is a culmination of our work, scaling RL to train a 100B+ parameter Mixture-of-Experts model on 512 NVIDIA H200 GPUs. For more details on how our trainer works, see our technical report.

We will soon be releasing a hosted entrypoint to prime-rl as part of our upcoming Lab platform, enabling large-scale RL training without the infrastructure overhead.

Verifiers & Environments Hub

We train INTELLECT-3 using environments built with our verifiers library and hosted on the Environments Hub, our community hub for RL environments and evaluations.

verifiers is the leading open-source toolkit for creating RL environments and evaluations for LLMs. It provides a set of modular and extensible components for concisely expressing complex environment logic while maintaining highly scalable performance.

Most RL frameworks tightly couple environments into the training repo, making versioning, ablations, and external contributions cumbersome. The Environments Hub decouples this by publishing verifier-backed environments as standalone, pinnable Python modules with a uniform entry point. This allows tasks to be versioned, shared, and iterated on independently.
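To make the idea of a standalone module with a uniform entry point concrete, here is a minimal, self-contained sketch using only the Python standard library. It does not use or reproduce the verifiers API: the `Environment` dataclass, the `load_environment` signature, the toy dataset, and the `evaluate` helper are all hypothetical and exist only to illustrate the decoupling.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Environment:
    """Hypothetical environment record: a versioned task plus its reward function."""
    name: str
    version: str
    dataset: Tuple[Tuple[str, str], ...]  # (prompt, reference answer) pairs
    reward: Callable[[str, str], float]


def exact_match(completion: str, answer: str) -> float:
    """Reward 1.0 when the stripped completion equals the reference answer."""
    return 1.0 if completion.strip() == answer else 0.0


def load_environment(num_examples: int = 2) -> Environment:
    """Uniform entry point: a trainer only needs to know this one function."""
    examples = (("2 + 2 = ?", "4"), ("3 * 5 = ?", "15"), ("10 - 7 = ?", "3"))
    return Environment("toy-arithmetic", "0.1.0", examples[:num_examples], exact_match)


def evaluate(env: Environment, policy: Callable[[str], str]) -> float:
    """Mean reward of a policy (any prompt -> completion callable) on the environment."""
    scores = [env.reward(policy(prompt), answer) for prompt, answer in env.dataset]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    env = load_environment(num_examples=3)
    print(env.name, env.version, evaluate(env, lambda prompt: "4"))
```

Because the trainer-side code (`evaluate` here) depends only on the entry point and the returned object, the task module can be versioned and swapped without touching the training repository.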

The Environments Hub hosts hundreds of environments across math, science, reasoning, as well as agentic tasks across coding, deep research and more.

All the environments and evaluations used for INTELLECT-3 are publicly available for others to use on the Environments Hub. For more details on the specific environments, as well as how they integrate with our trainer, see our technical report.

Prime Sandboxes

We scaled and upgraded our custom Sandboxes infrastructure specifically for agentic RL.

Executing untrusted code for thousands of concurrent rollouts requires a container orchestration layer capable of sub-second provisioning and millisecond-level execution latency. While Kubernetes provides the primitives for container management, standard architectural patterns are insufficient for the throughput required by high-velocity training.

To overcome these limitations, we built Prime Sandboxes: a fully redesigned, high-performance execution layer that bypasses the Kubernetes control plane, delivers near–local-process latency through a direct Rust-to-pod execution path, achieves sub-10-second startup at massive concurrency, and scales to hundreds of isolated sandboxes per node.

Within verifiers, we overlap sandbox provisioning with first-turn model reasoning to completely hide the startup time before code execution is required.
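The overlap can be sketched with `asyncio`. This is a simplified illustration rather than the verifiers implementation: `provision_sandbox`, `first_turn`, `rollout`, the sandbox id, and the sleep durations are stand-ins invented for the example.

```python
import asyncio


async def provision_sandbox(log: list, delay: float) -> str:
    """Stand-in for sandbox startup; the sleep represents provisioning time."""
    log.append("provision:start")
    await asyncio.sleep(delay)
    log.append("provision:ready")
    return "sandbox-0"


async def first_turn(log: list, delay: float) -> str:
    """Stand-in for the model's first reasoning turn, ending in code to run."""
    log.append("reasoning:start")
    await asyncio.sleep(delay)
    log.append("reasoning:done")
    return "print(2 + 2)"


async def rollout(provision_delay: float = 0.05, reasoning_delay: float = 0.1) -> list:
    log = []
    # Kick off provisioning without awaiting it, so it runs during reasoning.
    sandbox_task = asyncio.create_task(provision_sandbox(log, provision_delay))
    code = await first_turn(log, reasoning_delay)
    # Only block on the sandbox once there is code that needs executing.
    sandbox = await sandbox_task
    log.append(f"execute:{sandbox}")
    return log


if __name__ == "__main__":
    print(asyncio.run(rollout()))
```

When provisioning is shorter than the first reasoning turn, as in the default arguments here, the sandbox is already ready by the time code execution is requested, so the startup cost does not appear on the rollout's critical path.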

For full details on how we built and optimized our sandboxes for training INTELLECT-3, see our technical report.

Compute Orchestration

We deployed 512 NVIDIA H200 GPUs across 64 interconnected nodes. The primary engineering challenge lies in maintaining determinism and synchronization across a distributed system prone to hardware failures.

Provisioning: Infrastructure-as-Code via Ansible, automatic hardware discovery, and pre-run InfiniBand checks that isolate slow or faulty nodes.
Orchestration: Slurm + Cgroup v2 ensures clean job teardown and prevents leftover processes from blocking GPU memory.
Storage: Lustre for high-throughput training I/O, and NVMe-backed NFS for fast metadata and seamless SSH access.
Observability: DCGM + Prometheus monitoring lets us catch errors early and drain unstable nodes before they impact training.
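As a rough illustration of a pre-run check that isolates slow or faulty nodes, the sketch below flags any node whose measured bandwidth falls well below the cluster median, or which returned no measurement at all. The function name, the 90% threshold, the node names, and the bandwidth figures are all assumptions for this example; the actual checks are described in the technical report.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple


def split_nodes(
    bandwidth_gbps: Dict[str, Optional[float]], min_fraction: float = 0.9
) -> Tuple[List[str], List[str]]:
    """Split nodes into (healthy, isolated) using a median-relative threshold.

    A value of None means the benchmark failed to run on that node.
    """
    measured = [v for v in bandwidth_gbps.values() if v is not None]
    if not measured:
        raise ValueError("no node returned a bandwidth measurement")
    threshold = min_fraction * median(measured)
    healthy = sorted(
        node for node, v in bandwidth_gbps.items() if v is not None and v >= threshold
    )
    isolated = sorted(node for node in bandwidth_gbps if node not in healthy)
    return healthy, isolated


if __name__ == "__main__":
    # Made-up measurements: one slow node and one node that failed the check.
    results = {"node-00": 390.0, "node-01": 395.0, "node-02": 210.0, "node-03": None}
    healthy, isolated = split_nodes(results)
    print("healthy:", healthy)
    print("isolated:", isolated)
```

Comparing against the median rather than a fixed number keeps the check independent of the specific interconnect generation, at the cost of assuming that most nodes are healthy.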

INTELLECT-3 Training Recipe

INTELLECT-3 was trained in two main stages: a supervised fine-tuning stage on top of the GLM-4.5-Air base model, and a large-scale RL stage. Both stages, including multiple ablations, were carried out on a 512-GPU H200 cluster over the course of two months.

We trained on a diverse and challenging mix of RL environments designed to enhance the reasoning and agentic capabilities of our model. All RL environments we use are publicly available on the Environments Hub. We include the following categories: Math, Code, Science, Logic, Deep Research, and Software Engineering.

Reasoning benchmark scores as training progresses. The benchmark scores generally trend upward and do not appear to have reached a plateau.

Standardized and validated implementations of all benchmarks are available on the Environments Hub. Full details on our training recipe can be found in our technical report.

Resources

We open-source INTELLECT-3, our RL training framework prime-rl, and all environments used for synthetic data generation, training, and evaluation.

Technical Report: storage.googleapis.com/intellect-3-paper/INTELLECT_3_Technical_Report.pdf
Model Weights: huggingface.co/PrimeIntellect/INTELLECT-3
PRIME-RL: github.com/PrimeIntellect-ai/prime-rl
Verifiers: github.com/PrimeIntellect-ai/verifiers
Environments: hub.primeintellect.ai

Future Work

We’re looking forward to extending this work to:

Scaling Agentic RL: For the INTELLECT-3 checkpoint we release, rewards and evaluations continue to rise, and training remains stable. We are continuing to train with an increased emphasis on agentic environments in our RL mixture, and we expect to observe additional gains across a wider range of tasks in a subsequent release.
Richer RL Environments: The Environments Hub now has 500+ tasks across research, computer use, theorem proving, automation, and specialized domains. INTELLECT-3 used only a small fraction; the next step is scaling RL across a broader, higher-quality set of community environments.
Long-Horizon Agents: We’re making long-horizon behavior trainable via RL by allowing the model to manage its own context (cutting context, branching, and maintaining lightweight external memory) so that it can learn end-to-end context handling. We’re also looking towards environments that reward long-horizon reasoning directly.

Chat with INTELLECT-3

Chat with the model at chat.primeintellect.ai or via our Inference API.

Thank you to Parasail and Nebius for being inference providers of the model.

Towards Open Superintelligence

We believe in a future where every company can be an AI company.

Where we can have a flourishing ecosystem of startups and companies building their own models. Where we can democratize access and ownership in the upside of the Intelligence Era, rather than being dependent on a set of obscure models hidden behind the APIs of companies that explicitly want to win and own it all.

Prime Intellect is building the open superintelligence stack, putting the tools to train frontier models into your hands. INTELLECT-3 serves as an example that you don’t need to be from the big labs to train models that compete in the big leagues.

We’re excited to see what you build with INTELLECT-3.

Eurus-2-PRIME

System Prompt

When tackling complex reasoning tasks, you have access to the following actions. Use them as needed to progress through your thought process. [ASSESS] [ADVANCE] [VERIFY] [SIMPLIFY] [SYNTHESIZE] [PIVOT] [OUTPUT] You should strictly follow the format below: [ACTION NAME] # Your action step 1 # Your action step 2 # Your action step 3 ... Next action: [NEXT ACTION NAME]

User

Quadratic polynomials $P(x)$ and $Q(x)$ have leading coefficients $2$ and $-2,$ respectively. The graphs of both polynomials pass through the two points $(16,54)$ and $(20,53).$ Find $P(0) + Q(0).$

Eurus-2-PRIME

[ASSESS]

We are given two quadratic polynomials $P(x)$ and $Q(x)$ with leading coefficients $2$ and $-2$, respectively. The graphs of both polynomials pass through the points $(16,54)$ and $(20,53)$. We need to find the value of $P(0) + Q(0)$.

[ADVANCE]

Let's express the quadratic polynomials $P(x)$ and $Q(x)$ in the general form:
$ P(x) = 2x^2 + bx + c $
$ Q(x) = -2x^2 + dx + e $

Since the graphs of both polynomials pass through the points $(16,54)$ and $(20,53)$, we can substitute these points into the equations of $P(x)$ and $Q(x)$ to form a system of equations.

For $P(x)$:
$ P(16) = 2(16)^2 + 16b + c = 54 $
$ 512 + 16b + c = 54 $
$ 16b + c = -458 \quad \text{(Equation 1)} $

$ P(20) = 2(20)^2 + 20b + c = 53 $
$ 800 + 20b + c = 53 $
$ 20b + c = -747 \quad \text{(Equation 2)} $

For $Q(x)$:
$ Q(16) = -2(16)^2 + 16d + e = 54 $
$ -512 + 16d + e = 54 $
$ 16d + e = 566 \quad \text{(Equation 3)} $

$ Q(20) = -2(20)^2 + 20d + e = 53 $
$ -800 + 20d + e = 53 $
$ 20d + e = 853 \quad \text{(Equation 4)} $

[ADVANCE]

Now, we can solve the system of equations for $P(x)$:
Subtract Equation 1 from Equation 2:
$ (20b + c) - (16b + c) = -747 - (-458) $
$ 4b = -289 $
$ b = -\frac{289}{4} $

Substitute $b = -\frac{289}{4}$ into Equation 1:
$ 16\left(-\frac{289}{4}\right) + c = -458 $
$ -1156 + c = -458 $
$ c = 698 $

So, for $P(x)$, we have:
$ P(x) = 2x^2 - \frac{289}{4}x + 698 $
$ P(0) = 698 $

Next, we can solve the system of equations for $Q(x)$:
Subtract Equation 3 from Equation 4:
$ (20d + e) - (16d + e) = 853 - 566 $
$ 4d = 287 $
$ d = \frac{287}{4} $

Substitute $d = \frac{287}{4}$ into Equation 3:
$ 16\left(\frac{287}{4}\right) + e = 566 $
$ 1148 + e = 566 $
$ e = -582 $

So, for $Q(x)$, we have:
$ Q(x) = -2x^2 + \frac{287}{4}x - 582 $
$ Q(0) = -582 $

[ADVANCE]

Finally, we can find $P(0) + Q(0)$:
$ P(0) + Q(0) = 698 + (-582) = 116 $

So, the value of $P(0) + Q(0)$ is $\boxed{116}$.

INTELLECT-MATH-7B-SFT (ours)

System Prompt

Solve the following math problem efficiently and clearly. Think carefully and step by step about your response and reason before providing a final response. Conclude your response with: \n\nTherefore, the final answer is: $\\boxed{answer}$. I hope it is correct.\n\nWhere [answer] is just the final number or expression that solves the problem.

User

Quadratic polynomials $P(x)$ and $Q(x)$ have leading coefficients $2$ and $-2,$ respectively. The graphs of both polynomials pass through the two points $(16,54)$ and $(20,53).$ Find $P(0) + Q(0).$

INTELLECT-MATH-7B-SFT

So I've got this problem here about quadratic polynomials $P(x)$ and $Q(x)$. $P(x)$ has a leading coefficient of $2$, and $Q(x)$ has a leading coefficient of $-2$. Both polynomials pass through the points $(16,54)$ and $(20,53)$. I need to find $P(0) + Q(0)$.

First, since both $P(x)$ and $Q(x)$ are quadratic polynomials, I can express them in the general form:

$P(x) = 2x^2 + bx + c$

$Q(x) = -2x^2 + dx + e$

Wait, actually, the leading coefficient is the coefficient of the highest power of $x$, which for a quadratic is $x^2$. So yes, $P(x) = 2x^2 + bx + c$ and $Q(x) = -2x^2 + dx + e$.

Now, both polynomials pass through the points $(16,54)$ and $(20,53)$. That means when $x=16$, $P(16)=54$, and when $x=20$, $P(20)=53$. Similarly for $Q(x)$.

So, I can set up equations using these points.

First, for $P(x)$:

$P(16) = 2 \cdot (16)^2 + b \cdot 16 + c = 54$

Similarly, $P(20) = 2 \cdot (20)^2 + b \cdot 20 + c = 53$

Let me calculate $16^2$ and $20^2$:

$16^2 = 256$

$20^2 = 400$

So,

$P(16): 2 \cdot 256 + 16b + c = 54 \Rightarrow 512 + 16b + c = 54 \Rightarrow 16b + c = 54 - 512 \Rightarrow 16b + c = -458$

$P(20): 2 \cdot 400 + 20b + c = 53 \Rightarrow 800 + 20b + c = 53 \Rightarrow 20b + c = 53 - 800 \Rightarrow 20b + c = -747$

Now, I have two equations:

1) $16b + c = -458$

2) $20b + c = -747$

I can subtract equation 1 from equation 2 to eliminate $c$:

$(20b + c) - (16b + c) = -747 - (-458)$

$20b + c - 16b - c = -747 + 458$

$4b = -289$

So, $b = -289 / 4 = -72.25$

Wait, but coefficients are usually integers, but maybe not. Let me check.

Now, plug $b$ back into one of the equations to find $c$. Let's use equation 1:

$16(-72.25) + c = -458$

$-1156 + c = -458$

$c = -458 + 1156 = 698$

So, $P(x) = 2x^2 - 72.25x + 698$

Hmm, $P(0)$ would be $c$, which is $698$.

Now, let's do the same for $Q(x)$.

$Q(16) = -2 \cdot (16)^2 + d \cdot 16 + e = 54 \Rightarrow -2 \cdot 256 + 16d + e = 54 \Rightarrow -512 + 16d + e = 54 \Rightarrow 16d + e = 54 + 512 \Rightarrow 16d + e = 566$

$Q(20) = -2 \cdot (20)^2 + d \cdot 20 + e = 53 \Rightarrow -2 \cdot 400 + 20d + e = 53 \Rightarrow -800 + 20d + e = 53 \Rightarrow 20d + e = 53 + 800 \Rightarrow 20d + e = 853$

Now, equations for $Q(x)$:

1) $16d + e = 566$

2) $20d + e = 853$

Subtract equation 1 from equation 2:

$(20d + e) - (16d + e) = 853 - 566$

$4d = 287$

$d = 287 / 4 = 71.75$

Now, plug $d$ back into equation 1:

$16 \cdot 71.75 + e = 566 \Rightarrow 1148 + e = 566 \Rightarrow e = 566 - 1148 = -582$

So, $Q(x) = -2x^2 + 71.75x - 582$

Therefore, $Q(0) = e = -582$

Now, $P(0) + Q(0) = 698 + (-582) = 116$

Wait, but the problem says "present the answer in LaTeX format: \boxed{Your answer}"

So, I should write $\boxed{116}$

But let me double-check my calculations to make sure I didn't make any mistakes.

First, for $P(x)$:

$P(16) = 2(256) + 16b + c = 512 + 16b + c = 54 \Rightarrow 16b + c = -458$

$P(20) = 2*(400) + 20b + c = 800 + 20b + c = 53 \Rightarrow 20b + c = -747$

Subtracting: $4b = -289 \Rightarrow b = -72.25$

Then, $c = -458 - 16*(-72.25) = -458 + 1156 = 698$

Similarly for $Q(x)$:

$Q(16) = -2*(256) + 16d + e = -512 + 16d + e = 54 \Rightarrow 16d + e = 566$

$Q(20) = -2*(400) + 20d + e = -800 + 20d + e = 53 \Rightarrow 20d + e = 853$

Subtracting: $4d = 287 \Rightarrow d = 71.75$

Then, $e = 566 - 16*71.75 = 566 - 1148 = -582$

So, $P(0) = 698$, $Q(0) = -582$, sum is $116$.

Seems correct.

Final Answer

$\boxed{116}$

INTELLECT-MATH-7B (ours)

System Prompt

User

Quadratic polynomials $P(x)$ and $Q(x)$ have leading coefficients $2$ and $-2,$ respectively. The graphs of both polynomials pass through the two points $(16,54)$ and $(20,53).$ Find $P(0) + Q(0).$

INTELLECT-MATH-7B

So I have two quadratic polynomials, $P(x)$ and $Q(x)$, with leading coefficients 2 and -2, respectively. Both pass through the points $(16,54)$ and $(20,53)$. I need to find $P(0) + Q(0)$.

First, since both are quadratic polynomials, I can express them in the general form:

$P(x) = 2x^2 + bx + c$

$Q(x) = -2x^2 + dx + e$

But since both pass through the points $(16,54)$ and $(20,53)$, I can set up equations using these points.

For $P(x)$:

$P(16) = 2 \cdot (16)^2 + b \cdot 16 + c = 54$

$P(20) = 2 \cdot (20)^2 + b \cdot 20 + c = 53$

Similarly for $Q(x)$:

$Q(16) = -2 \cdot (16)^2 + d \cdot 16 + e = 54$

$Q(20) = -2 \cdot (20)^2 + d \cdot 20 + e = 53$

Let's compute the squares:

$16^2 = 256$

$20^2 = 400$

So for $P(x)$:

$2 \cdot 256 + 16b + c = 54 \Rightarrow 512 + 16b + c = 54 \Rightarrow 16b + c = 54 - 512 = -458 \quad ...(1)$

$2 \cdot 400 + 20b + c = 53 \Rightarrow 800 + 20b + c = 53 \Rightarrow 20b + c = 53 - 800 = -747 \quad ...(2)$

For $Q(x)$:

$-2 \cdot 256 + 16d + e = 54 \Rightarrow -512 + 16d + e = 54 \Rightarrow 16d + e = 54 + 512 = 566 \quad ...(3)$

$-2 \cdot 400 + 20d + e = 53 \Rightarrow -800 + 20d + e = 53 \Rightarrow 20d + e = 53 + 800 = 853 \quad ...(4)$

Now, I have two systems of linear equations:

For $P(x)$:

$16b + c = -458 \quad ...(1)$

$20b + c = -747 \quad ...(2)$

Subtract equation (1) from equation (2):

$(20b + c) - (16b + c) = -747 - (-458)$

$4b = -289 \Rightarrow b = -\frac{289}{4}$

Now plug $b$ back into equation (1):

$16\left(-\frac{289}{4}\right) + c = -458 \Rightarrow -1156 + c = -458 \Rightarrow c = -458 + 1156 = 698$

So $P(x) = 2x^2 - \frac{289}{4}x + 698$

For $Q(x)$:

$16d + e = 566 \quad ...(3)$

$20d + e = 853 \quad ...(4)$

Subtract equation (3) from equation (4):

$(20d + e) - (16d + e) = 853 - 566$

$4d = 287 \Rightarrow d = \frac{287}{4}$

Now plug $d$ back into equation (3):

$16\left(\frac{287}{4}\right) + e = 566 \Rightarrow 1148 + e = 566 \Rightarrow e = 566 -1148= -582$

So $Q(x)=-2x^{2}+\frac{287}{4}x-582$.

Now, $P(0)=c=698$.

$Q(0)=e=-582$.

Therefore, $P(0)+Q(0)=698+(-582)=116$.

Final Answer

$\boxed{116}$.
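As an independent cross-check on the three transcripts above (this derivation is an editorial addition, not model output), the answer can be obtained without solving for the individual coefficients. The sum $R(x) = P(x) + Q(x)$ is linear because the leading terms cancel, and it passes through $(16, 108)$ and $(20, 106)$:

```latex
\begin{align*}
R(x) &= P(x) + Q(x) = (2 - 2)x^2 + (b + d)x + (c + e) = (b + d)x + (c + e), \\
R(16) &= 54 + 54 = 108, \qquad R(20) = 53 + 53 = 106, \\
b + d &= \frac{R(20) - R(16)}{20 - 16} = \frac{106 - 108}{4} = -\frac{1}{2}, \\
P(0) + Q(0) &= R(0) = R(16) - 16\,(b + d) = 108 + 8 = 116.
\end{align*}
```

This agrees with the coefficients found in the transcripts: $b + d = -\frac{289}{4} + \frac{287}{4} = -\frac{1}{2}$ and $c + e = 698 - 582 = 116$.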

Authors

Prime Intellect Team

Acknowledgements

Scaling Our Open-Source Environments Program

Today, we're scaling up our open-source environments program to become the global hub for open evals and RL environments. As part of this, we're committing hundreds of thousands of dollars in grants and looking for partners who want to join our mission to accelerate open superintelligence.

Environments Hub: A Community Hub To Scale RL To Open AGI

RL environments are the playgrounds where agents learn. Until now, they’ve been fragmented, closed, and hard to share. We are launching the Environments Hub to change that: an open, community-powered platform that gives environments a true home. Environments define the world, rules and feedback loop of state, action and reward. From games to coding tasks to dialogue, they’re the contexts where AI learns; without them, RL is just an algorithm with nothing to act on.

SYNTHETIC-2 Release: Four Million Collaboratively Generated Reasoning Traces

We are releasing SYNTHETIC-2, an open dataset of four million verified reasoning traces spanning the most comprehensive set of complex reinforcement learning tasks and verifiers released to date. The dataset was collaboratively generated by compute contributors across the globe via our pipeline-parallel decentralized inference. Over 1,250 GPUs joined in 3 days — from 4090s to H200s — creating data for complex RL tasks.
