PRIME Intellect
SYNTHETIC-2
Announcements
June 23, 2025

SYNTHETIC-2: Planetary-Scale Pipeline Parallel Inference for Verified Reasoning

Today, we're excited to launch SYNTHETIC-2, our next-generation, open-source reasoning dataset and planetary-scale, pipeline-parallel distributed inference run.

Built on our globally-distributed inference stack and powered by the new DeepSeek-R1-0528 model, SYNTHETIC-2 generates verified reasoning traces spanning the most comprehensive set of complex reinforcement-learning tasks and verifiers released to date.

The run supports heterogeneous compute, letting everyone, from consumer GPU owners to hyperscale NVIDIA and AMD clusters, contribute towards frontier AGI research. Just spin up your GPUs and start helping us advance towards open-source superintelligence.


Planetary-Scale Inference: Building a Distributed Inference Engine for the Public Internet

A few weeks ago we previewed our globally-distributed inference stack. Today that stack moves into production—fully integrated with:

  • prime-rl – our fault-tolerant, asynchronous distributed RL library
  • TOPLOC verifiable-computing proofs for pipeline parallel inference (+ v2, see below)

Frontier models such as DeepSeek-R1, with hundreds of billions of parameters, do not fit into the memory of a single GPU. With pipeline parallelism, instead of keeping the entire model on every GPU, we divide it into sequential stages. Each device—whether an H100 in a data center or a consumer RTX 4090 card—stores only its stage, processes its slice of the forward pass, and streams the activations to the next worker. This enables us to run large models on consumer devices.

Pipeline Parallel Communication. Each node sends hidden states to the next stage worker. The final device decodes the next token and sends it back to the first worker, and the cycle repeats.
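The decode cycle above can be sketched in a few lines. This is a toy, single-process illustration: the `Stage` class, the tanh "layers", and greedy decoding are stand-ins for the real model and networking stack, where each stage runs on a separate node and hidden states travel over the public internet.

```python
import numpy as np

class Stage:
    """One pipeline stage: holds only its own slice of the model's layers."""
    def __init__(self, weights):
        self.weights = weights  # list of (dim, dim) matrices for this stage

    def forward(self, hidden):
        for w in self.weights:
            hidden = np.tanh(hidden @ w)  # toy stand-in for a transformer block
        return hidden

def decode_one_token(stages, embed, lm_head, token_id):
    """One decode step: activations flow stage to stage; the last stage decodes."""
    hidden = embed[token_id]            # (dim,) embedding of the current token
    for stage in stages:                # in production this loop is a network hop
        hidden = stage.forward(hidden)  # between nodes, not a local function call
    logits = hidden @ lm_head           # (vocab,) logits on the final device
    return int(np.argmax(logits))       # greedy decode for simplicity

# Generate a short sequence: the decoded token is fed back to the first stage.
rng = np.random.default_rng(0)
vocab, dim = 16, 8
embed = rng.normal(size=(vocab, dim))
lm_head = rng.normal(size=(dim, vocab))
stages = [Stage([rng.normal(size=(dim, dim))]) for _ in range(3)]

tokens = [0]
for _ in range(4):
    tokens.append(decode_one_token(stages, embed, lm_head, tokens[-1]))
```

In the real system, micro-batching keeps every stage busy: while stage 2 processes one micro-batch, stage 1 is already working on the next.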

Asynchronous micro-batched schedule.

TOPLOC v2

To trust results from thousands of nodes, we must verify their generations cheaply. Our TOPLOC verifiable-inference work employs a compact locality-sensitive hashing scheme for intermediate activations that can detect unauthorized modifications to models, prompts, or precision.
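The core idea can be sketched as committing to a few salient entries of a hidden state and re-checking them within a numerical tolerance. This is a simplified illustration, not TOPLOC's actual encoding: the real proofs are far more compact, and the function names here are hypothetical.

```python
import numpy as np

def activation_commitment(hidden, k=8):
    """Commit to the k largest-magnitude entries of a hidden state.
    A toy stand-in for TOPLOC's locality-sensitive scheme."""
    idx = np.argsort(-np.abs(hidden))[:k]
    return idx, hidden[idx].copy()

def verify(hidden, commitment, atol=1e-2):
    """Re-run inference and check the committed entries still match, within a
    tolerance that absorbs kernel- and precision-level nondeterminism."""
    idx, vals = commitment
    return bool(np.allclose(hidden[idx], vals, atol=atol))

rng = np.random.default_rng(1)
h = rng.normal(size=256)                                # "true" hidden state
proof = activation_commitment(h)
ok_honest = verify(h + rng.normal(scale=1e-4, size=256), proof)  # honest rerun
ok_tampered = verify(1.5 * h, proof)                             # modified model
```

The tolerance is what lets different GPUs and kernel implementations produce acceptably different activations while still rejecting a swapped model or prompt.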

TOPLOC v2 extends this scheme to the pipeline parallel inference setting:

  1. Group-level reward: If the final output is correct, we treat it as evidence that all pipeline stages behaved honestly.
  2. Blame assignment on failure: If a result fails verification, we replay proofs stage-by-stage, pinpoint the first faulty worker, reject the output, and remove the node.
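The blame-assignment step can be sketched as a replay that walks the pipeline until the first mismatch. The names and the use of recorded inputs/outputs here are illustrative: the real system replays compact TOPLOC proofs rather than full activations.

```python
def find_faulty_stage(recorded_io, reference_stages, atol=1e-3):
    """recorded_io: list of (stage_input, claimed_output) pairs, one per stage.
    Re-executes each stage with a trusted reference and returns the index of
    the first stage whose claimed output deviates, or None if all check out."""
    for i, ((x, claimed), ref) in enumerate(zip(recorded_io, reference_stages)):
        expected = ref(x)
        if abs(expected - claimed) > atol:
            return i  # first faulty worker: reject output, remove the node
    return None

# Toy pipeline where every honest stage doubles its input.
refs = [lambda x: 2 * x] * 3
honest = [(1.0, 2.0), (2.0, 4.0), (4.0, 8.0)]
cheating = [(1.0, 2.0), (2.0, 5.0), (5.0, 10.0)]  # stage 1 lied

honest_result = find_faulty_stage(honest, refs)
blamed_stage = find_faulty_stage(cheating, refs)
```

Because an honest final output vouches for every stage at once, this replay only runs on the small fraction of results that fail verification.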

Additionally, our original TOPLOC approach could verify that the computation up to the last hidden state was done correctly, but it could not yet detect changes in sampling behavior, such as speculative decoding or injecting arbitrary token sequences during the forward pass.

TOPLOC v2 introduces a novel approach that addresses this last problem, enabling fully verifiable inference. It uses reproducible Gumbel noise for categorical sampling, allowing verifiers to estimate the original token sampling in parallel, significantly faster than the original inference and with a quantifiable margin of error. The approach is robust across diverse model-parallel configurations, GPU types, and kernel implementations, providing flexibility in both hardware selection and deployment for inference and verification.
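The reproducible-noise idea rests on the Gumbel-max trick: adding seeded Gumbel noise to the logits and taking the argmax is an exact categorical sample, and anyone holding the seed can regenerate the noise and re-check the argmax. The sketch below shows that mechanism in isolation; the function names and the seeding scheme are assumptions, not TOPLOC v2's actual protocol.

```python
import numpy as np

def gumbel_sample(logits, seed):
    """Gumbel-max trick with seeded, reproducible noise: argmax(logits + g)
    with g ~ Gumbel(0, 1) is an exact sample from softmax(logits)."""
    u = np.random.default_rng(seed).uniform(size=logits.shape)
    g = -np.log(-np.log(u))
    return int(np.argmax(logits + g))

def verify_sample(logits, seed, claimed_token):
    """A verifier regenerates the identical noise and re-checks the argmax.
    Since it only needs logits (obtainable from a single prefill pass over
    all positions at once), verification parallelizes over the sequence
    instead of replaying autoregressive decoding step by step."""
    return gumbel_sample(logits, seed) == claimed_token

logits = np.array([0.1, 2.0, -1.0, 0.5])
token = gumbel_sample(logits, seed=7)
```

This is why the check can run much faster than the original inference: the costly sequential part of generation never has to be repeated.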

Our full arXiv paper on TOPLOC v2's sampling-proof approach is coming in the next few weeks.

SYNTHETIC-2 Dataset

SYNTHETIC-2 consists of a large set of verifiable reasoning tasks as well as reasoning traces obtained from multiple models. This design serves two purposes:

  1. Supervised Training Data: High quality reasoning traces are crucial as cold start SFT data for reasoning models as well as mid-training data for base models. Using DeepSeek-R1-0528, the strongest open reasoning model, we generate a large amount of such data that is verified for correctness.
  2. Difficulty-Annotated RL Data: Previous work has shown that RL datasets have to be carefully filtered for difficulty as measured by the base model’s pass rate to obtain performance improvements. Using a variety of smaller models, we annotate our RL tasks with pass@k rates as a proxy for difficulty.
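Pass@k from a fixed budget of generations is usually computed with the unbiased estimator of Chen et al. (2021): given n samples of which c pass the verifier, the probability that at least one of k drawn samples is correct. We cannot confirm this is the exact estimator used for SYNTHETIC-2's annotations, but it is the standard formulation.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples,
    drawn without replacement from n generations of which c are correct,
    passes the verifier. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples exist, so success is certain
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Tasks with pass@k near 1 are too easy to yield a learning signal and those near 0 are too hard, so annotating every task this way lets the RL run filter for the informative middle band.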

Beyond traditional mathematics and coding problems, SYNTHETIC-2 aims to cover highly diverse tasks to teach models reasoning skills that generalize beyond mathematics and coding. By aggregating data from publicly available datasets, using existing research repositories, and designing several reasoning tasks of our own that can be generated programmatically, we collect more than 20 difficult reasoning tasks and implement verifiers for them inside our framework prime-rl. These tasks range from puzzles from reasoning-gym, through kernel engineering, to precise JSON-format adherence.
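A rule-based verifier of the kind described above can be very small. The sketch below checks the JSON-format-adherence task mentioned in the text; the function name and the reward convention are illustrative, not prime-rl's actual interface.

```python
import json

def verify_json_adherence(response: str, required_keys: list[str]) -> float:
    """Reward 1.0 iff the model's response parses as a JSON object and
    contains every required key; 0.0 otherwise. A hypothetical example of
    the programmatic verifiers implemented for each SYNTHETIC-2 task."""
    try:
        obj = json.loads(response)
    except json.JSONDecodeError:
        return 0.0  # response is not valid JSON at all
    if not isinstance(obj, dict):
        return 0.0  # valid JSON, but not an object
    return 1.0 if all(k in obj for k in required_keys) else 0.0
```

Because the check is fully programmatic, it doubles as both a training reward and a correctness filter when generating SFT data.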

Apart from verifiable tasks, we collect tasks whose responses are not meant to be verifiable in a rule-based manner, but only through reward models. These prompts are specifically meant to generate diverse SFT data to avoid training on a task distribution that is too narrow. These non-verifiable tasks include problems such as critique fine-tuning or questions from public forums such as Reddit or Stack Exchange.

Task Distribution


We generate reasoning traces for all of our tasks using DeepSeek-R1-0528. Additionally, for all tasks that are verifiable programmatically, we generate reasoning data from the following models to annotate our dataset for difficulty:

Our full SYNTHETIC-2 tasks dataset is available on HuggingFace.

How To Contribute Compute

You can contribute compute by clicking “Contribute Compute” on the SYNTHETIC-2 dashboard.


Once your node joins the pool, it will automatically start working with a group of other nodes on the highest-throughput model, and it will also show up on the map and the leaderboard, where your contributions are tracked.


Next Steps

Building on the launch of SYNTHETIC-2, our next step is to leverage the SYNTHETIC-2 tasks dataset as the foundation for our next distributed RL run. INTELLECT-2 has already shown that globally distributed reinforcement learning works—now it's time to demonstrate its promise as a novel scaling paradigm, unlocking even more compute and achieving state-of-the-art model performance. Since the INTELLECT-2 release, we've made significant improvements to the stability of asynchronous RL at large scale and are confident these improvements will lead to state-of-the-art reasoning models trained in a globally-distributed fashion.

To expand the diversity of our RL environment ecosystem, we will integrate the verifiers repository as our core library for crowdsourcing complex RL environments from the open-source community. More details on this soon!

Our goal is to introduce additional multi-turn and tool-use environments—especially for coding and autonomous research tasks—to unlock SOTA coding-agent capabilities with our INTELLECT-3 model.