SYNTHETIC-1 Release: Two Million Collaboratively Generated Reasoning Traces from Deepseek-R1
We are releasing SYNTHETIC-1, the largest open reasoning dataset generated from Deepseek-R1, collaboratively generated by compute contributors across the globe. SYNTHETIC-1 contains reasoning traces for a diverse range of tasks across math, coding and science, whose correctness was confirmed with task-specific verifiers.
Along with the raw data including both correct and incorrect reasoning traces with rich metadata, we are publishing a curated SFT subset with 900k samples, making it the largest SFT dataset from R1 to date, as well as a preference tuning dataset obtained from correct and incorrect responses to the same problems. We also release SYNTHETIC-1-SFT-7B and demonstrate that our dataset is highly effective at teaching models reasoning through supervised fine-tuning.
SFT on synthetic data is merely the first step towards strong reasoning models. Next, we will go beyond synthetic data generation: we are preparing a reinforcement learning run with verifiable rewards in a globally distributed setting, to which anyone can contribute compute.
Task Dataset Construction
To generate SYNTHETIC-1, we built Genesys, an open-source library that implements verifiers for several tasks ranging from unit test execution for code generation to LLM judges for open-ended question answering. Genesys is designed to be easily extendable, with the goal of encouraging the community to contribute their own verifiers and establish a standardized framework for building and integrating verifiers into synthetic data generation and reinforcement learning pipelines.
For SYNTHETIC-1, Genesys was built to support five distinct tasks whose solutions were judged with four different verifiers:
Mathematics Problems (777k samples):
Tasks: Competition-level math problems from NuminaMath, with LLM-based post-processing to turn multiple-choice questions into free-form questions and to filter out questions without automatically verifiable responses (e.g. questions asking for proofs).
Verifier: Symbolic verification based on the math-verify library.
Algorithmic Coding Problems:
Tasks: Algorithmic challenges from coding competitions and platforms such as LeetCode, curated from the APPS, CodeContests, Codeforces and TACO datasets. LLM-based post-processing was applied to additionally translate Python problems into JavaScript, Rust and C++ problems.
Verifier: Unit test execution; the score is the ratio of tests passed.
Real-World Software Engineering Problems:
Tasks: Derived from real-world GitHub commits in the CommitPack dataset. Each problem pairs a pre-commit code file with an LLM-generated modification instruction, crafted using context from the original commit message and the post-commit file state.
Verifier: An LLM judge compares LLM-generated code against the actual post-commit file state.
Open-Ended STEM Question Answering:
Tasks: Questions curated from a broad range of technical and scientific topics using the StackExchange dataset. LLM-based filtering retains only questions with objectively correct responses, excluding opinion-based queries, and keeps only questions that require genuine reasoning rather than simple recall or memorization of information.
Verifier: An LLM judge scores responses by comparing them to the most upvoted answer.
Synthetic Code Output Prediction:
Tasks: A fully synthetic task where the goal is to predict the output of code that performs string transformations, given the code and a string input. We generate arbitrary string-processing functions via LLM prompting and recursively increase their complexity using a scheme akin to evol-instruct. Inputs include both random strings and snippets from news articles, with ground truth outputs obtained by executing the generated code.
Verifier: LLM-predicted output strings are compared directly with the real output strings and are judged correct only on an exact match.
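To make the last task concrete, here is a minimal sketch of how a code output prediction sample and its exact-match check could look. It is illustrative only: `transform` is a hand-written stand-in for an LLM-generated function, and `make_task` and `verify_exact_match` are hypothetical names, not the Genesys API.

```python
import random
import string


def transform(s: str) -> str:
    """Hypothetical stand-in for an LLM-generated string-processing function."""
    # Reverse the string, then upper-case every character at an even index.
    reversed_s = s[::-1]
    return "".join(c.upper() if i % 2 == 0 else c for i, c in enumerate(reversed_s))


def make_task(rng: random.Random, length: int = 8) -> dict:
    """Build one task: a random input plus ground truth from executing the code."""
    text = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return {"input": text, "expected_output": transform(text)}


def verify_exact_match(predicted: str, expected: str) -> float:
    """Score 1.0 only when the prediction matches the real output exactly."""
    return 1.0 if predicted == expected else 0.0


task = make_task(random.Random(0))
print(transform("abcd"))                                # DcBa
print(verify_exact_match("DcBa", transform("abcd")))    # 1.0
print(verify_exact_match("dcba", transform("abcd")))    # 0.0
```

Because the ground truth comes from actually running the function, no human labeling is needed, and the verifier reduces to a string comparison.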
In total, we sample and verify 2M responses across the full set of tasks. We host the 671B-parameter R1 model on community-contributed 8xH200 nodes distributed around the globe and use the sampling settings recommended by the Deepseek team for generation. For efficiency, we limit each response to 12k tokens: fewer than 5% of responses hit this limit, and most of those are wrong because the model tends to fall into repetitive patterns, as other researchers have also observed.
The raw 2M dataset, containing all reasoning traces including those judged as wrong by the verifiers, is available under PrimeIntellect/SYNTHETIC-1. In addition to the raw data, we apply post-processing to obtain a 900k-sample SFT dataset and an 11k-sample preference tuning dataset.
SFT Dataset
To curate our SFT dataset, we keep all responses from our 2M samples that are verified as correct or whose verifier-assigned score reaches a task-specific threshold. We use the following thresholds:
Mathematics Problems: 1 (binary verifier)
Algorithmic Coding Problems: 0.9 (90% of tests passed)
Real-World Software Engineering Problems: 0.9 (90+ score by judge asked to score on 0-100 scale)
Open-Ended STEM Question Answering: 0.9 (90+ score by judge asked to score on 0-100 scale)
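The filtering rule above can be sketched in a few lines. This is a hedged illustration, not the actual curation script: the task keys, the `task` and `score` field names, and the normalization of all scores to the range [0, 1] (judge scores divided by 100) are assumptions made for the example.

```python
# Thresholds as listed above. The task keys are hypothetical identifiers,
# and scores are assumed to be normalized to [0, 1].
SFT_THRESHOLDS = {
    "math": 1.0,                  # binary verifier
    "algorithmic_coding": 0.9,    # ratio of unit tests passed
    "software_engineering": 0.9,  # judge score / 100
    "stem_qa": 0.9,               # judge score / 100
}


def passes_sft_filter(sample: dict) -> bool:
    """Return True if the sample's verifier score meets its task threshold."""
    threshold = SFT_THRESHOLDS.get(sample["task"])
    if threshold is None:
        return False  # unknown task: excluded in this sketch
    return sample["score"] >= threshold


def build_sft_subset(samples: list) -> list:
    """Keep only the samples that pass the per-task threshold."""
    return [s for s in samples if passes_sft_filter(s)]


raw = [
    {"task": "math", "score": 1.0},
    {"task": "math", "score": 0.0},
    {"task": "algorithmic_coding", "score": 0.95},
    {"task": "stem_qa", "score": 0.85},
]
print(len(build_sft_subset(raw)))  # 2
```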
The final SFT dataset can be found under PrimeIntellect/SYNTHETIC-1-SFT-Data. We run initial SFT experiments, training SYNTHETIC-1-SFT-7B, and find that our data strongly improves the reasoning performance of Qwen2.5-7B-Instruct. In particular, we observe that larger amounts of reasoning data for SFT tend to improve model performance: Bespoke-Stratos-7B, which was trained on 17k samples, is outperformed by OpenThinker-7B, trained on 114k samples, which in turn is outperformed by a wide margin by SYNTHETIC-1-SFT-7B, trained on 800k samples. The one exception is Python coding competitions, where our model lags behind; we suspect this is due to a lower ratio of Python coding data in SYNTHETIC-1.
Preference Dataset
To curate a preference dataset, we select all tasks for which at least two responses were generated and include a task if one of its responses passes as correct according to the SFT criteria and another either fails verification or has a score below 0.6 (judge score or ratio of unit tests passed). This yields 11.5k pairs of responses. The preference dataset can be found under PrimeIntellect/SYNTHETIC-1-Preference-Data.
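As a rough sketch of this pairing rule, the snippet below groups responses by problem and emits a pair when one response meets the SFT threshold and another scores below 0.6. The field names (`problem_id`, `task`, `score`, `response`), the task keys, and the choice to treat a failed binary check as a score of 0 are assumptions for illustration, not the actual pipeline.

```python
from collections import defaultdict

# Hypothetical task keys; scores are assumed normalized to [0, 1],
# with a failed binary verification represented as 0.0.
CHOSEN_THRESHOLDS = {
    "math": 1.0,
    "algorithmic_coding": 0.9,
    "software_engineering": 0.9,
    "stem_qa": 0.9,
}
REJECT_BELOW = 0.6


def build_preference_pairs(samples: list) -> list:
    """Pair a passing response with a clearly worse one for the same problem."""
    by_problem = defaultdict(list)
    for s in samples:
        by_problem[s["problem_id"]].append(s)

    pairs = []
    for problem_id, responses in by_problem.items():
        if len(responses) < 2:
            continue  # need at least two responses to form a pair
        threshold = CHOSEN_THRESHOLDS.get(responses[0]["task"])
        if threshold is None:
            continue  # unknown task: skipped in this sketch
        chosen = [r for r in responses if r["score"] >= threshold]
        rejected = [r for r in responses if r["score"] < REJECT_BELOW]
        if chosen and rejected:
            pairs.append({
                "problem_id": problem_id,
                "chosen": chosen[0]["response"],
                "rejected": rejected[0]["response"],
            })
    return pairs


samples = [
    {"problem_id": "p1", "task": "algorithmic_coding", "score": 1.0, "response": "A"},
    {"problem_id": "p1", "task": "algorithmic_coding", "score": 0.4, "response": "B"},
    {"problem_id": "p2", "task": "stem_qa", "score": 0.95, "response": "C"},
    {"problem_id": "p2", "task": "stem_qa", "score": 0.8, "response": "D"},
    {"problem_id": "p3", "task": "math", "score": 0.0, "response": "E"},
]
print(build_preference_pairs(samples))
# [{'problem_id': 'p1', 'chosen': 'A', 'rejected': 'B'}]
```

Note the gap between the two thresholds: a response scoring 0.8 is neither good enough to be chosen nor bad enough to be rejected, which keeps the pairs clearly separated in quality.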
Next Steps
Our SYNTHETIC-1 dataset release is just the first step towards scaling to state-of-the-art open reasoning models. Deepseek-R1 was trained with an initial cold-start SFT stage leveraging synthetic data, followed by a much larger and more computationally expensive stage of reinforcement learning.
In the coming weeks, we will announce INTELLECT-2, a globally decentralized collaborative reinforcement learning run, with the goal of training a state-of-the-art reasoning model in the 30B-70B parameter range. This training run will be enabled by fully permissionless compute contributions via our protocol testnet and verified inference for the RL rollout stage using TOPLOC.
Eurus-2-PRIME
System Prompt
When tackling complex reasoning tasks, you have access to the following actions. Use them as needed to progress through your thought process. [ASSESS] [ADVANCE] [VERIFY] [SIMPLIFY] [SYNTHESIZE] [PIVOT] [OUTPUT] You should strictly follow the format below: [ACTION NAME] # Your action step 1 # Your action step 2 # Your action step 3 ... Next action: [NEXT ACTION NAME]
User
Quadratic polynomials $P(x)$ and $Q(x)$ have leading coefficients $2$ and $-2,$ respectively. The graphs of both polynomials pass through the two points $(16,54)$ and $(20,53).$ Find $P(0) + Q(0).$
Eurus-2-PRIME
[ASSESS]
We are given two quadratic polynomials $P(x)$ and $Q(x)$ with leading coefficients $2$ and $-2$, respectively. The graphs of both polynomials pass through the points $(16,54)$ and $(20,53)$. We need to find the value of $P(0) + Q(0)$.
[ADVANCE]
Let's express the quadratic polynomials $P(x)$ and $Q(x)$ in the general form:
$ P(x) = 2x^2 + bx + c $
$ Q(x) = -2x^2 + dx + e $
Since the graphs of both polynomials pass through the points $(16,54)$ and $(20,53)$, we can substitute these points into the equations of $P(x)$ and $Q(x)$ to form a system of equations.
For $P(x)$:
$ P(16) = 2(16)^2 + 16b + c = 54 $
$ 512 + 16b + c = 54 $
$ 16b + c = -458 \quad \text{(Equation 1)} $
$ P(20) = 2(20)^2 + 20b + c = 53 $
$ 800 + 20b + c = 53 $
$ 20b + c = -747 \quad \text{(Equation 2)} $
For $Q(x)$:
$ Q(16) = -2(16)^2 + 16d + e = 54 $
$ -512 + 16d + e = 54 $
$ 16d + e = 566 \quad \text{(Equation 3)} $
$ Q(20) = -2(20)^2 + 20d + e = 53 $
$ -800 + 20d + e = 53 $
$ 20d + e = 853 \quad \text{(Equation 4)} $
[ADVANCE]
Now, we can solve the system of equations for $P(x)$: Subtract Equation 1 from Equation 2:
$ (20b + c) - (16b + c) = -747 - (-458) $
$ 4b = -289 $
$ b = -\frac{289}{4} $
Substitute $b = -\frac{289}{4}$ into Equation 1:
$ 16\left(-\frac{289}{4}\right) + c = -458 $
$ -1156 + c = -458 $
$ c = 698 $
So, for $P(x)$, we have:
$ P(x) = 2x^2 - \frac{289}{4}x + 698 $
$ P(0) = 698 $
Next, we can solve the system of equations for $Q(x)$: Subtract Equation 3 from Equation 4:
$ (20d + e) - (16d + e) = 853 - 566 $
$ 4d = 287 $
$ d = \frac{287}{4} $
Substitute $d = \frac{287}{4}$ into Equation 3:
$ 16\left(\frac{287}{4}\right) + e = 566 $
$ 1148 + e = 566 $
$ e = -582 $
So, for $Q(x)$, we have:
$ Q(x) = -2x^2 + \frac{287}{4}x - 582 $
$ Q(0) = -582 $
[ADVANCE]
Finally, we can find $P(0) + Q(0)$:
$ P(0) + Q(0) = 698 + (-582) = 116 $
So, the value of $P(0) + Q(0)$ is $\boxed{116}$.
INTELLECT-MATH-7B-SFT (ours)
System Prompt
Solve the following math problem efficiently and clearly. Think carefully and step by step about your response and reason before providing a final response. Conclude your response with: \n\nTherefore, the final answer is: $\\boxed{answer}$. I hope it is correct.\n\nWhere [answer] is just the final number or expression that solves the problem.
User
Quadratic polynomials $P(x)$ and $Q(x)$ have leading coefficients $2$ and $-2,$ respectively. The graphs of both polynomials pass through the two points $(16,54)$ and $(20,53).$ Find $P(0) + Q(0).$
INTELLECT-MATH-7B-SFT
So I've got this problem here about quadratic polynomials $P(x)$ and $Q(x)$. $P(x)$ has a leading coefficient of $2$, and $Q(x)$ has a leading coefficient of $-2$. Both polynomials pass through the points $(16,54)$ and $(20,53)$. I need to find $P(0) + Q(0)$.
First, since both $P(x)$ and $Q(x)$ are quadratic polynomials, I can express them in the general form:
$P(x) = 2x^2 + bx + c$
$Q(x) = -2x^2 + dx + e$
Wait, actually, the leading coefficient is the coefficient of the highest power of $x$, which for a quadratic is $x^2$. So yes, $P(x) = 2x^2 + bx + c$ and $Q(x) = -2x^2 + dx + e$.
Now, both polynomials pass through the points $(16,54)$ and $(20,53)$. That means when $x=16$, $P(16)=54$, and when $x=20$, $P(20)=53$. Similarly for $Q(x)$.
So, I can set up equations using these points.
First, for $P(x)$:
$P(16) = 2(16)^2 + b \cdot 16 + c = 54$
Similarly, $P(20) = 2(20)^2 + b \cdot 20 + c = 53$
Let me calculate $16^2$ and $20^2$:
$16^2 = 256$
$20^2 = 400$
So,
$P(16): 2 \cdot 256 + 16b + c = 54 \Rightarrow 512 + 16b + c = 54 \Rightarrow 16b + c = 54 - 512 \Rightarrow 16b + c = -458$
$P(20): 2 \cdot 400 + 20b + c = 53 \Rightarrow 800 + 20b + c = 53 \Rightarrow 20b + c = 53 - 800 \Rightarrow 20b + c = -747$
Now, I have two equations:
1) $16b + c = -458$
2) $20b + c = -747$
I can subtract equation 1 from equation 2 to eliminate $c$:
$(20b + c) - (16b + c) = -747 - (-458)$
$20b + c - 16b - c = -747 + 458$
$4b = -289$
So, $b = -289 / 4 = -72.25$
Wait, but coefficients are usually integers, but maybe not. Let me check.
Now, plug $b$ back into one of the equations to find $c$. Let's use equation 1:
$16(-72.25) + c = -458$
$-1156 + c = -458$
$c = -458 + 1156 = 698$
So, $P(x) = 2x^2 - 72.25x + 698$
Hmm, $P(0)$ would be $c$, which is $698$.
Now, let's do the same for $Q(x)$.
$Q(16) = -2 \cdot (16)^2 + d \cdot 16 + e = 54 \Rightarrow -2 \cdot 256 + 16d + e = 54 \Rightarrow -512 + 16d + e = 54 \Rightarrow 16d + e = 54 + 512 \Rightarrow 16d + e = 566$
$Q(20) = -2 \cdot (20)^2 + d \cdot 20 + e = 53 \Rightarrow -2 \cdot 400 + 20d + e = 53 \Rightarrow -800 + 20d + e = 53 \Rightarrow 20d + e = 53 + 800 \Rightarrow 20d + e = 853$
Now, equations for $Q(x)$:
1) $16d + e = 566$
2) $20d + e = 853$
Subtract equation 1 from equation 2:
$(20d + e) - (16d + e) = 853 - 566$
$4d = 287$
$d = 287 / 4 = 71.75$
Now, plug $d$ back into equation 1:
$16 \cdot 71.75 + e = 566 \Rightarrow 1148 + e = 566 \Rightarrow e = 566 - 1148 = -582$
So, $Q(x) = -2x^2 + 71.75x - 582$
Therefore, $Q(0) = e = -582$
Now, $P(0) + Q(0) = 698 + (-582) = 116$
Wait, but the problem says "present the answer in LaTeX format: \boxed{Your answer}"
So, I should write $\boxed{116}$
But let me double-check my calculations to make sure I didn't make any mistakes.
First, for $P(x)$:
$P(16) = 2(256) + 16b + c = 512 + 16b + c = 54 \Rightarrow 16b + c = -458$
$P(20) = 2*(400) + 20b + c = 800 + 20b + c = 53 \Rightarrow 20b + c = -747$
$Q(16) = -2*(256) + 16d + e = -512 + 16d + e = 54 \Rightarrow 16d + e = 566$
$Q(20) = -2*(400) + 20d + e = -800 + 20d + e = 53 \Rightarrow 20d + e = 853$
Subtracting: $4d = 287 \Rightarrow d = 71.75$
Then, $e = 566 - 16*71.75 = 566 - 1148 = -582$
So, $P(0) = 698$, $Q(0) = -582$, sum is $116$.
Seems correct.
Final Answer
$\boxed{116}$
INTELLECT-MATH-7B (ours)
System Prompt
Solve the following math problem efficiently and clearly. Think carefully and step by step about your response and reason before providing a final response. Conclude your response with: \n\nTherefore, the final answer is: $\\boxed{answer}$. I hope it is correct.\n\nWhere [answer] is just the final number or expression that solves the problem.
User
Quadratic polynomials $P(x)$ and $Q(x)$ have leading coefficients $2$ and $-2,$ respectively. The graphs of both polynomials pass through the two points $(16,54)$ and $(20,53).$ Find $P(0) + Q(0).$
INTELLECT-MATH-7B
So I have two quadratic polynomials, $P(x)$ and $Q(x)$, with leading coefficients 2 and -2, respectively. Both pass through the points $(16,54)$ and $(20,53)$. I need to find $P(0) + Q(0)$.
First, since both are quadratic polynomials, I can express them in the general form:
$P(x) = 2x^2 + bx + c$
$Q(x) = -2x^2 + dx + e$
But since both pass through the points $(16,54)$ and $(20,53)$, I can set up equations using these points.
For $P(x)$:
$P(16) = 2(16)^2 + b \cdot 16 + c = 54$
$P(20) = 2(20)^2 + b \cdot 20 + c = 53$
Similarly for $Q(x)$:
$Q(16) = -2(16)^2 + d \cdot 16 + e = 54$
$Q(20) = -2(20)^2 + d \cdot 20 + e = 53$
Let's compute the squares:
$16^2 = 256$
$20^2 = 400$
So for $P(x)$:
$2 \cdot 256 + 16b + c = 54 \Rightarrow 512 + 16b + c = 54 \Rightarrow 16b + c = 54 - 512 = -458 \quad ...(1)$
$2 \cdot 400 + 20b + c = 53 \Rightarrow 800 + 20b + c = 53 \Rightarrow 20b + c = 53 - 800 = -747 \quad ...(2)$
For $Q(x)$:
$-2 \cdot 256 + 16d + e = 54 \Rightarrow -512 + 16d + e = 54 \Rightarrow 16d + e = 54 + 512 = 566 \quad ...(3)$
$-2 \cdot 400 + 20d + e = 53 \Rightarrow -800 + 20d + e = 53 \Rightarrow 20d + e = 53 + 800 = 853 \quad ...(4)$
Now, I have two systems of linear equations:
For $P(x)$:
$16b + c = -458 \quad ...(1)$
$20b + c = -747 \quad ...(2)$
Subtract equation (1) from equation (2):
$(20b + c) - (16b + c) = -747 - (-458)$
$4b = -289 \Rightarrow b = -\frac{289}{4}$
Now plug $b$ back into equation (1):
$16\left(-\frac{289}{4}\right) + c = -458 \Rightarrow -1156 + c = -458 \Rightarrow c = -458 + 1156 = 698$
So $P(x) = 2x^2 - \frac{289}{4}x + 698$
For $Q(x)$:
$16d + e = 566 \quad ...(3)$
$20d + e = 853 \quad ...(4)$
Subtract equation (3) from equation (4):
$(20d + e) - (16d + e) = 853 - 566$
$4d = 287 \Rightarrow d = \frac{287}{4}$
Now plug $d$ back into equation (3):
$16\left(\frac{287}{4}\right) + e = 566 \Rightarrow 1148 + e = 566 \Rightarrow e = 566 - 1148 = -582$
Big shoutout to all our compute providers who collectively contributed 60,000 exaFLOPs of compute via our protocol testnet – we couldn’t have done it without you! @LambdaAPI @marloXBT @toptickcrypto @skre_0 Marco Giglio supervillian realtek klee 0xfr_ Virtual Machine @oldmankotaro @herb0x_ @mev_pete sto D32 Riccardo Iaconelli @rodeo_crypro primeprimeint1234 hecataeus @Etherean007 internal primary.