We introduce TOPLOC, a novel method for verifiable inference. TOPLOC employs a compact locality sensitive hashing mechanism for intermediate activations, which can detect unauthorized modifications to models, prompts, or compute precision with 100% accuracy in our empirical evaluations.
The system maintains robustness across diverse hardware configurations, GPU types, tensor parallel dimensions, and attention kernel implementations, while achieving validation speeds up to 100× faster than the original inference.
The polynomial encoding scheme used in TOPLOC reduces the memory overhead of generated commits by 1000×, requiring only 258 bytes of storage per 32 new tokens, compared to the 262 KB (32 tokens × 4096 hidden dimensions × 2 bytes) required to store the token embeddings directly for Llama-3.1-8B-Instruct. This makes TOPLOC a practical solution for large-scale deployment.
TOPLOC is easy to implement in modern inference engines and efficient enough to generate commits for all model inferences performed by an untrusted compute provider with minimal overhead.
By enabling efficient verification of LLM inference computations, TOPLOC establishes a foundation for building decentralized, verifiable, and trustless compute protocols.
Paper: https://arxiv.org/abs/2501.16007
Inference providers often make adjustments to computation methods to optimize for cost, efficiency, or specific commercial goals. While these modifications can make inference more economical and scalable, they may also impact the quality, accuracy, and transparency of the service provided to users.
The TOPLOC algorithm encodes and validates the most salient features of the last hidden state tensor using a compact, verifiable proof. During the generation phase, the inference provider commits to the top-k values in the last hidden state, which can later be verified by the validator by recomputing the last hidden states. Due to increased parallelization, the validation can be done a lot faster than the original generation. Because the last hidden state depends on previous ones, verifying the correctness of the last hidden state gives us confidence that the previous hidden states are correct as well.
During proof generation, the top-k indices and values are extracted from the tensor, and an injective modulus m is computed to uniquely map these indices. The indices and their corresponding values are encoded into a polynomial, which, along with the modulus, forms the proof.
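The sketch below illustrates this encoding step with PyTorch and NumPy. The function names, the top-k selection on the raw hidden state, and the use of floating point polynomial interpolation are simplifications for illustration; the exact commit format and arithmetic follow the paper and the reference implementation.

```python
import numpy as np
import torch

def injective_modulus(indices: list[int]) -> int:
    """Smallest m that maps the chosen indices to pairwise-distinct residues."""
    m = len(indices)
    while len({i % m for i in indices}) < len(indices):
        m += 1
    return m

def build_commit(last_hidden: torch.Tensor, k: int = 128):
    """Encode the top-k entries of the last hidden state as (modulus, polynomial)."""
    values, indices = last_hidden.flatten().topk(k)   # most salient features
    idx = indices.tolist()
    m = injective_modulus(idx)
    x = np.array([i % m for i in idx], dtype=np.float64)
    y = values.to(torch.float64).cpu().numpy()
    # Degree k-1 polynomial passing through the (index mod m, value) points.
    # (The reference implementation uses a numerically robust encoding.)
    coeffs = np.polynomial.polynomial.polyfit(x, y, deg=k - 1)
    return m, coeffs
```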
For validation, the proof is decoded to retrieve k, m, and the polynomial. The top-k features are recomputed and compared against the proof by checking for differences in the exponent and mantissa bits. Validation succeeds if none of the error thresholds are exceeded.
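A minimal sketch of this comparison, assuming bf16 activations, is shown below. Here `committed` stands for the top-k values recovered by evaluating the committed polynomial at the recomputed indices modulo m; the helper names and the two threshold parameters are placeholders, and the concrete threshold values we use are reported in the paper.

```python
import torch

def split_bf16(x: torch.Tensor):
    """Reinterpret bf16 values as sign, exponent, and mantissa bit fields."""
    bits = x.to(torch.bfloat16).view(torch.int16).to(torch.int32) & 0xFFFF
    return (bits >> 15) & 0x1, (bits >> 7) & 0xFF, bits & 0x7F

def validate_topk(committed: torch.Tensor, recomputed: torch.Tensor,
                  max_exp_mismatches: int, max_mean_mantissa_err: float) -> bool:
    """Compare committed vs. recomputed top-k values against placeholder thresholds."""
    s1, e1, m1 = split_bf16(committed)
    s2, e2, m2 = split_bf16(recomputed)
    exp_match = (s1 == s2) & (e1 == e2)
    exp_mismatches = int((~exp_match).sum())
    # Mantissa deviations are only meaningful where sign and exponent already agree.
    mantissa_err = (m1[exp_match] - m2[exp_match]).abs().float().mean().item() \
        if bool(exp_match.any()) else 0.0
    return exp_mismatches <= max_exp_mismatches and mantissa_err <= max_mean_mantissa_err
```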
Why do we need to go to the trouble of measuring error rates and using thresholds? Since both computations are algebraically identical, can we not just directly check the equality of the resulting tensors?
We could if computers performed real number arithmetic exactly and deterministically. However, modern computers, with their digital limitations, perform arithmetic on real numbers using floating point arithmetic. Floating point arithmetic is sensitive to the order of computations: it is not associative, so (a + b) + c does not always equal a + (b + c). Because of this, algebraically reordering a computation, despite theoretically yielding equivalent results, does not yield equivalent results in practice.
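A small example, using single precision for clarity, shows how a reordering changes the result:

```python
import numpy as np

a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(0.1)

print((a + b) + c)  # ~0.1: a and b cancel first, then c is added
print(a + (b + c))  # 0.0: c is absorbed into b (the ulp of 1e8 in fp32 is ~8)
```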
Algebraic reorderings of the computation can occur on GPUs for all sorts of reasons: different CUDA versions, different GPU models, different tensor parallel configurations, different kernel implementations, and even seemingly at random at the whims of the GPU's scheduler. Rewriting the computation to increase parallelization and speed up validation changes the order of operations and might even change the cuBLAS kernel that is used for the matrix multiplications.
This reordering becomes a big issue when it changes where catastrophic cancellations occur in the computation. Catastrophic cancellation results from adding two floating point numbers of opposite sign that are very close in magnitude. Because bf16 only has 7 mantissa bits of precision, these cancellations often produce exact zeros. Where these exact zeros land depends on the order of the operations.
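The following toy example in bf16 shows both effects at once: with one summation order a small term is rounded away and the cancellation leaves an exact zero, while the reordered sum preserves it.

```python
import torch

x = torch.tensor([1.0, 0.003, -1.0], dtype=torch.bfloat16)

# Left to right: 1.0 + 0.003 rounds back to 1.0 (the ulp at 1.0 is 2**-7),
# so the subsequent cancellation with -1.0 yields an exact zero.
print(x[0] + x[1] + x[2])   # tensor(0., dtype=torch.bfloat16)

# Reordered: 1.0 and -1.0 cancel exactly first, and the small term survives.
print(x[0] + x[2] + x[1])   # ≈ 0.0030
```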
Interestingly, this behaviour reveals a notable property: small values are more susceptible to rounding errors, whereas larger values tend to be represented consistently after algebraic reordering. This insight motivates us to focus on the larger values in a tensor when designing our hash function. By prioritizing these large values, we can reduce the impact of rounding errors and improve the robustness of the hash function.
Another notable property is that when the exponents match, the difference in the mantissa is likely to be small. A mismatch in exponents after an algebraic reordering indicates that the sum sits right at the boundary where rounding increments the exponent, and the wrapped-around mantissa then looks very different from the non-wrapped one. In our experiments, the mantissa deviations were consistently small whenever the exponent bits matched, which makes them a suitable error measure as long as we only take the difference when the exponents match.
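The bf16 bit patterns below illustrate why: two sums that land just below and just above a power of two differ by only a single ulp in value, yet their exponent and mantissa fields look completely different.

```python
import torch

def fields(x: torch.Tensor):
    """Extract the exponent and mantissa bit fields of bf16 values."""
    bits = x.view(torch.int16).to(torch.int32) & 0xFFFF
    return (bits >> 7) & 0xFF, bits & 0x7F

just_below = torch.tensor([1.9921875], dtype=torch.bfloat16)               # largest bf16 < 2.0
just_above = just_below + torch.tensor([0.0078125], dtype=torch.bfloat16)  # rounds to 2.0

print(fields(just_below))  # exponent 127, mantissa 127 (all ones)
print(fields(just_above))  # exponent 128, mantissa 0 (wrapped around)
```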
In our experiments, we found a set of thresholds that is robust across different GPUs, attention kernels, and tensor parallel implementations while still being able to distinguish different models, prompts, and compute precisions. More details on these experiments can be found in our paper.
We provide a straightforward implementation of TOPLOC integrated with vLLM for efficient inference and validation: https://github.com/PrimeIntellect-ai/toploc.
Additionally, we maintain a fork of SGLang that includes a TOPLOC integration, enabling verifiable inference within SGLang's framework.
Although TOPLOC is a significant leap in verifiable inference compared to existing methods, there remain unsolved problems, challenges, and future directions that could further improve the coverage and widespread use of TOPLOC for verifiable model computation.
Verifying model inference is a critical component in building trustworthy and decentralized AI protocols. By enabling inference verification, users can trust inference providers and audit model changes that may impact downstream systems. As AI-powered agents become a larger part of the software stack, the demand for verifiable on-chain inference will continue to grow.
To date, most efforts in this area have focused on generating Zero-Knowledge (ZK) proofs for model inference. However, we believe this approach is unlikely to achieve practical adoption in the near future. ZK proof generation and validation methods are currently too computationally expensive and slow. Even under optimistic assumptions about advancements in ZK methods, the cost of generating ZK proofs for inference would still be 100× higher than the original inference computation.
A more practical approach is to use recomputation to secure the system. Recomputation also integrates seamlessly with cutting-edge model inference engines, which would be challenging for ZK solutions.
The use of recomputing intermediate activations to verify the integrity of trustless compute has been explored in works like arXiv:2405.00295 and arXiv:2406.00586. However, these methods require the computations to be deterministic, making them unsuitable for handling hardware non-determinism. Additionally, they cannot take advantage of algebraic shortcuts in validation, such as the ones we propose.
Advantages of TOPLOC
As LLMs continue to become more integral to the modern computing stack, ensuring the integrity and trustworthiness of inference providers becomes increasingly critical. TOPLOC represents a significant advancement in addressing this challenge by providing a practical, efficient, and robust solution for verifiable inference.
TOPLOC achieves this through several key innovations:
Unlike existing approaches such as ZK proofs, which incur prohibitive computational overhead, TOPLOC introduces negligible overhead for the inference provider. This efficiency, combined with its seamless integration with modern inference engines, positions TOPLOC as a viable solution for widespread adoption in trustless and verifiable compute protocols.
TOPLOC enables the development of truly decentralized AI protocols where users can trust inference providers without requiring centralized authorities. This foundation of trust is crucial for the emerging ecosystem of AI agents and on-chain AI applications, where verifiable computation is not just desirable but essential.
For the launch of INTELLECT-MATH last week, our first reasoning and synthetic data release, we provided an outlook on decentralized training within the inference-compute paradigm: the inherently parallel nature of synthetic data generation and reinforcement learning makes them particularly well-suited for a globally distributed setting. Workers can operate independently, each focusing on their own generation tasks under the supervision of a verifier, without requiring frequent direct communication between them, offering significant scalability advantages.
However, this decentralized setting presents a critical challenge: How can we ensure that workers are performing inference correctly? What if they secretly substitute the model with a smaller one to reduce costs? Or worse, what if they insert hidden tokens into the prompt to bias the generations?
With the release of TOPLOC, we are one step closer to addressing these trust challenges and are excited to leverage it in our early protocol experiments in the coming weeks.