Encoding
Overall Requirements
Within 0G DA, blobs are encoded so that they can be scalably distributed among the Storage nodes. The 0G DA encoding module is designed to meet the following security requirements:
1. Adversarial tolerance for DA nodes: We need tolerance to arbitrary adversarial behavior by DA nodes up to some threshold, which is discussed in other sections. Note that simple sharding approaches, such as duplicating slices of the blob data, tolerate random node dropout well but tolerate worst-case adversarial behavior poorly.
2. Adversarial tolerance for the disperser: We do not want to place trust assumptions on the encoder or rely on fraud proofs to detect whether an encoding was done incorrectly.
Interfaces
From a system standpoint, the encoder module must satisfy the following interface, which codifies requirements 1 and 2 above.
Notice that these interfaces only support a global chunk size across all the encoded chunks for a given encoded blob. This constraint derives mostly from the design of the KZG FFT Encoder Backend which generates multireveal proofs in an amortized fashion.
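As a hypothetical sketch of such an interface (the names `EncodingParams`, `Chunk`, and `Encoder` below are illustrative, not the module's actual types), note how a single `chunk_length` applies to every chunk of a blob:

```python
# Hypothetical rendering of the encoder interface; all names are illustrative.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List

@dataclass
class EncodingParams:
    chunk_length: int   # one global chunk size for every chunk of the blob
    num_chunks: int     # total number of chunks to produce

@dataclass
class Chunk:
    index: int          # which block of evaluation indices this chunk covers
    evals: List[int]    # chunk_length field-element evaluations

class Encoder(ABC):
    @abstractmethod
    def encode(self, data: bytes, params: EncodingParams) -> List[Chunk]:
        """Split a blob into num_chunks chunks of chunk_length symbols each."""

    @abstractmethod
    def decode(self, chunks: List[Chunk], params: EncodingParams) -> bytes:
        """Reconstruct the blob from any subset of chunks covering at least
        BlobLength distinct evaluation indices."""
```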
Trustless Encoding via KZG and Reed-Solomon
0G DA uses a combination of Reed-Solomon (RS) erasure coding and KZG polynomial commitments to perform trustless encoding. In this section, we provide a high level overview of how the 0G DA encoding module works and how it achieves these properties.
Basic Reed Solomon Encoding
Basic RS encoding is used to achieve the first requirement of tolerance to adversarial node behavior. This looks like the following:
1. The blob data is represented as a string of symbols, where each symbol is an element of a certain finite field. The number of symbols is called the `BlobLength`.
2. These symbols are interpreted as the coefficients of a polynomial of degree `BlobLength - 1`.
3. This polynomial is evaluated at `NumChunks * ChunkLength` distinct indices.
4. Chunks are constructed, where each chunk consists of the polynomial evaluations at `ChunkLength` distinct indices.

Notice that given any `M` chunks such that `M * ChunkLength >= BlobLength`, it is possible via polynomial interpolation to reconstruct the original polynomial, and therefore its coefficients, which represent the original blob. Thus, this basic RS encoding scheme satisfies the requirement of the `Encoder.Encode` interface.
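The steps above can be sketched concretely. The toy example below works over a small prime field, where a degree-3 polynomial is fully determined by any 4 of its evaluations; 0G DA actually operates over a large pairing-friendly field with FFT-based evaluation, so the field size and parameters here are purely illustrative:

```python
# Toy Reed-Solomon encoding over GF(337): interpret the blob as polynomial
# coefficients, evaluate at NumChunks * ChunkLength points, and reconstruct
# from any chunks covering at least BlobLength evaluations.
P = 337  # small prime modulus, for illustration only

def evaluate(coeffs, x):
    """Evaluate the polynomial with low-to-high coefficients at x (Horner)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def interpolate(points, x):
    """Lagrange interpolation through `points`, evaluated at x."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

blob = [5, 11, 29, 83]                    # BlobLength = 4 symbols
num_chunks, chunk_length = 4, 2           # 8 evaluations in total
xs = range(1, num_chunks * chunk_length + 1)
evals = [(x, evaluate(blob, x)) for x in xs]
chunks = [evals[i * chunk_length:(i + 1) * chunk_length]
          for i in range(num_chunks)]

# Any M chunks with M * ChunkLength >= BlobLength suffice; keep chunks 1 and 3.
survivors = chunks[1] + chunks[3]         # 4 evaluations = BlobLength
assert interpolate(survivors, 0) == blob[0]   # p(0) recovers the first symbol
for x, y in chunks[0] + chunks[2]:            # the lost chunks are recoverable
    assert interpolate(survivors, x) == y
```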
For example, suppose we have an encoding matrix `B` of size `n × k`, where `n = k + m`, with `k` being the original data size and `m` being the check code size, and where any `k × k` submatrix formed from `k` rows of `B` is invertible. Given the data vector `D` of length `k`, the codeword is `C = B · D`. Now suppose at most `m` symbols went wrong, so at least `k` entries of `C` are intact. Let `C'` be `k` intact entries and `B'` the corresponding rows of `B`. We know `B'` is invertible, so `D = B'⁻¹ · C'`, and we reconstruct the original `D`.
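The same reconstruction can be shown in matrix form. This sketch uses a small Vandermonde matrix over a toy prime field (any `k` rows of a Vandermonde matrix with distinct evaluation points are invertible); all parameters are illustrative:

```python
# Matrix view of RS reconstruction over GF(337): encode with an n x k
# Vandermonde matrix B, erase up to m rows of the codeword, and invert the
# surviving k x k submatrix to recover the data vector D.
P = 337
k, m = 3, 2
n = k + m
D = [7, 42, 100]                                   # original data, length k
B = [[pow(x, j, P) for j in range(k)] for x in range(1, n + 1)]
C = [sum(B[i][j] * D[j] for j in range(k)) % P for i in range(n)]

def inverse_mod(M):
    """Invert a square matrix over GF(P) by Gauss-Jordan elimination."""
    size = len(M)
    A = [row[:] + [int(i == j) for j in range(size)]
         for i, row in enumerate(M)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        inv = pow(A[col][col], -1, P)
        A[col] = [v * inv % P for v in A[col]]
        for r in range(size):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(a - f * b) % P for a, b in zip(A[r], A[col])]
    return [row[size:] for row in A]

# Suppose rows 1 and 3 of the codeword are lost; rows 0, 2, 4 survive.
rows = [0, 2, 4]
B_sub = [B[i] for i in rows]                       # B', invertible
C_sub = [C[i] for i in rows]                       # C'
B_inv = inverse_mod(B_sub)
D_rec = [sum(B_inv[i][j] * C_sub[j] for j in range(k)) % P
         for i in range(k)]                        # D = B'^-1 * C'
```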
Validation via KZG
Without modification, RS encoding has the following important problem: Suppose that a user asks an untrusted disperser to encode data and send it to the nodes. Even if the user is satisfied that each node has received some chunk of data, there is no way to know how the disperser went about constructing those chunks. KZG polynomial commitments provide a solution to this problem.
Encoded Chunk Verification
KZG commitments provide three important primitives for a polynomial `p(X)`:

- `commit(p(X))` returns a `Commitment` which is used to identify the polynomial.
- `prove(p(X), indices)` returns a `Proof` which can be used to verify that a set of evaluations lies on the polynomial.
- `verify(Commitment, Proof, evals, indices)` returns a `bool` indicating whether the committed polynomial evaluates to `evals` at the provided `indices`.
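A toy model of these primitives for a single opening point can make the algebra concrete. This sketch is deliberately insecure: real KZG hides the secret `τ` behind elliptic-curve points in the SRS and checks the same quotient identity with a pairing, whereas here `τ` is public purely to show the math:

```python
# Toy, INSECURE model of KZG commit/prove/verify over GF(10007).
P = 10007
TAU = 1234  # the "secret" behind the SRS; public here only for illustration

def evaluate(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def commit(coeffs):
    """Commitment = p(tau).  (Real KZG: sum_i c_i * tau^i * G on a curve.)"""
    return evaluate(coeffs, TAU)

def prove(coeffs, z):
    """Proof for p(z): a commitment to q(X) = (p(X) - p(z)) / (X - z)."""
    acc, q = 0, []
    for c in reversed(coeffs):        # synthetic division by (X - z)
        q.append(acc)
        acc = (acc * z + c) % P
    quotient = list(reversed(q[1:]))  # low-to-high coefficients of q(X)
    return commit(quotient)

def verify(commitment, proof, z, y):
    """Check p(tau) - y == q(tau) * (tau - z); real KZG uses a pairing."""
    return (commitment - y) % P == proof * (TAU - z) % P

p_coeffs = [3, 1, 4, 1, 5]
C = commit(p_coeffs)
z = 7
y = evaluate(p_coeffs, z)
pi = prove(p_coeffs, z)
assert verify(C, pi, z, y)                  # correct opening accepted
assert not verify(C, pi, z, (y + 1) % P)    # wrong evaluation rejected
```

The identity `p(X) - p(z) = q(X) · (X - z)` holds exactly as polynomials, so checking it at the single point `τ` suffices; multi-point ("multireveal") openings generalize this with a higher-degree vanishing polynomial.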
Blob Size Verification
KZG commitments also can be used to verify the degree of the original polynomial, which in turn corresponds to the size of the encoded blob.
The KZG commitment scheme relies on a structured reference string (SRS) containing a generator point `G` multiplied by all of the powers of some secret field element `τ`, up to some maximum power `n`. This means that it is not possible to use this SRS to commit to a polynomial of degree greater than `n`. A consequence of this is that if `p(X)` is a polynomial of degree greater than `m`, it will not be possible to commit to the polynomial `X^(n-m) · p(X)`.

The scheme thus works as follows: If the disperser wishes to claim that the polynomial `p(X)` is of degree less than or equal to `m`, they must provide, along with the commitment to `p(X)`, a commitment to `q(X) = X^(n-m) · p(X)`. A verifier can request that the disperser open both polynomials at a random point `r`, verify the values `p(r)` and `q(r)`, and then check that `q(r) = r^(n-m) · p(r)`. If these checks pass, the verifier knows that 1) `q(X) = X^(n-m) · p(X)` with high probability, 2) the disperser was able to make a commitment to `q(X)`, so `deg(q) ≤ n`, and therefore 3) `deg(p) ≤ m`. In practice, this protocol can be made non-interactive using the Fiat-Shamir heuristic.
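The spot check `q(r) = r^(n-m) · p(r)` is plain field arithmetic and can be sketched directly (toy field and parameters; the real protocol commits to both polynomials and derives `r` via Fiat-Shamir):

```python
# Toy degree-bound check over GF(10007): to claim deg(p) <= m under an SRS
# of maximum power n, the disperser also commits to q(X) = X^(n-m) * p(X),
# and a verifier spot-checks q(r) == r^(n-m) * p(r) at a random point r.
import random

P = 10007
n, m = 10, 3                             # SRS max power n; claimed bound m
p_coeffs = [9, 2, 0, 5]                  # deg(p) = 3 <= m
q_coeffs = [0] * (n - m) + p_coeffs      # shift: q(X) = X^(n-m) * p(X)

def evaluate(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

r = random.randrange(1, P)               # Fiat-Shamir in the real protocol
assert evaluate(q_coeffs, r) == pow(r, n - m, P) * evaluate(p_coeffs, r) % P
```

If `deg(p)` actually exceeded `m`, then `X^(n-m) · p(X)` would exceed degree `n` and could not be committed to under the SRS at all, which is what makes the spot check binding.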
Note: The blob length verification here allows for the blob length to be upper-bounded; it cannot be used to prove the exact blob length.
Validation Actions
When a DA node receives a `DisperseBlob` request, it performs the following validation actions relative to each blob header:

- The batcher monitors updates to the metadata store (e.g., AWS DynamoDB), which the disperser updates for each new blob.
- The batcher creates a batch by combining multiple blobs, and creates a batch header from the encoded data (encoding is performed by an independent encoding service).
- The batcher then creates a Merkle proof for each blob in the batch.
- With all this information, the batcher sends a request to the on-chain contract for data indexing and verification.
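The per-blob Merkle-proof step can be sketched as follows; the hash function, tree layout, and odd-level padding rule here are assumptions for illustration, not the on-chain contract's actual encoding:

```python
# Sketch: the batcher hashes each blob (header) into a leaf, builds a Merkle
# tree over the batch, and produces one inclusion proof per blob.
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def build_proofs(leaves):
    """Return (root, per-leaf sibling path) for a list of leaf byte strings."""
    level = [h(x) for x in leaves]
    positions = list(range(len(level)))
    proofs = [[] for _ in leaves]
    while len(level) > 1:
        if len(level) % 2:                     # pad odd levels by duplication
            level.append(level[-1])
        for i, pos in enumerate(positions):
            proofs[i].append(level[pos ^ 1])   # record each leaf's sibling
        positions = [pos // 2 for pos in positions]
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
    return level[0], proofs

def verify_proof(leaf, proof, pos, root):
    """Recompute the path from a blob's leaf up to the claimed batch root."""
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if pos % 2 == 0 else h(sibling + node)
        pos //= 2
    return node == root

blobs = [b"blob-a", b"blob-b", b"blob-c"]
root, proofs = build_proofs(blobs)
assert all(verify_proof(b, proofs[i], i, root) for i, b in enumerate(blobs))
```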