Score:3

Reference implementation of Shamir's Secret Sharing

mk flag

Is there an implementation of Shamir's Secret Sharing that can be regarded as a "canonical" (or "reference" or "standard") implementation, so that I can test other implementations to be "standard compliant"?

The above question is pretty vague. I have more details in mind, but some of them might be misleading or based on false assumptions. So possibly not all of them can be fulfilled or are relevant.

  • I'm looking for a pure implementation of Shamir's Secret Sharing - pure means without additional (security) features.
  • The input should be an array of bytes as secret, a threshold t and a number of shares n.
  • The output should be n shares, where each share consists of the share number and an array of bytes of the same size as the secret.
  • The algorithm should use GF(256). This is based on the following assumptions:
    • When a field other than GF(256) is used for sharing, there is no guarantee that the secret can be reconstructed using GF(256).
    • Requiring the use of GF(256) is enough to ensure that each correct split implementation will be compatible with any other correct join implementation. If this assumption is not complete - what is missing for a full specification of the method?

The motivation for this question is: I noticed that when I share a secret with implementation A, it is not sure I can reconstruct the secret with implementation B.

For example, "hello" shared with the implementation https://github.com/codahale/shamir has given me the shares

1-081dea6049
2-c869462a01
3-a811c02627
4-a8a0cc833b
5-c8d84a8f1d

// Implemented like:
Scheme scheme = new Scheme(new SecureRandom(), 5, 3);
Map<Integer, byte[]> split = scheme.split("hello".getBytes("UTF-8"));

Reconstructing the secret from shares 5, 2, 3 using https://github.com/codahale/shamir works fine, like this:

Scheme scheme = new Scheme(new SecureRandom(), 5, 3);
Map<Integer, byte[]> example = Map.of(
        5, java.util.HexFormat.of().parseHex("c8d84a8f1d"),
        2, java.util.HexFormat.of().parseHex("c869462a01"),
        3, java.util.HexFormat.of().parseHex("a811c02627")
);
byte[] exampleJoined = scheme.join(example);

But reconstructing the secret from the same shares using the debian package "ssss" (http://point-at-infinity.org/ssss/, version v0.5, January 2006) gives me the byte array 056bcedfa2 (where I would have expected the bytes of "hello", i.e. 68656c6c6f):

> ssss-combine -t 3 -x -D
Enter 3 shares separated by newlines:
Share [1/3]: 5-c8d84a8f1d
Share [2/3]: 2-c869462a01
Share [3/3]: 3-a811c02627
Resulting secret: 056bcedfa2
Score:6
us flag

Alas, there cannot be a standard or reference implementation of Shamir's secret-sharing scheme even if we restrict the field to be GF$(256)$. Different implementations of GF$(256)$ will use different irreducible binary polynomials (of degree 8) to define the field, and even if one chooses one polynomial and say the field implemented using that one is the standard one and all other implementations are Brand X and not to be used, how is one to enforce this view on the rest of the world? There is also a big-endian versus little-endian view as explained further in the rest of this paragraph. In most implementations of GF$(256)$, field elements are represented as binary polynomials of degree $7$ or less and stored as single bytes. But, does 83H = 10000011 mean the polynomial $x^7 + x +1$ or does it mean the polynomial $1 + x^6 + x^7$? The choice affects the multiplication tables/subroutine/hardware design.

Recall that a $(n,t)$ Shamir's secret-sharing scheme creates $n$ shares by evaluating a polynomial $f(x)$ of degree $t-1$ at $n$ distinct nonzero elements $\alpha_1, \alpha_2, \cdots, \alpha_n$ of the field. Those, there are two parts to the $i$-th share of the secret in Shamir's secret-sharing scheme: $f(\alpha_i)$, the value of the share and $\alpha_i$, the location of the share (i.e. the field element at which $f(x)$ was evaluated. Since different designers might well choose different sets of the $\alpha_i$'s, there is no guarantee that shares created by A's implementations can be used to recreate the secret using B's implementation.

This is especially so if what each shareholder has available is only $f(\alpha_i)$ and the "dealer" is the only one who knows the $\alpha_i$'s (and of course the identities of the shareholders submitting their shares so that the dealer can associate $\alpha_i$ with submitted share $f(\alpha_i)$; an added layer of security to prevent a cabal of $t$ shareholders from recreating the secret and going to town with it, while the rest of the shareholders are blissfully unaware that the horse had left the barn.

Some amelioration of this problem can be achieved if the "share" held by each user is of the form $(f(\alpha_i), \alpha_i)$ and the secret-reconstruction algorithm (see e.g. this answer of mine) uses the share locations and the share values submitted, but then $t$ disgruntled shareholders can reconstruct the secret without the aid of any dealer, etc as long as the reconstruction is using the same implementation of GF$(256)$ that was used to create the shares. A secret reconstruction using a different implementation of GF$(256)$ will again give the same invalid results that the OP was complaining about.

us flag
@MaartenBodewes Thanks for editing the wall of text that I wrote to make my answer more readable.
Georg avatar
mk flag
I don't see that byte order comes into play here, since we are looking at each byte of a byte array individually. The paragraph "Some amelioration..." points in the direction of the reference implementation I'm looking for: Given that an algorithmically correct implementation shares (f(αi),αi), uses a standardized GF(256) implementation (see also @Mark 's answer) and accepts that t shareholders can reconstruct the secret without the aid of any dealer, it could be used as the reference implementation I asked for, right?
us flag
@Georg I have clarified what I mean by the big-end versus little-end issue in my answer.
Score:3
ge flag

Precisely because there is no single way of doing it there is an attempt to standarize Shamir's secret sharing with SLIP-0039. You can find a reference Python implementation at Github here. It is oriented towards recovering mnemonic phrases but I expect it will respond to your needs.

Score:3
ng flag

While Dilip's answer states this, it is worth explicitly stating that $GF(256)$ is quite far from being uniquely defined. So your statement

Requiring the use of GF(256) is enough to ensure that each correct split implementation will be compatible with each other correct join implementation. If this assumption is not complete - what is missing for a full specification of the method?

Is false. What is needed is to pick

  1. a specific field (of the many possible choices, namely most monic irreducible polynomials should work, though there might be some nuance with ensuring the Galois group of the field is correct - I can't remember the precise definition right now)

  2. a specific basis for that field as a vector space over $\mathbb{F}_2$.

More generally, you need everyone to agree on some serialization/deserialization technique for whatever field you agree on. Specifying a vector-space basis is one way of doing this.

I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.