Linearly independent subsets

Created at 2023-08-13

The binary words 001001, 010010, 100100 and 111111 have an interesting property: Whenever we choose three of them, the are linearly independent in the F2\mathbb{F}_2-vector space F23\mathbb{F}_2^3 of binary words of length nn.

I'd like to turn this observation into a more general question: How large can a set SS of binary words in F2n\mathbb{F}_2^n be such that any subset with at most kk elements is linearly independent? (We will call such a set linearly kk-independent.)

This problem was proposed a few years ago by a fellow student and I made progress when I applied the techniques from the book Making Up Your Own Mind.

Linearly 2-independent sets

This is easy. Let SF2n{0}S \coloneqq \mathbb{F}_2^n \setminus\{0\}. This is a 22-independent set, and it is clearly the largest one!

Linearly 3-independent sets

For n=3n = 3, a simple argument suffices to see that a linearly 33-independent set SS cannot have more than four elements. Assume that SS contains the three pairwise distinct elements u,v,wu, v, w. If xSx \in S is another distinct element, then xx can uniquely be written as linear combination λ1u+λ2v+λ3w\lambda_1 u + \lambda_2 v + \lambda_3 w with coefficients λiF2\lambda_i \in \mathbb{F}_2. Because u,v,xu, v, x are linearly independent, the coefficient λ3\lambda_3 must not be zero. A similar observation holds for λ1\lambda_1 and λ2\lambda_2. Therefore, λ1=λ2=λ3=1\lambda_1 = \lambda_2 = \lambda_3 = 1 and x=u+v+wx = u + v + w. This shows that SS cannot contain more elements than u,v,w,xu, v, w, x.

How about larger nn? The words in F2n\mathbb{F}_2^n that begin with a 11 form a linearly 33-independent set. Indeed, the sum of three such words again begins with a 11 -- and is therefore nonzero. The linearly 33-independent set we just found contains 2n12^{n-1} elements, exactly half of all words of length nn. There is a deeper reason why this works: The words that begin with a 00 form a subgroup of F2n\mathbb{F}_2^n of index two. Whenever we have a subgroup GG of index two, its complement forms a linearly 33-independent set. Another subgroup I could have used consists of words with an even number of 11's.

So we found linearly 33-independent sets with 2n12^{n-1} elements. Can we find even larger sets? As it turns out, no: Let SS be any linearly 33-independent set and wSw \in S. Then w+Sw + S is disjoint to SS. Because both sets are contained in F2n\mathbb{F}_2^n, we obtain 2S2n2\left|S\right| \le 2^n. In other words, S2n1\left|S\right| \le 2^{n-1}.

A surprising connection

It is not clear how to generalize this argument to n>3n > 3, so let us reframe the problem a bit. If we write all \ell vectors in a linearly kk-independent SS as columns into a matrix AF2n×A \in \mathbb{F}_2^{n\times \ell}, then kerAF2\ker A \subseteq \mathbb{F}_2^\ell is a vector space and all of its nonzero elements have at least k+1k + 1 non-zero components. Setting mdimkerAm \coloneqq \dim\ker A and choosing an isomorphism F2mkerA\mathbb{F}_2^m \cong \ker A, we obtain a linear block code (see Lineare Blockcodierung) with block length \ell, message length mlnm \ge l - n, overhead (block length minus message length) at most nn and minimum Hamming distance at least k+1k+1.

On the other hand, if we are given a linear block code of block length \ell, overhead at most nn and minimum Hamming distance k+1k+1, we can construct a linearly kk-independent set SF2nS \subseteq \mathbb{F}_2^n: The space VF2V \subseteq \mathbb{F}_2^\ell of code words has dimension n\ell - n. If we choose any (rank-nn) matrix AF2n×A \subseteq \mathbb{F}_2^{n\times\ell} with kerA=V\ker A = V, then the \ell columns of AA form a set SS as desired.

There are some immediate implications. Hamming codes give rise to the linearly 22-independent set F2n{0}\mathbb{F}_2^n\setminus\{0\} from above. SECDED codes, i.e. Hamming codes with an additional parity bit, are linear codes with block length 2n12^{n-1}, overhead nn and minimum Hamming distance 44. This confirms our earlier result, this time however as a corollary of an established fact in computer science. Conversely, Richard Hamming might have discovered Hamming codes (which efficiently realize a minimum Hamming distance of 33) quite naturally by constructing them from the maximal 22-linearly independent set SS from further above (and derive without further calculation that its block size is F2n{0}=2n1\left|\mathbb{F}_2^n\setminus\{0\}\right| = 2^n - 1 and that it has nn bits of overhead).

Linearly 4-independent sets

How about linearly 4-independent sets? There are so-called BCH codes with block length at least 2n/22^{n/2}, overhead nn and minimal Hamming distance 55. Therefore, there are linearly 44-independent sets with at least 2n/22^{n/2} elements in F2n\mathbb{F}_2^{n}. (I quickly derived these numbers from the introduction of this paper by Chen, but I did not do a deep dive into this topic). These codes require a decent amount algebra to derive them, and I find it fascinating that the theory behind these codes can help to find linearly 44-independent sets.

Can we do better? Probably a bit -- the paper I cite above itself improves on BCH codes, so there seems to be a bit of room for improvement. However, the order of magnitude is as good as it gets. We use an argument similar to the one we used to prove optimality of our linearly 33-independent sets.

Let SS be any linearly 44-independent set with \ell elements. Then there are i=12(i)=1++(1)2\sum_{i=1}^2\begin{pmatrix}\ell \\ i\end{pmatrix} = 1 + \ell + \frac{\ell(\ell-1)}{2} ways to sum up at most two elements of SS. Because SS is linearly 44-independent, these sums are pairwise distinct elements of F2n\mathbb{F}_2^n, and it follows that 122+12+12n\frac{1}{2}\ell^2 + \frac{1}{2}\ell + 1 \le 2^n. In particular, 2(n+1)/2\ell \le 2^{(n+1)/2}, and this bound is quite close to the size of the set we constructed with BCH codes.

(The binomial coefficient formula above is conspicuously similar to the so-called Hamming bound, which produces a similar bound for block codes with minimum Hamming distance 55.)