
Walsh-Hadamard transform in randomness testing


I am working on using the Hadamard transform to map randomly generated values and then apply statistical tests as defined by NIST and other institutions. I found one online resource particularly helpful, yet I do not have the mathematical intuition to understand some parts of it. The Python code and the text are found on QuantAtRisk.

A 2D matrix $x_{\text{seq}}$ holding our signal under investigation is the starting point for its tests of randomness. The test is based on the computation of the WHT for each row of $x_{\text{seq}}$ and the t-statistics $t_{ij}$ as a test function based on the Walsh-Hadamard transformation of all sub-sequences of $x(t)$. It is assumed that for any signal $y(t_i)$, where $i = 0, 1, \ldots$, the WHT returns a sequence $\{w_i\}$ and: (a) for $w_0$ the mean value is $m_0 = 2^m(1-2p)$, the variance is $\sigma_0^2 = 2^{m+2}p(1-p)$, and the distribution of $(w_0 - m_0)/\sigma_0 \sim N(0,1)$ for $m > 7$; (b) for $w_i$ ($i \geq 1$) the mean value is $m_i = 0$, the variance is $\sigma_i^2 = 2^{m+2}p(1-p)$, and the distribution of $(w_i - m_i)/\sigma_i \sim N(0,1)$ for $m > 7$. Recalling that $p$ stands for the probability of occurrence of the digit 1 in $x_{\text{seq},j}$, for $p = 0.5$ (our desired test probability) the mean value of $w_i$ is equal to 0 for every $i$.

In the $x_{\text{seq}}$ array, for every $j = 0, 1, \ldots, (a-1)$ and for every $i = 0, 1, \ldots, (b-1)$, we compute the t-statistic as follows: $$ t_{ij}=\frac{w_{ij}-m_i}{\sigma_i} $$
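To make the statistics above concrete, here is a quick sketch of my own (not from the QuantAtRisk article): for $p = 0.5$ we have $m_0 = 0$ and $\sigma_i = \sqrt{2^{m+2} \cdot 0.25} = 2^{m/2}$, so the normalized WHT coefficients of a random $\pm 1$ row should look approximately standard normal for $m > 7$.

```python
import numpy as np

def hadamard(M):
    # Build the 2**M x 2**M Hadamard matrix via repeated Kronecker products
    H = np.array([[1]])
    h2 = np.array([[1, 1], [1, -1]])
    for _ in range(M):
        H = np.kron(H, h2)
    return H

m, p = 10, 0.5
rng = np.random.default_rng(0)
# one row of x_seq: 2**m i.i.d. +/-1 values (digit 1 occurs with probability p)
x = rng.choice([-1, 1], size=2**m)

w = hadamard(m) @ x                                # unnormalized WHT coefficients
m_i = np.zeros(2**m)
m_i[0] = 2**m * (1 - 2 * p)                        # mean of w_0; equals 0 for p = 0.5
sigma = np.sqrt(2**(m + 2) * p * (1 - p))          # equals 2**(m/2) for p = 0.5
t = (w - m_i) / sigma                              # approximately N(0,1) for m > 7

print(t.mean(), t.std())
```

With $2^{10}$ coefficients the sample mean and standard deviation of $t$ come out close to 0 and 1, as the normal approximation predicts.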

Can someone explain to me how this distribution was derived? Where do these parameters even come from? The test-statistic method makes sense; it is the formula for the parameters of the density function that does not make sense to me.


Secondly, in the code defining the WHT, the dot product is computed and then the result is divided by $2^M$. Why exactly is this the case? Is this a property of the WHT itself?

import math
import numpy as np

def WHT(x):
    # Function computes the (slow) discrete Walsh-Hadamard transform
    # for any 1D real-valued signal
    # (c) 2015 QuantAtRisk.com, by Pawel Lachowicz
    x = np.array(x)
    if len(x.shape) < 2:      # make sure x is a 1D array
        if len(x) > 3:        # accept x of min length 4 elements (M = 2)
            # truncate the signal length to the nearest power of two, 2**M
            n = len(x)
            M = math.trunc(math.log(n, 2))
            x = x[0:2 ** M]
            # build the 2**M x 2**M Hadamard matrix by repeated Kronecker products
            h2 = np.array([[1, 1], [1, -1]])
            H = h2
            for _ in range(M - 1):
                H = np.kron(H, h2)
            # the dot product is normalized by 2**M
            return np.dot(H, x) / 2. ** M, x, M
    return None               # input was not a 1D signal of at least 4 elements
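Regarding the division by $2^M$: a short check (my own sketch, not part of the original article) shows why it makes the transform self-inverse. The Hadamard matrix satisfies $HH = 2^M I$, so dividing the forward transform by $2^M$ means that applying $H$ once more recovers the original signal.

```python
import numpy as np

M = 3
h2 = np.array([[1, 1], [1, -1]])
H = np.array([[1]])
for _ in range(M):
    H = np.kron(H, h2)

# The Hadamard matrix is symmetric and orthogonal up to scale: H @ H = 2**M * I
assert np.array_equal(H @ H, 2**M * np.eye(2**M, dtype=int))

x = np.array([1, -1, -1, 1, 1, 1, -1, 1])
w = H @ x / 2. ** M            # forward transform, normalized as in WHT() above
assert np.allclose(H @ w, x)   # applying H to the normalized spectrum recovers x
```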
kodlu comments: Please focus your question! That's a long document you linked to. Are you asking about using the WHT for $\{\pm 1\}$-valued sequences or for arbitrary financial time series?
Answer (kodlu):

You need to read about the WHT. It is just a Fourier transform for $\pm 1$-valued sequences. A Fourier transform can be normalized in different ways: some authors divide by $2^n$ in the forward direction (since the sum is over a function taking $2^n$ inputs), while others split the normalization into $2^{n/2}$ in both the forward and inverse directions, which makes the map an isometry.
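The two normalization conventions can be checked numerically. This is a sketch of mine, not from the answer: with the symmetric $2^{n/2}$ normalization, the transform matrix is orthogonal, so it preserves the Euclidean norm of any signal.

```python
import numpy as np

n = 4
h2 = np.array([[1., 1.], [1., -1.]])
H = np.array([[1.]])
for _ in range(n):
    H = np.kron(H, h2)

U = H / 2**(n / 2)         # symmetric normalization: 2**(n/2) in each direction
# U is orthogonal, hence the transform is an isometry (norm-preserving)
assert np.allclose(U @ U.T, np.eye(2**n))

x = np.random.default_rng(1).standard_normal(2**n)
assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))
```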

References:

  1. Claude Carlet's chapter on Boolean functions is detailed; you can start there and read Section 2.
  2. There is also a nice book by Cusick and Stănică, *Cryptographic Boolean Functions and Applications*.

Consider a binary Boolean function $f:\{0,1\}^n \rightarrow \{0,1\}$. Sometimes it is convenient to work with $(-1)^{f(x_1,\ldots,x_n)}$ instead. There are $2^n$ points in its domain, hence the normalization factor.

The Walsh transform is usually written as $$\hat{f}(a)=\sum_{x\in\mathbb{F}_2^n}(-1)^{f(x)+\langle a,x\rangle}.$$

The function $L_a(x)=\langle a,x\rangle=a_1 x_1+\cdots+a_n x_n$ is a linear multivariate function of $(x_1,\ldots,x_n)$.

The function $$f(x)+\langle a,x\rangle=f(x_1,\ldots,x_n)+a_1 x_1+\cdots+a_n x_n$$ equals $0$ mod 2 if $f(x)=\langle a,x\rangle$ and $1$ mod 2 otherwise.

The sum $$ \hat{f}(a)=\sum_{(x_1,\ldots,x_n) \in \mathbb{F}_2^n} (-1)^{f(x)+\langle a,x\rangle}, $$ is equal to $2^n-2 d_H(f,L_a)$ and computes the correlation between the function $f$ and the linear function $L_a$. Here $d_H(f,L_a)$ is the Hamming distance between those two functions as $x$ varies over $\mathbb{F}_2^n.$ Each one of these $2^n$ correlations (for different $a$) can be obtained by a matrix multiplication.
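The identity $\hat{f}(a) = 2^n - 2 d_H(f, L_a)$ can be verified directly by brute force over a small truth table. A minimal sketch (my own; the function $f$ below is an arbitrary example, not one from the answer):

```python
from itertools import product

n = 3
points = list(product([0, 1], repeat=n))   # all of F_2^n in the standard order

# an arbitrary non-linear Boolean function given by its truth table
f = {x: (x[0] & x[1]) ^ x[2] for x in points}

def walsh(a):
    # \hat{f}(a) = sum over x of (-1)^(f(x) + <a,x>)
    return sum((-1) ** (f[x] ^ (sum(ai * xi for ai, xi in zip(a, x)) % 2))
               for x in points)

for a in points:
    L_a = {x: sum(ai * xi for ai, xi in zip(a, x)) % 2 for x in points}
    d_H = sum(f[x] != L_a[x] for x in points)   # Hamming distance from f to L_a
    assert walsh(a) == 2**n - 2 * d_H
```

At $a = 0$ the linear function is identically zero, so $\hat{f}(0) = 2^n - 2\,\mathrm{wt}(f)$; this $f$ has weight 4 out of 8, hence $\hat{f}(0) = 0$.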

Since the mapping is invertible, the $2^n\times 2^n$ Hadamard matrix appears as $$\left[(-1)^{\langle a,x \rangle}\right],$$ where $a$ indexes the rows and $x$ indexes the columns in the standard order.

Due to the Kronecker structure of the Hadamard matrix, a fast transform and inverse transform exist. The functions whose maximum Hadamard coefficient $|\hat{f}(a)|$ is minimized are called bent functions and are important in coding theory and cryptography. By Parseval's relation, $$\sum_{a \in \mathbb{F}_2^n} |\hat{f}(a)|^2 =2^{2n},$$ and bent functions have $|\hat{f}(a)|=2^{n/2}$ for all $a$. Note that $\hat{f}(0)=0$ means the function is balanced; bent functions are not balanced.
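As an illustration of bentness and Parseval's relation (a sketch of mine, not from the answer), the standard example $f(x_1,\ldots,x_4) = x_1 x_2 + x_3 x_4 \pmod 2$ is bent: every Walsh coefficient has absolute value $2^{n/2} = 4$, and the squared coefficients sum to $2^{2n} = 256$.

```python
from itertools import product

n = 4
points = list(product([0, 1], repeat=n))

# f(x1,...,x4) = x1*x2 + x3*x4 (mod 2), a classical bent function
f = {x: (x[0] & x[1]) ^ (x[2] & x[3]) for x in points}

def walsh(a):
    # \hat{f}(a) = sum over x of (-1)^(f(x) + <a,x>)
    return sum((-1) ** (f[x] ^ (sum(ai * xi for ai, xi in zip(a, x)) % 2))
               for x in points)

spectrum = [walsh(a) for a in points]
assert all(abs(w) == 2**(n // 2) for w in spectrum)   # flat spectrum: |f^(a)| = 4
assert sum(w * w for w in spectrum) == 2**(2 * n)     # Parseval: sum = 256
```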

Under the uniform distribution on the domain $\{0,1\}^n$, the Walsh coefficients can be shown to be binomially distributed, and with proper normalization (as in the other question you asked) they behave like standard normal random variables.

Kevin Perez comments:
I have been reading your answer for the past 10 minutes, and your knowledge of this subject is clearly advanced. While you are providing a detailed answer, I think it would be better if you linked some literature that I can read first to understand some of the notation and theory you referenced; then I can come back to this answer, just like last time, with a better understanding. Is there any particular introductory reference that covers everything you cited, meaning one that introduces the topics and notation rather than using them immediately?
Kevin Perez comments:
Thank you for your modifications; I have given you a checkmark as the accepted answer, particularly since that records it better for posterity. I will come back later and tell you what I have gathered from your answer. Thank you.


