Score:2

What is the methodology for selecting symbol bit length and window size when performing Shannon Entropy Analysis?


When performing Shannon entropy analysis on something like an RNG or a file, you must (a code sketch of these steps follows the list):

  1. Select a symbol bit length and the number of samples you will analyse at a time (i.e. the window size)
  2. Read the input until the window is full
  3. Build a histogram of the collected symbols
  4. Take the histogram output and calculate the Shannon entropy
  5. Repeat from step 2, either by reading entirely new samples or by sliding the window (i.e. keeping a portion of the already-used samples)
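A minimal Python sketch of these steps, assuming 8-bit symbols and a fixed window size (the 1024-byte window, the step, and the file name "sample.bin" are illustrative choices, not values any particular tool uses):

```python
import math
from collections import Counter

def shannon_entropy(window):
    """Shannon entropy of one window of 8-bit symbols, in bits per symbol."""
    counts = Counter(window)                  # step 3: histogram the symbols
    total = len(window)
    # step 4: H = -sum(p * log2(p)) over the histogram frequencies
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def windowed_entropy(data, window_size=1024, step=1024):
    """Yield (offset, entropy) per window; step < window_size slides the window."""
    for start in range(0, len(data) - window_size + 1, step):    # step 5
        yield start, shannon_entropy(data[start:start + window_size])

with open("sample.bin", "rb") as f:           # steps 1-2: fill windows from input
    data = f.read()
for offset, h in windowed_entropy(data):
    print(f"offset {offset:#010x}: {h:.3f} bits/byte")
```

Setting step equal to window_size gives disjoint windows; a smaller step keeps part of the previous window, i.e. the sliding variant from step 5.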

Tools like binwalk do this automatically under the hood and do a pretty good job of showing unusual portions of files; however, it is not entirely clear how they:

  • Select the symbol bit length
  • Select the window size
  • Decide whether any window sliding is performed

Is there a methodology to selecting these values in the context of RNG and file analysis?

Score:1

Liam, what you're asking is still an open question. There is no standardised methodology for calculating the entropy of a file in the general case. Even NIST have said so with their non-IID SP 800-90B calculations. The following questions are rhetorical, to illustrate the problem:

  1. What is the symbol bit length? Who knows. Shakespeare's works have line, act and paragraph demarcations. Are they included within your window? And they use weird words that could be represented by Huffman codes. (A sketch below makes the symbol-length point concrete.)

  2. What do you histogram? Really, what exactly would you histogram?

  3. How are the previous findings weighted?

The problem is not the window. It's the manipulation and weighting of said window that's the problem.
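To make question 1 concrete (a sketch, not from the answer; "hamlet.txt" stands in for any text file): scoring the very same bytes with 8-bit symbols versus 4-bit symbols yields different per-bit entropy estimates, so the number you report already depends on a choice the data itself does not dictate.

```python
import math
from collections import Counter

def entropy_per_symbol(symbols):
    """Shannon entropy of a symbol sequence, in bits per symbol."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

with open("hamlet.txt", "rb") as f:   # hypothetical input text
    data = f.read()

h8 = entropy_per_symbol(data)         # treat each byte as one 8-bit symbol
nibbles = [b >> 4 for b in data] + [b & 0x0F for b in data]
h4 = entropy_per_symbol(nibbles)      # treat each nibble as one 4-bit symbol

# Same file, different per-bit answers:
print(f"8-bit symbols: {h8 / 8:.3f} entropy bits per data bit")
print(f"4-bit symbols: {h4 / 4:.3f} entropy bits per data bit")
```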

See https://en.wikipedia.org/wiki/Kolmogorov_complexity, http://www.reallyreallyrandom.com/photonic/technical/90b_latest/ and http://www.reallyreallyrandom.com/photonic/technical/algorithms/ and follow the links.

In short, there is no such thing as Shannon Entropy analysis in the general case :-(

Well, it is at least comforting that I am not missing something obvious.
Paul Uszak
@LiamKelly God no. You're pushing the boundaries of how we calculate the entropy of general things. If you follow the links, you'll realise that it's quite tricky. The Shannon do-da formula only works for independent and identically distributed sources.
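To illustrate the IID caveat in this comment (a sketch, not anything from the thread): a byte stream that just counts 0..255 over and over is perfectly predictable, yet its byte histogram is uniform, so a naive Shannon calculation scores it at a full 8 bits/byte, indistinguishable from genuine randomness.

```python
import math
import os
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

predictable = bytes(range(256)) * 64   # a repeating counter: zero real entropy
random_bytes = os.urandom(256 * 64)    # OS randomness of the same length

print(shannon_entropy(predictable))    # exactly 8.0 bits/byte
print(shannon_entropy(random_bytes))   # also ~8.0 bits/byte
```

The histogram cannot see the ordering, which is precisely the structure the IID assumption throws away.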