Thursday, December 29, 2022

[Book Summary - Information Theory] Information Theory: A Tutorial Introduction, Ch 5 by James V. Stone

1. Entropy: Average surprisal 

2. Discrete vs. continuous variables  

  • Continuous variables: divide the continuum into bins  
  • Entropy changes with the number of bins
    • If the number of bins increases (so each bin is narrower), entropy increases because there are more distinguishable outcomes (illustrated in the first sketch after this list).
  • Differential entropy (see the formula after this list)
  • Transforming continuous variables
    • Changing the range of a discrete set of variables doesn't change the entropy
      • Example: a binary die with faces 0 and 1 has only two possible outputs, even though infinitely many values lie between 0 and 1, so rescaling those outputs doesn't change the entropy
    • Changing the range of a continuous variable does, because the entropy is based on bin-width
      • Doubling the range -> doubles the number of fixed-width bins -> adds one bit, even if half of the bins aren't used (see the scaling sketch after this list)
  • What about adding a constant? 
    • As the term "constant" implies, it doesn't change the entropy: the distribution is shifted, but its range (and variance) stays the same.
  • Maximum entropy distributions
    • What distribution can we engineer so that entropy is highest? 
      • Fixed upper/lower bounds => uniform 
      • Fixed mean, with all values >= 0 => exponential 
      • Fixed variance (e.g., fixed power) => Gaussian (the three cases are summarized after this list)
  • Back to differential entropy
    • Measured infinitely accurately, a continuous variable would carry infinite information (infinitely many bins)
    • What in practice limits 'bin sizes'?
      • Noise!
      • As the amount of noise increases, it becomes harder to know what the actual signal value was
    • How does this noise-limited bin size connect to the transmission of information in language?
      • More noise => less precision, because noise widens the usable bins and so reduces their number (i.e., log(1/Δx) decreases as the noise-set bin width Δx grows); see the last sketch after this list
      • Zipf's law: Infrequent words => longer, lexicon is limited        
  • Bin size => how much information the signal can carry! 
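
A minimal sketch of the binning idea above, assuming NumPy and purely illustrative numbers (the signal, sample size, and bin counts are my own choices, not the book's):

```python
import numpy as np

def binned_entropy(samples, n_bins):
    """Entropy (in bits) of a continuous sample divided into n_bins equal-width bins."""
    counts, _ = np.histogram(samples, bins=n_bins)
    p = counts / counts.sum()
    p = p[p > 0]                       # empty bins contribute nothing (0 log 0 = 0)
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)           # an arbitrary continuous signal

for n_bins in (4, 8, 16, 32, 64):
    print(n_bins, round(binned_entropy(x, n_bins), 3))
# Entropy grows as the bins get narrower; in the limit of
# infinitely small bins it would grow without bound.
```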
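The "differential entropy" bullet can be made concrete with the standard definition and its relation to binned entropy (standard notation with bin width Δx; not quoted from the chapter):

```latex
h(X) = -\int p(x)\,\log_2 p(x)\,dx,
\qquad
H_{\text{binned}}(X) \;\approx\; h(X) + \log_2 \frac{1}{\Delta x}
```

Halving the bin width adds one bit, and as Δx → 0 the binned entropy diverges, which is why differential entropy is defined without the bin term.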
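A second sketch (same hedges: NumPy, made-up numbers, and a fixed bin width rather than a fixed bin count) showing that doubling the range adds about one bit while adding a constant changes nothing:

```python
import numpy as np

def fixed_width_entropy(samples, bin_width):
    """Entropy (in bits) using bins of fixed width, so the bin count grows with the range."""
    edges = np.arange(samples.min(), samples.max() + bin_width, bin_width)
    counts, _ = np.histogram(samples, bins=edges)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200_000)              # arbitrary continuous signal

print(round(fixed_width_entropy(x,       0.01), 2))  # ~6.64 bits (about 100 bins)
print(round(fixed_width_entropy(2 * x,   0.01), 2))  # ~7.64 bits: doubled range, one extra bit
print(round(fixed_width_entropy(x + 5.0, 0.01), 2))  # ~6.64 bits: shifting adds nothing
```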
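For reference, the three maximum-entropy cases in the list above, with their standard densities and differential entropies in bits (a standard summary, not copied from the book):

```latex
\begin{aligned}
&\text{Fixed bounds } [a,b]: && p(x)=\tfrac{1}{b-a}\ \ \text{(uniform)}, && h(X)=\log_2(b-a)\\
&\text{Fixed mean } \mu,\; x \ge 0: && p(x)=\tfrac{1}{\mu}\,e^{-x/\mu}\ \ \text{(exponential)}, && h(X)=\log_2(e\mu)\\
&\text{Fixed variance } \sigma^2: && p(x)=\tfrac{1}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/(2\sigma^2)}\ \ \text{(Gaussian)}, && h(X)=\tfrac{1}{2}\log_2(2\pi e\sigma^2)
\end{aligned}
```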
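Finally, a toy calculation (illustrative numbers only) of how noise sets the smallest usable bin and therefore the information per measurement:

```python
import math

# A signal confined to a fixed range, measured with additive noise whose
# typical size is `noise`. The noise sets the smallest usable bin width,
# so roughly range/noise levels can be told apart, i.e. log2(range/noise) bits.
signal_range = 1.0
for noise in (0.1, 0.05, 0.01):
    levels = signal_range / noise
    bits = math.log2(levels)
    print(f"noise={noise:<5} -> ~{levels:.0f} distinguishable levels, ~{bits:.2f} bits")
```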
