Tuesday, December 20, 2022

[Book Summary - Information Theory] Information Theory: A Tutorial Introduction, Chs 1-2 by James V. Stone

1. Useful information 

    • (maximize) informative redundancy 

    • (minimize) non-informative redundancy 


2. Noise
 

3. Minimizing the "effort" required for transmitting each portion of an informative signal: selection in biology/language/culture 
  • What is "effort"?  
    i) articulatory effort? how much? (e.g., Spanish vs. English vowels) 

    ii) cognitive effort? 

    iii) the motor plan for producing similar sounds 

    iv) sounds including some noise 


4. Bits and binary digits 

  • Bits: splitting the possible (meaning) space in half / quantity of the info 
  • Binary digits: each place in a binary number (a number written with binary digits); a binary digit conveys at most 1 bit of information. 
  • Bytes: 8 binary digits. 

m outcomes = 2^n binary choices 

n binary choices = log2(m outcomes) 

n bits = log2(m outcomes) 
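
A quick sanity check of these relationships in Python (a minimal sketch; the example alphabet sizes are just for illustration):

    import math

    # n binary choices (bits) can distinguish m = 2**n equally likely outcomes.
    for n in range(1, 5):
        m = 2 ** n
        print(f"{n} binary choices -> {m} outcomes; log2({m}) = {math.log2(m)} bits")

    # A non-power-of-two alphabet follows the same rule: 26 equally likely
    # letters would need log2(26) bits each.
    print(math.log2(26))  # ~4.700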

 

 

5. Random variable: e.g., X(head) = 1, X(tail) = 0 / a kind of function  

6. So it is really a function (a mapping from outcomes to numbers), rather than an actual variable? 
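
A minimal sketch of the coin example, treating the random variable X as literally a mapping from outcomes to numbers (the dictionary representation is just one way to write it):

    import random

    # X maps each raw outcome of a coin flip to a number.
    X = {"head": 1, "tail": 0}

    outcome = random.choice(["head", "tail"])  # the underlying random experiment
    value = X[outcome]                         # the value the random variable takes
    print(outcome, "->", value)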

7. Schema for message transmission: a message is an ordered sequence of symbols corresponding to outcome values of random variables (a toy sketch of the full pipeline follows this list) 

  • Data: meaning… social… discourse… / linear symbols…? /  
  • Encoder: maps the message to codewords taken from a codebook (cf. symbols vs. codewords: they might be the same, but not necessarily; for example, frequent words can be encoded with the fewest symbols and infrequent words with the most) 
  • Input x: can be identical to the message, or transformed by some mapping (e.g., compressed by removing redundancy) 
  • Channel: noise…? (can obscure the speech - the brain filters out the noise…?) need to know the difference between noise and redundancy 
    • With regard to top-down processing: redundancy is useful in that it confirms… (e.g., Los Angeles) 
  • Output y: lossless, lossy 
  • Channel capacity: the maximum information per codeword; information rate = information per second, which can approach but not exceed the channel capacity…? 
    • Fewer contrasts: less information per position, so moving from position to position is fast, but more positions (more time overall) are needed 
    • More contrasts: more information per position, so moving from position to position is slower, but fewer positions (less time overall) are needed 


  • Noise reduces channel capacity
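
To make the moving parts of item 7 concrete, here is a toy end-to-end sketch (message → encoder → noisy channel → decoder). The codebook, symbols, and noise level are all made up for illustration and are not from the book:

    import random

    # Hypothetical codebook: frequent symbols get short codewords (cf. the encoder bullet above).
    CODEBOOK = {"e": "0", "t": "10", "a": "110", "q": "111"}
    DECODEBOOK = {v: k for k, v in CODEBOOK.items()}

    def encode(message):
        """Map each symbol of the message to its codeword (the input x)."""
        return "".join(CODEBOOK[s] for s in message)

    def channel(x, flip_prob=0.05):
        """Noisy binary channel: each binary digit is flipped with probability flip_prob."""
        return "".join(b if random.random() > flip_prob else str(1 - int(b)) for b in x)

    def decode(y):
        """Greedy prefix-code decoding of the (possibly corrupted) output y."""
        out, buf = [], ""
        for b in y:
            buf += b
            if buf in DECODEBOOK:
                out.append(DECODEBOOK[buf])
                buf = ""
        return "".join(out)

    message = "teaeat"
    x = encode(message)
    y = channel(x)
    print(message, "->", x, "->", y, "->", decode(y))

With flip_prob = 0 the output is lossless; with noise, some codewords are corrupted and the decoded message degrades, which is the sense in which noise reduces the usable capacity of the channel.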

8. Surprisal and entropy 

  • Surprisal = log(1/p(x)), or - log(p(x)) 
  • If we use log2, surprisal is measured in bits. 
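
Surprisal computed straight from the definition above (a minimal sketch; the example probabilities are arbitrary):

    import math

    def surprisal_bits(p):
        """Surprisal of an outcome with probability p, in bits: log2(1/p) = -log2(p)."""
        return -math.log2(p)

    print(surprisal_bits(0.5))    # 1.0 bit  (a fair coin flip)
    print(surprisal_bits(0.125))  # 3.0 bits (a 1-in-8 outcome)
    print(surprisal_bits(0.99))   # ~0.014 bits (barely surprising at all)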

9. Entropy 

  • Average surprisal of the elements in a system: H(X) = sum over outcomes x of p(x) log2(1/p(x)) 
  • How do we estimate this in practice? 
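
One common answer (under the i.i.d./stationarity assumptions listed below): estimate each outcome's probability by its relative frequency in a sample, then average the surprisals. A minimal plug-in sketch with made-up samples:

    import math
    from collections import Counter

    def entropy_bits(sample):
        """Plug-in entropy estimate: H(X) = sum of p(x) * log2(1/p(x)),
        with p(x) estimated by relative frequency in the sample."""
        counts = Counter(sample)
        n = len(sample)
        return sum((c / n) * math.log2(n / c) for c in counts.values())

    print(entropy_bits("abab"))      # 1.0 bit per symbol (two equally likely symbols)
    print(entropy_bits("aaab"))      # ~0.811 bits (skewed distribution -> lower entropy)
    print(entropy_bits("abcdefgh"))  # 3.0 bits (eight equally likely symbols)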
10. Entropy of a system

11. Some assumptions
  • Independent and Identically Distributed (a.k.a. i.i.d.)
  • Stationary: the distribution doesn't change through time
  • Ergodic: the statistics of the whole system can be recovered from a single, reasonably long sequence of measurements 
12. Uniform Information Density Hypothesis: if language is optimized for information transmission, it should tend toward roughly equal surprisal per symbol per unit time 
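
A toy illustration, assuming we already have per-word surprisals from some language model (the numbers below are invented): UID predicts a preference for the wording whose surprisal is spread more evenly over time.

    import statistics

    # Hypothetical per-word surprisals (bits) for two paraphrases of the same message.
    bursty = [0.5, 0.5, 9.0, 0.5, 0.5]   # information packed into one very surprising word
    even = [2.2, 2.2, 2.2, 2.2, 2.2]     # the same total information spread evenly

    for name, s in [("bursty", bursty), ("even", even)]:
        print(name, "total:", sum(s), "variance:", round(statistics.variance(s), 2))
    # UID: lower variance in surprisal per word (per unit time) is preferred, since
    # peaks risk exceeding the channel capacity and troughs waste it.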

Discussion: Why is the actual entropy of written English letters lower than the entropy estimated from the individual probabilities of letters in a big sample of text? Because the letters are not independent! Given the surrounding context, the distribution over the next letter is more skewed, which lowers the entropy. 
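
A minimal sketch of this point on a tiny made-up text (a real estimate would need a large corpus): the per-letter entropy computed from individual letter probabilities ignores context, while the conditional entropy of a letter given the previous letter does not, and the second number comes out lower.

    import math
    from collections import Counter

    text = "the theory of information is the theory of the theory"  # toy stand-in for a big sample

    def H(counts):
        """Plug-in entropy (bits) of a Counter of outcome counts."""
        n = sum(counts.values())
        return sum((c / n) * math.log2(n / c) for c in counts.values())

    # Entropy estimated from individual letter probabilities (pretends letters are independent).
    unigram_H = H(Counter(text))

    # Conditional entropy of a letter given the previous letter: the entropy of each
    # context's next-letter distribution, weighted by how often that context occurs.
    pairs = list(zip(text, text[1:]))
    by_context = {}
    for a, b in pairs:
        by_context.setdefault(a, Counter())[b] += 1
    total = len(pairs)
    conditional_H = sum(sum(c.values()) / total * H(c) for c in by_context.values())

    print(round(unigram_H, 3), ">", round(conditional_H, 3))  # context lowers the entropy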
