One's default reaction to a claim of beating the Shannon limit is to guess that the author is a scammer, confused, or both.
My impression from a quick read of this GitHub page is that this person has no background in channel coding or information theory. Maybe not quite a scammer, but it looks like this should be ignored.
It's part of learning. He's over-excited, and some guidance would be helpful, but it's learning. He's reinvented an octagonal wheel, and now we can explain to him: "Good start; see this round wheel (arithmetic encoding) over here? It's even better!"
Yeah, this was a big fear in posting. No, I'm not experienced in academia, or this would be a paper; I code things for fun. Have you tried looking at the math for multinomials vs. Shannon? Or running the code? Please do point out my mistakes in the math.
It is not our responsibility to do that. You should explain how your method differs from well-established methods such as arithmetic coding (see papers by Rissanen). While doing that, you should quickly realize that you’ve got nothing at all.
A basic data compression course would have sufficed too, but as you say, you don’t have any knowledge of the basics of information theory or data compression. You sound like an over-eager yet ignorant grad student who insists he has found flaws in the professor’s lecture material.
> As an example, given symbol frequencies of 1, 2, and 3 for a total of 6 symbols, the Shannon limit is [-(1/6)log2(1/6) - (2/6)log2(2/6) - (3/6)log2(3/6)] × 6 = 8.75 => 9 bits.
This isn't how you calculate the Shannon limit, fwiw. If all symbols are equally frequent, just take the total number of different symbols and compute log2(different_symbol_count). That's it.
So say you had 60 different permutations. Each of those is a symbol in entropy encoding, and each one takes log2(60) bits to store using arithmetic encoding. That is exactly the statement, and the correct calculation, that you give for the Shannon limit in the very next line :). The Shannon limit for 60 equally likely symbols is absolutely 5.9 bits, not 9 bits.
Instead you have some weird calculation here that takes individual probabilities and adds them back up again to give 9 bits. That's very far off and incorrect.
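To make this concrete, here is a minimal sketch in Python (the frequencies 1, 2, 3 come from the quoted example; all variable names are mine, purely for illustration) that computes both figures: the 8.75-bit per-symbol-entropy number and the log2(60) ≈ 5.91-bit index into the distinct arrangements.

    from math import log2, factorial

    freqs = [1, 2, 3]   # symbol counts from the quoted example
    n = sum(freqs)      # 6 symbols total

    # Per-symbol Shannon entropy times sequence length: the 8.75-bit figure.
    entropy_bits = -sum(f / n * log2(f / n) for f in freqs) * n

    # Number of distinct sequences with exactly these counts:
    # the multinomial coefficient 6! / (1! * 2! * 3!) = 60.
    arrangements = factorial(n)
    for f in freqs:
        arrangements //= factorial(f)

    print(entropy_bits)        # 8.754887502...
    print(arrangements)        # 60
    print(log2(arrangements))  # 5.906890595...

The gap between the two numbers is not free savings: log2(60) assumes the decoder already knows the exact counts, while the 8.75-bit figure does not.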
I think you are confused about probabilities and realized frequencies. A probability is a before-the-fact model; a realized frequency is an after-the-fact observation. If you flip a fair coin, the probability of heads or tails is 50%, but any given single flip will have a realized frequency of 1 for one outcome and 0 for the other. (What is true is that as you flip the coin a very large number of times, the ratio you actually observe will approach 50/50, but that's definitely not true for, say, 8 tosses. That's the law of large numbers, which underlies the asymptotic equipartition property, but it isn't really relevant here.)
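A quick way to see the 8-toss point, as a sketch in Python (the fair coin and 8 tosses are from the paragraph above): the chance that the realized frequency exactly matches the 50% model is only about 27%.

    from math import comb

    n = 8  # tosses of a fair coin
    # Probability of exactly 4 heads in 8 tosses, i.e. a realized
    # frequency that exactly matches the 50% model: C(8, 4) / 2^8.
    print(comb(n, n // 2) / 2**n)  # 70/256 = 0.2734375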