I was first introduced to information theory through puzzles regarding scales and fake coins. Consider the following:
You have 9 coins, but one of them is fake! They all look the same, but the fake coin is slightly heavier. The only tool you have at your disposal is a balance scale, which lets you weigh one set of coins against another. You can only use the scale twice. Can you determine which coin is fake?
I will not present the solution, but I will explain why you need to use the scale at least twice. Since any one of 9 coins could be fake, there are nine possible answers. You can only distinguish between 9 possibilities if you perform tests that have at least 9 possible results. Each time you use the scale, there are 3 possible results (tilt left, tilt right, or stay balanced). Therefore, you need to use the scale twice so that there are 3*3 = 9 possible results.
Understanding the underlying principle is key to creating more puzzles of this variety.
Put yourself in the shoes of someone writing a puzzle. You liked the one with 9 coins, and want to make a slight variation on it:
You have N coins, and two of them are fake. The fake coins are heavier than the real coins, but equal in weight to each other. You have a balance scale but you can only use it three times. Can you determine which coins are fake?
As the puzzle writer, you are tasked with the question, what should N be? You probably want N to be as large as possible. The number of possible arrangements is N*(N-1)/2. The number of possible results from two uses of the scale is 3*3*3 = 27. So the first thing to try is N = 7.
Of course, N = 7 is just an upper bound. It may turn out that N = 7 is not possible. It’s necessary to check for a solution. (It turns out N=7 is possible.)
Here’s another puzzle, now moving beyond coin weighing.
As a science project, you built landing gear for an egg. You want to test it how high the egg can fall without breaking. The trouble is you only have two eggs. Furthermore, you only have enough time for ten tests. If you want to measure the best height within a foot, up to what height can you test?
Looking at this from the point of view of information, we can ask, how many possible test results are there? We can perform ten tests, and each test has two possible results (the egg breaks, or it doesn’t). However, only up to two of the tests can break the egg, after which no more tests can be performed. So the number of possible results is 10*9/2 + 10 + 1 = 56. So at most, we could test up to 55 feet (and the 56th result would indicate that the egg can fall at least 56 feet).
Of course, when I was first introduced to these puzzles, I didn’t see the connection to information theory. I just knew that if you are to distinguish between N possibilities using testing, you’d need to have a test with at least N possible results. Now that I know a little better, I can make the connection more explicit.
Information theory was originally developed to model communication. Alice wants to send a message to Bob, and we want to know how long it will take to send the message given a particular communication channel. It turns out that the length of the message is proportional to the logarithm of possible messages that could be sent. Information is a way of measuring that message length. By convention, if we use a base-2 logarithm, the unit of information is called a “bit”.
In the case of the 9 coins, Alice is the universe, and Alice wants to tell us which of the coins are fake. Since there are 9 possible messages that could be sent, the amount of information in this message is log2(9) bits (or 3.17 bits). Alice’s only method of communication is through the results of testing. Each test has 3 possible results. Thus the amount of information that can be communicated is log2(3) bits (or about 1.58 bits) per test. It requires (at least) two tests for Alice to send us the information.
Information theory gets more complicated from there. A lot of theory is concerned with communication channels that have noise. What if every test has a 10% chance of giving an incorrect result? In that case, we’d need to do more work to define what even counts as a “solution” to the puzzle. I’ll write more on this subject in the future.