In my book Cultural Entropy, I devote some time to information theory, because the concept of entropy is impossible to explain without it. Likewise, it is impossible to explain cultural information, particularly its language subset, without entropy. In reading various sources about information and language, I am struck by how excellent and simple the older texts are, and how confusing or negligent the newer texts can be. Language Files, a standard text for introductory linguistics courses, offers nothing on the subject, though it does discuss pragmatics.

But before the field was called pragmatics, and when linguistics had a little more perspective, the most common linguistics textbook was An Introduction to Descriptive Linguistics by H.A. Gleason (1955, 1961). This book also forms an excellent foundation for a novice being introduced to Field Linguistics, which I often analogize to amphibious warfare: the process of starting with zero firepower ashore and proceeding to dominance of the field. Field Linguistics as a practice is quite similar. A linguist arrives at a place s/he has never been, perhaps a village in remote Papua New Guinea, beginning with close to zero knowledge of the language and necessarily proceeding to learn everything: discerning a grammar, a phonetic inventory, and all manner of other information. It is, in other words, a supremely practical art. Just so, Gleason’s textbook.

For the purposes of my discussion here, Descriptive Linguistics rises to the occasion as well. We begin with definitions:

The amount of information increases as the number of alternatives increases. […] Information is measured in units called… bits.  By definition, a code with two alternative signals, both equally likely, has a capacity of one bit per use. A code with four alternatives is defined as having a capacity of two bits per use…. […] The amount of information in any signal is the logarithm to the base two of the reciprocal of the probability of that signal.

This about sums up the useful parts for any schema of quantifying meaning that we might wish to undertake 50 years after the text was written. Focus on the point about alternatives. Imagine a world with two machines communicating with each other, only ever saying 1 or 0, and only once before the other responds. Each machine has only two choices, and they are equally likely: the capacity is one bit. The machine might send its transmission in the following form: [0] or [1]. A code with four alternatives between the machines might look something like this: [0 0], [0 1], [1 0], or [1 1]. In fact, these are all four of the alternatives, and the capacity is two bits per use.
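Gleason’s definitions can be sketched in a few lines of code. This is a minimal illustration, not anything from his text: `capacity_bits` and `information_bits` are names I have chosen for the two quantities he defines, the capacity of a code with equally likely alternatives and the information of a single signal.

```python
import math

def capacity_bits(alternatives: int) -> float:
    """Capacity in bits of a code whose alternatives are all equally likely."""
    return math.log2(alternatives)

def information_bits(probability: float) -> float:
    """Information of a signal: log base two of the reciprocal of its probability."""
    return math.log2(1 / probability)

# Two equally likely signals, [0] or [1]: one bit per use.
print(capacity_bits(2))        # 1.0
# Four alternatives, [0 0], [0 1], [1 0], [1 1]: two bits per use.
print(capacity_bits(4))        # 2.0
# A signal with probability 1/2 carries log2(1 / (1/2)) = 1 bit.
print(information_bits(0.5))   # 1.0
```

Note that the two definitions agree: a code of N equally likely alternatives gives each signal probability 1/N, so both formulas yield log2(N).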

Most human communication doesn’t look like this at all. True, we do often communicate in ways that necessitate or at least allow for either/or answers that might look like [0] or [1]. But most human utterances and writing look more like what you’re reading, expressing ideas, narratives, and concepts, not just yes/no or either/or responses. An example of something slightly more complicated would be the set of alternatives to the question: which U.S. President from 1980–2011 has been the best? You have six choices: Carter, Reagan, Bush 41, Clinton, Bush 43, and Obama. The response, therefore, could be encoded as simply as [0], [1], [2], [3], [4], or [5], depending only on which number referred to which President. Another step up in complexity would be the set of alternatives to the question: which color is the best? As a technical matter, given the number of frequencies visible to the human eye, the answer is theoretically unlimited. There is, however, a practical limit: language. Every language only has so many recognized color words at any given moment. Some have as few as two, it is believed, while others have somewhere between three and eleven, and a good many others have considerably more. English certainly falls into the last category, and every 64 or 128 pack of crayons you see in the store proves it. There are many alternatives to choose from here.
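The presidential question can be encoded exactly as described above. A sketch, with the numbering of the Presidents being an arbitrary assumption of mine (any assignment of numbers to names would do):

```python
import math

# One arbitrary assignment of code numbers 0-5 to the six alternatives.
presidents = ["Carter", "Reagan", "Bush 41", "Clinton", "Bush 43", "Obama"]
codes = {name: i for i, name in enumerate(presidents)}

# If all six answers were equally likely, each response would carry log2(6) bits,
# a bit more than the two bits needed for four alternatives.
bits_per_answer = math.log2(len(presidents))

print(codes["Clinton"])            # 3
print(round(bits_per_answer, 3))   # 2.585
```

Six alternatives do not divide evenly into bits, which is the first hint that human sets of alternatives rarely fit the tidy [0 0]/[0 1]/[1 0]/[1 1] picture.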

Something that many linguists and information theorists have avoided until recently is quantifying the amount of information that is actually transmitted, beyond the rote numerical answer suggested by Gleason in his textbook. In response to the presidential question, if someone answers “Carter,” much more information is transmitted to a listener than just the information that Carter is the best President. Any listener will have assigned a probability to that outcome, which means, reflexively, that probabilities have been assigned to all the other outcomes; but the answer will also say something about who the speaker is and what s/he believes. Most of this additional information could be called “peripheral information,” as opposed to the “core information” transmitted by the response. Peripheral information is highly contextual.
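The “core information” part of this, at least, is quantifiable with Gleason’s own formula: the less likely the listener judged an answer to be, the more bits it carries. A sketch, where the prior probabilities are entirely made up by me for illustration, not measured data:

```python
import math

# Made-up prior probabilities a particular listener might assign to each
# answer to the presidential question (they sum to 1.0; pure assumption).
priors = {
    "Carter": 0.05, "Reagan": 0.30, "Bush 41": 0.05,
    "Clinton": 0.25, "Bush 43": 0.05, "Obama": 0.30,
}

def surprisal(answer: str) -> float:
    """Core information in bits: log2 of the reciprocal of the answer's probability."""
    return math.log2(1 / priors[answer])

# The answer this listener considered unlikely carries more bits
# than the answer s/he considered likely.
print(round(surprisal("Carter"), 3))  # 4.322
print(round(surprisal("Reagan"), 3))  # 1.737
```

The peripheral information, by contrast, resists this kind of arithmetic precisely because it is contextual: the same answer tells a different story about a speaker in Georgia than in California.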