You are currently browsing the monthly archive for September 2011.
Ultimately, when we concern ourselves with language, we are always and without exception really talking about translation. That is, everything that we say about language is really a statement about a subfield of translation, which is the truer subject of study. Translation in the usual sense means converting the meaning from one arbitrary set of symbols to another. But the symbols portion of the definition is not a necessary condition, just a usual one. Rather, translation can be broadened to be seen as the art of focusing meaning from a more ambiguous source to a less ambiguous one. But we will get to that.
Translation in the usual sense is something natural language processors are very concerned with. How, after all, do we get a computer program to recognize language? Developing software that breaks sound waves down and identifies the phonetics of the wave is the easy part. Encoding the complexities of a system with recursion and, worse, sophisticated senses of humor would stymie even the greatest of programmers, and indeed does so on a regular basis.
Turing believed that we would know we had succeeded at the task when a machine is able to fool us into thinking it is a human through conversation. As a matter of odds, that means that when we speak to a machine that can fool us, we believe there is a roughly 50% chance it is a machine: even odds, in other words. A particularly entertaining RadioLab episode for me, Season 10 Episode 1, entitled “Talking to Machines,” deals with different types of machines that seem like they’re communicating with us and the obvious question, “Are they aware?” This is a little bit of a different concern than Turing’s because Turing posited that a machine that could fool us would be aware of itself, but this is not necessary as a matter of logic.
In the episode, there are programmers who make profiles on sites like Match.com, and many, many others, designed to try to fool regular humans. These bots respond to messages and keywords, oftentimes fairly realistically. So much so, in fact, that many people are regularly fooled. Now, if one expects this type of ruse, one might not so easily fall for it, especially when the tell-tale signs of the deception are revealed in the RadioLab episode. But for the trusting and/or unsuspecting, it is a different story. For the programmers, their experiment is easy to explain. What is more difficult is showing the technical methods they employ to achieve their results.
One method is to store words as matrices. Why matrices?
The Structure of Artificial Thought
Because matrices are very simple and flexible: they are easy to manipulate. This means that we can translate information into matrices and play with that information by performing operations on it, any kind of operation, in however many dimensions.
Let’s look at an example. Assume that I can create a database of all the current words in the English language, a snapshot. It would of course only reflect the language at a given time, seeing as how English changes so quickly. (We need not quibble with the different forms of English, sociolinguistics, and so on at this point.) I might accomplish this by storing each word as a matrix. For the sake of simplicity, let us say that verbs have a certain m×n matrix structure, and nouns and adjectives have different m×n structures.
The present tense form of the verb ‘to run’ is ‘run’ and that present tense verb can be stored with some arbitrary values in its matrix form:
[ 0 ]
[ 1 ]
[ 2 ]
There are 3 rows and 1 column in this matrix, making it a 3×1 matrix. Let us say that the matrix form for the past tense ‘ran’ is the following matrix:
[ 0 ]
[ 1 ]
[ 3 ]
In this example, the only difference is the last value. Even assuming that the number of potential values which could slot in there is infinite, these words mean largely the same thing according to the arbitrary values stored in the matrix; the last value alone determines tense in this type of structure. But this is just a brutally simple example of what the most sophisticated natural language processing models actually look like. The matrices are usually much larger and potentially infinite in size.
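The toy encoding above can be sketched in a few lines of Python. This is only an illustration under the example's own assumptions: the values are the arbitrary ones given above, and the helper names `shape` and `differing_rows` are invented here for the sketch, not part of any real language-processing system.

```python
# Toy 3x1 "word matrices" from the example; the values are arbitrary.
RUN = [[0], [1], [2]]  # present tense 'run'
RAN = [[0], [1], [3]]  # past tense 'ran'

def shape(matrix):
    """Return (rows, columns) of a list-of-lists matrix."""
    return (len(matrix), len(matrix[0]))

def differing_rows(a, b):
    """Return the row indices where two same-shaped matrices differ."""
    if shape(a) != shape(b):
        raise ValueError("matrices must have the same shape")
    return [i for i, (row_a, row_b) in enumerate(zip(a, b)) if row_a != row_b]

print(shape(RUN))                # (3, 1)
print(differing_rows(RUN, RAN))  # [2]
```

Only the last row differs, which is the slot this toy structure reserves for tense.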
The Structure of Artificial Meaning
These matrices have many special properties. One of them is that we must be able to structure the matrices in a way so that we can perform regular types of operations, which would be analogs for syntactic interactions. Again, as a simple example, a noun might be, let us say a 3 x 3 matrix. The word ‘I’ could be:
[ 10 11 12 ]
[ 13 14 15 ]
[ 16 17 18 ]
We could build into our matrix representation of language a rule whereby we know a sentence is grammatical only if the matrix product of the noun phrase (NP) and verb phrase (VP) is a 3 x 1 matrix. The matrix product of ‘I’ and ‘ran’, that is, a NP and a VP, would form a 3 x 1 matrix:
[ 47 ]
[ 59 ]
[ 71 ]
We are not as concerned with the values of the product as with its form at this point. Since language is so complex, the form must obviously become more complex as well, without losing its flexibility. The reality is that while some core portions of the matrices for each word type would have to have some kind of values for us to understand what they mean, have a frame of reference, and perform meaningful operations on them, many values may be variables; that is to say, they may be ambiguous.
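A minimal sketch of the shape rule, using the toy matrices for ‘I’ and ‘ran’ from above. The function names (`matmul`, `is_grammatical`) and the choice to treat a dimension mismatch as "ungrammatical" are assumptions made for this illustration, not a description of how any real system works.

```python
# Toy matrices from the example; the values are arbitrary.
I_NP = [[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]]     # noun phrase 'I', 3x3
RAN_VP = [[0], [1], [3]]  # verb phrase 'ran', 3x1

def shape(m):
    """Return (rows, columns) of a list-of-lists matrix."""
    return (len(m), len(m[0]))

def matmul(a, b):
    """Plain matrix product; raises if the inner dimensions do not match."""
    if shape(a)[1] != shape(b)[0]:
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def is_grammatical(np_matrix, vp_matrix, expected=(3, 1)):
    """Toy rule: accept a sentence only if NP x VP has the expected shape."""
    try:
        return shape(matmul(np_matrix, vp_matrix)) == expected
    except ValueError:
        return False

print(matmul(I_NP, RAN_VP))          # [[47], [59], [71]]
print(is_grammatical(I_NP, RAN_VP))  # True
print(is_grammatical(RAN_VP, I_NP))  # False
```

Reversing the order fails the check because a 3 x 1 matrix cannot be multiplied by a 3 x 3 one, which is the kind of syntactic constraint the form is meant to carry.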
The Structure of Ambiguous Meaning
While words like ‘love’ and ‘justice’ may be highly ambiguous and contextual in meaning, there are some words like ‘neutron’ or ‘hydrangea’ that are fairly specific. But even with these words, there is one way of changing their meaning. The meanings of their spoken and written forms are different. They necessarily must be, always.
Let us consider a 100 x 100 matrix that stores the meaning of ‘neutron.’ The core of the word might be stored in the 100×98 portion and then the 100×2 fragment at the end could be the contextual meaning that comes from the form the word is expressed in. For the spoken ‘neutron,’ it would be values that reflect the emotion of the voice, the tone, the pacing, the accent, the education, all kinds of things that might come out through pronunciation of a word. For the written ‘neutron,’ the 100×2 fragment means the most at time zero, when it is initially written. If the word is written at that time, a reader still has a very good proxy for what an author intended, but still is not privy to as much information as what the listener of the spoken ‘neutron’ is. This means two things. (1) The values in the last 100×2 fragment will be different, not necessarily entirely or even mostly so, but necessarily so in part; (2) The meaning of the written is more ambiguous due to the uncertainty of what an author meant to communicate. There is always a tone, even for a written word, but it is far more subject to fancy and therefore obviously more ambiguous. Variables of a sort will be needed in the written 100×2 fragment.
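One way to picture the split between a 100×98 core and a 100×2 contextual fragment is sketched below. Everything in it is a placeholder: the numeric values mean nothing, `None` is used here to stand in for the "variables" of the written form, and the `ambiguity` measure (the fraction of unresolved context entries) is an assumption invented for this sketch rather than anything defined in the text.

```python
ROWS, CORE_COLS, CONTEXT_COLS = 100, 98, 2

def make_word(core, context):
    """Join a 100x98 core and a 100x2 context block into one 100x100 matrix."""
    assert len(core) == ROWS and all(len(r) == CORE_COLS for r in core)
    assert len(context) == ROWS and all(len(r) == CONTEXT_COLS for r in context)
    return [c + x for c, x in zip(core, context)]

def ambiguity(word):
    """Fraction of context entries left as unresolved variables (None)."""
    context = [v for row in word for v in row[CORE_COLS:]]
    return sum(v is None for v in context) / len(context)

# Placeholder values only; nothing here is a real encoding of 'neutron'.
core = [[1.0] * CORE_COLS for _ in range(ROWS)]
spoken_context = [[0.5, 0.5] for _ in range(ROWS)]     # tone, pacing, etc. all observed
written_context = [[0.5, None] for _ in range(ROWS)]   # half left as variables (arbitrary)

spoken = make_word(core, spoken_context)
written = make_word(core, written_context)

print(len(spoken), len(spoken[0]))  # 100 100
print(ambiguity(spoken))            # 0.0
print(ambiguity(written))           # 0.5
```

The two forms share an identical core and differ only in the final two columns, with the written form carrying the unresolved entries.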
In my book Cultural Entropy, I devote some time to information theory, for the concept of entropy is impossible to explain without it. Likewise, attempting an explanation of cultural information, particularly the language subset of it, without entropy, is impossible. In reading various sources about information and language, I am struck by how excellent and simple the older texts are and how confusing or negligent the newer texts are. Language Files, which is a standard text for introductory linguistics courses, says nothing on the subject, though it does discuss pragmatics.
But before the field was called pragmatics, and when linguistics had a little more perspective, the most common linguistics textbook was An Introduction to Descriptive Linguistics by H.A. Gleason (1955, 1961). This latter book, in particular, also forms an excellent foundation for a linguistics novice being introduced to Field Linguistics, which I often analogize to amphibious warfare: the process of starting with zero firepower ashore and proceeding to dominance of the field. Field Linguistics as a practice is quite similar. A linguist arrives at a place s/he has never been, perhaps a village in remote Papua New Guinea, beginning with close to zero knowledge of the language and necessarily proceeding to learn everything, discerning a grammar, phonetic inventory, and all manner of other information. It is, in other words, a supremely practical art. Just so, Gleason’s textbook.
For the purposes of my discussion here, Descriptive Linguistics rises to the occasion as well. We begin with definitions:
The amount of information increases as the number of alternatives increases. […] Information is measured in units called… bits. By definition, a code with two alternative signals, both equally likely, has a capacity of one bit per use. A code with four alternatives is defined as having a capacity of two bits per use…. […] The amount of information in any signal is the logarithm to the base two of the reciprocal of the probability of that signal.
This about sums up the useful parts for any schema of quantifying meaning that we might wish to undertake 50 years after the text was written. Focus on the point about alternatives. In a world with two machines communicating with each other, each only ever saying 1 or 0 and only once before the other responds, each machine has only two choices, and they are both equally likely. The capacity is one bit. A machine might send its transmission in the following form: [0] or [1]. A code with four alternatives between the machines might look something like this: [0 0], [0 1], [1 0], or [1 1]. In fact, these would be all four of the alternatives, giving a capacity of two bits per use.
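Gleason's definition translates directly into a one-line calculation. The sketch below is just that definition written out in Python; the function name `information_bits` is made up for the illustration.

```python
import math

def information_bits(probability):
    """Information of a signal: log base 2 of the reciprocal of its probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1 / probability)

# Two equally likely signals: one bit per use.
print(information_bits(1 / 2))  # 1.0
# Four equally likely signals: two bits per use.
print(information_bits(1 / 4))  # 2.0

# All four two-digit signals the machines could exchange.
two_bit_signals = [[a, b] for a in (0, 1) for b in (0, 1)]
print(two_bit_signals)  # [[0, 0], [0, 1], [1, 0], [1, 1]]
```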
Most human communication doesn’t look like this at all. True, we do often communicate in ways that necessitate or at least allow for either/or answers that might look like [0] or [1]. But most human utterances and writing look more like what you’re reading in terms of expressing ideas, narratives, and concepts, not just yes/no or either/or responses. An example of something slightly more complicated would be the set of alternatives to the question: which U.S. President from 1980 – 2011 has been the best? You have six choices: Carter, Reagan, Bush 41, Clinton, Bush 43, and Obama. The response, therefore, could be encoded as simply as [1], [2], [3], [4], [5], or [6], depending only on which number referred to which President. Another step up in complexity would be the set of alternatives to the question: which color is the best? As a technical matter, given the number of frequencies visible to the human eye, the answer is theoretically unlimited. There is, however, a practical limit: language. Every language only has so many recognized color words at any given moment. Some have as few as two, it is believed, while others have somewhere between 3 and 11, and a good many others have considerably more. English certainly falls into the last category and every 64 or 128 pack of crayons you see in the store proves it. There are many alternatives to choose from here.
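The same arithmetic extends to larger sets of alternatives. The sketch below assumes, purely for illustration, that all alternatives are equally likely (no real listener would rate six presidents that way) and assigns binary labels in an arbitrary order; the helper names `capacity_bits` and `digits_needed` are invented here.

```python
import math

def capacity_bits(alternatives):
    """Capacity per use of a code with equally likely alternatives."""
    return math.log2(alternatives)

def digits_needed(alternatives):
    """Whole binary digits needed to give each alternative its own label."""
    return max(1, math.ceil(math.log2(alternatives)))

presidents = ["Carter", "Reagan", "Bush 41", "Clinton", "Bush 43", "Obama"]
width = digits_needed(len(presidents))
# Which label refers to which president is arbitrary.
codes = {name: format(i, f"0{width}b") for i, name in enumerate(presidents)}

print(round(capacity_bits(len(presidents)), 3))  # 2.585
print(codes["Carter"], codes["Obama"])           # 000 101
print(capacity_bits(64), capacity_bits(128))     # 6.0 7.0
```

Six alternatives carry a little over two and a half bits, so three binary digits are needed to label them, while a 64 or 128 pack of crayons corresponds to six or seven bits under the same equal-likelihood assumption.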
Something that has been avoided by many linguists and information theorists until recently has been quantifying the amount of information that is actually transmitted, beyond the rote logical numerical answer suggested by Gleason in his textbook. In a response to the presidential question, if someone responds “Carter,” much, much more information is transmitted to a listener than just the information that Carter is the best President. Any listener will assign a probability to that outcome, meaning reflexively that probabilities have been assigned to all other outcomes, but it will also say something about who the person is and their beliefs. But most of this other information could be called “peripheral information” as opposed to the “core information” transmitted by the response. Peripheral information is highly contextual.
Obtaining a kind of precision in expression has previously been the purview of mathematicians, logicians, statisticians, and those who use symbols to express the barest minimum of relationships amongst the most pure of concepts. To them, ambiguous words are (or should be, as a professional matter) as foreign and as luxurious as the sweetest dulce de leche ice cream served at a zero-depth pool on a hidden Bali mountain is for us.
But precision of expression is more important than most people think and precision varies enormously by form. When someone says, “The Gators won the football game,” the meaning is different than when someone writes it on a piece of paper. These two prior forms are still different from when it is typed and sent over email and these three prior forms are yet different from those words when painted on a canvas, or spray painted on a wall. The meanings are not so different that we cannot fathom the gaps, so I don’t mean to belabor the point. Rather, I merely want to point out that form matters. I’ll say more about this later.
My goal in this short series of posts will be to lay out a method for articulating differences in meaning, as well as comparing meanings, distinguishing levels of ambiguity in meaning, and why all these things are important. Finally, I will summarize what all of this means for the always growing, always diminishing set of cultural information we use as humans.