Ultimately, when we concern ourselves with language, we are always and without exception really talking about translation. That is, everything that we say about language is really a statement about a subfield of translation, which is the truer subject of study. Translation in the usual sense means converting meaning from one arbitrary set of symbols to another. But the symbols portion of the definition is not a necessary condition, just a usual one. Rather, translation can be broadened and seen as the art of focusing meaning from a more ambiguous source into a less ambiguous one. But we will get to that.

Translation in the usual sense is something natural language processors are very concerned with. How, after all, do we get a computer program to recognize language? Developing software that breaks sound waves down and identifies the phonetics of the wave is the easy part. Encoding the complexities of a system with recursion and, worse, sophisticated senses of humor would stymie even the greatest of programmers, and indeed does on a regular basis.

Thinking Machines
Turing believed that we would know we had succeeded at the task when a machine was able to fool us, through conversation, into thinking it was a human. As a matter of odds, that means that when we speak to a machine that can fool us, we believe there is a roughly 50% chance it is a machine: even odds, in other words. A particularly entertaining RadioLab episode for me, Season 10 Episode 1, entitled “Talking to Machines,” deals with different types of machines that seem like they’re communicating with us and with the obvious question, “Are they aware?” This is a slightly different concern than Turing’s, because Turing posited that a machine that could fool us would be aware of itself, but that does not follow as a matter of logic.

In the episode, there are programmers who make profiles on sites like, and many many others, designed to try to fool regular humans. These bots respond to messages and keywords, often fairly realistically. So much so, in fact, that many people are regularly fooled. Now, if one expects this type of ruse, one might not so easily fall for it, especially once the tell-tale signs of the deception are revealed in the RadioLab episode. But for the trusting and/or unsuspecting, it is a different story. For the programmers, their experiment is easy to explain. What is more difficult is showing the technical methods they employ to achieve their results.

One method is to store words as matrices. Why matrices?

The Structure of Artificial Thought
Because matrices are very simple and flexible: they are easy to manipulate. This means that we can translate information into matrices and play with that information by performing operations on it, any kind of operation, in however many dimensions.

Let’s look at an example. Assume that I can create a database of all the current words in the English language, a snapshot. It would of course only reflect the language at a given time, seeing as how English changes so quickly. (We need not quibble with the different forms of English, sociolinguistics, and so on at this point.) I might accomplish this by storing each word as a matrix. For the sake of simplicity, let us say that verbs have a certain m×n matrix structure, and nouns and adjectives have different m×n structures.

The present tense form of the verb ‘to run’ is ‘run’ and that present tense verb can be stored with some arbitrary values in its matrix form:

[ 0 ]
[ 1 ]
[ 2 ]

There are 3 rows and 1 column in this matrix, making it a 3×1 matrix. Let us say that the matrix form for the past tense ‘ran’ is the following matrix:

[ 0 ]
[ 1 ]
[ 3 ]

In this example, the only difference is the last value. Assuming that the number of potential values which could slot in there is infinite, these two words mean largely the same thing according to the arbitrary values stored in their matrices. The only difference is the last value, which in this type of structure determines tense. But this is just a brutally simple example of what the most sophisticated natural language processing models actually look like. The matrices are usually much larger, and in principle there is no limit on their size.
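To make the comparison concrete, here is a minimal Python sketch of the two toy matrices for ‘run’ and ‘ran’. The helper name `differing_rows` is my own invention for illustration, and storing a matrix as a list of rows is just one convenient assumption; nothing here reflects how a real natural language processing system stores words.

```python
# Toy word matrices from the example: 3x1 matrices stored as lists of rows.
# The values are the arbitrary ones used in the text.
RUN = [[0], [1], [2]]  # present tense 'run'
RAN = [[0], [1], [3]]  # past tense 'ran'


def differing_rows(a, b):
    """Return the indices of the rows where two same-shaped matrices differ.

    Hypothetical helper, written only for this illustration.
    """
    same_shape = len(a) == len(b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(a, b)
    )
    if not same_shape:
        raise ValueError("matrices must have the same shape")
    return [i for i, (row_a, row_b) in enumerate(zip(a, b)) if row_a != row_b]


print(differing_rows(RUN, RAN))  # [2]: only the last row, the "tense" slot, differs
```

In this toy encoding, everything the two words share lives in the first two rows, and the single differing row is what carries tense.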

The Structure of Artificial Meaning
These matrices have many special properties. One of them is that we must be able to structure the matrices in such a way that we can perform regular types of operations on them, which would be analogs for syntactic interactions. Again, as a simple example, a noun might be, let us say, a 3×3 matrix. The word ‘I’ could be:

[ 10 11 12 ]
[ 13 14 15 ]
[ 16 17 18 ]

In our matrix representation of language, we could devise a system whereby we know a sentence is grammatical only if the matrix product of the noun phrase (NP) and verb phrase (VP) is a 3×1 matrix. The matrix product of ‘I’ and ‘ran’, that is, an NP and a VP, would form a 3×1 matrix:

[ 47 ]
[ 59 ]
[ 71 ]


We are not as concerned with the values of the product as with its form at this point. Since language is so complex, the form must obviously become more complex as well, without losing its flexibility. The reality is that while some core portions of the matrices for each word type would need definite values for us to understand what they mean, have a frame of reference, and perform meaningful operations on them, many values may be variables; that is to say, they may be ambiguous.
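The shape rule for ‘I’ and ‘ran’ can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names `shape`, `matmul`, and `is_grammatical` are hypothetical, matrices are stored as lists of rows, and the "grammar" is nothing more than the toy 3×1 rule described in this section.

```python
def shape(m):
    """Return (rows, columns) of a matrix stored as a list of rows."""
    return (len(m), len(m[0]))


def matmul(a, b):
    """Plain matrix product; raises if the inner dimensions do not match."""
    if shape(a)[1] != shape(b)[0]:
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def is_grammatical(np_matrix, vp_matrix):
    """Toy rule: NP times VP must be defined and must be a 3x1 matrix."""
    try:
        return shape(matmul(np_matrix, vp_matrix)) == (3, 1)
    except ValueError:
        return False


I_NOUN = [[10, 11, 12], [13, 14, 15], [16, 17, 18]]  # the word 'I' (3x3)
RAN = [[0], [1], [3]]                                # the word 'ran' (3x1)

print(matmul(I_NOUN, RAN))          # [[47], [59], [71]]
print(is_grammatical(I_NOUN, RAN))  # True: a 3x3 times a 3x1 gives a 3x1
print(is_grammatical(RAN, I_NOUN))  # False: a 3x1 times a 3x3 is undefined
```

Note that the check looks only at the form of the product, never at its values, and that swapping the word order fails the test simply because the product is no longer defined.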

The Structure of Ambiguous Meaning
While words like ‘love’ and ‘justice’ may be highly ambiguous and contextual in meaning, there are some words like ‘neutron’ or ‘hydrangea’ that are fairly specific. But even with these words, there is one way of changing their meaning. The meanings of their spoken and written forms are different. They necessarily must be, always.

Let us consider a 100×100 matrix that stores the meaning of ‘neutron.’ The core of the word might be stored in the 100×98 portion, and then the 100×2 fragment at the end could be the contextual meaning that comes from the form the word is expressed in. For the spoken ‘neutron,’ it would hold values that reflect the emotion of the voice, the tone, the pacing, the accent, the education, all kinds of things that might come out through the pronunciation of a word. For the written ‘neutron,’ the 100×2 fragment means the most at time zero, when the word is initially written. If the word is read at that time, a reader still has a very good proxy for what the author intended, but is still not privy to as much information as the listener of the spoken ‘neutron’ is. This means two things. (1) The values in the last 100×2 fragment will be different, not necessarily entirely or even mostly so, but necessarily so in part. (2) The meaning of the written form is more ambiguous, due to the uncertainty about what the author meant to communicate. There is always a tone, even for a written word, but it is far more subject to fancy and therefore more ambiguous. Variables of a sort will be needed in the written 100×2 fragment.
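One way to picture the split between the 100×98 core and the 100×2 contextual fragment is the Python sketch below. Everything in it is an assumption made for illustration: the names `build_word` and `unresolved`, the placeholder numbers, and above all the choice to represent a "variable" as `None`. It is meant only to show the shape of the idea, not an actual encoding of ‘neutron.’

```python
ROWS, CORE_COLS, CONTEXT_COLS = 100, 98, 2


def build_word(core, context):
    """Join a 100x98 core and a 100x2 context fragment into one 100x100 matrix."""
    if len(core) != ROWS or any(len(row) != CORE_COLS for row in core):
        raise ValueError("core must be 100x98")
    if len(context) != ROWS or any(len(row) != CONTEXT_COLS for row in context):
        raise ValueError("context must be 100x2")
    return [core_row + context_row for core_row, context_row in zip(core, context)]


def unresolved(word):
    """Count the entries in the last 100x2 fragment that are still variables (None)."""
    return sum(value is None for row in word for value in row[CORE_COLS:])


# Placeholder values only; they carry no real meaning.
core = [[1.0] * CORE_COLS for _ in range(ROWS)]
spoken_context = [[0.5, 0.5] for _ in range(ROWS)]    # tone, pacing, etc. all observed
written_context = [[0.5, None] for _ in range(ROWS)]  # part left open as variables

spoken = build_word(core, spoken_context)
written = build_word(core, written_context)

print(len(spoken), len(spoken[0]))  # 100 100
print(unresolved(spoken))           # 0
print(unresolved(written))          # 100
```

The two forms share the same core and differ only in part of the final fragment, and the written form is the one left holding unresolved variables, which is the sense in which it is the more ambiguous of the two.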