The nail in the coffin for attempts to deny the robustness of Berlin and Kay’s theory is “Universal Foci and Varying Boundaries in Linguistic Color Categories” by Terry Regier (UChicago), Paul Kay (Berkeley), and Richard S. Cook (Berkeley). Looking at the World Color Survey of 110 unwritten languages, the authors found that the foci (“best examples”) of six colors (white, black, red, green, yellow, blue) are virtually universal, although the boundaries of the categories are somewhat more malleable and given to cross-cultural difference. More interesting still, the authors attempt to predict these boundaries with a computational model.

Why are these models important? Certainly, they represent a wide departure from much of linguistics and anthropology, though graduate students in these fields do learn basic, important statistics. Even in economics, there’s a vocal minority of professionals who scorn the utility of mathematics. Though it absolutely kills me to do so, I will (favorably) quote the Nobel Prize-winning economist Paul Krugman:

Math in economics can be extremely useful. I should know! Most of my own work over the years has relied on sometimes finicky math — I spent quite a few years of my life doing tricks with constant-elasticity-of-substitution utility functions. And the mathematical grinding served an essential function — that of clarifying thought. In the economic geography stuff, for example, I started with some vague ideas; it wasn’t until I’d managed to write down full models that the ideas came clear. After the math I was able to express most of those ideas in plain English, but it really took the math to get there, and you still can’t quite get it all without the equations.

What Krugman, who has long since stopped being an economist, is saying is that of course it all starts with ideas. But in order to develop these ideas into something scientific, you need to formalize the idea into equations. Why? Because equations eliminate ambiguity. It might take three pages to say what a solid equation says in a line. For someone trained in the practice, this is especially helpful because you can see which elements have been left out, where certain factors should be added, and it is vastly easier to challenge the assumptions of the equation or tweak them. If you don’t hold scientists to this standard, they can dance out of all kinds of things with ambiguity — indeed, this is just one of the problems with social sciences of the past. It’s what sets Gary Becker, for example, apart as a sociologist.

The idea I would like to formalize is how market demand works with the biological properties of our eyes and with chromatic qualities to produce a certain set of color words in a language. Step one would be to list the factors that such a set of color words is a function of. There are many factors that might be considered in trying to formalize an explanation for the consistent order of color words in languages. In one of Harold Conklin’s better-known works, referred to me by a PhD student in Anthropology, the author shows that color words may not refer strictly to chromatic properties. Rather, they may be tied to other qualities, such as texture and/or ripeness. This implies that surface texture, in addition to brightness, hue, and focal points, may need to enter the formalization. Also tugging at any formalization would be the market demand for visual differentiation and the cost of spreading and maintaining a new word in the language. This latter cost certainly goes down as information-recording technology improves (from writing systems through printing presses to computers).

Let us assume that humans gain no utility from distinguishing between different visual frequencies of light (i.e. colors). If this is the case, then humans will use no words (or possibly one?) for colors. Since every language has at least two color words, humans must gain utility from distinguishing between colors. Now let n be the number of color words that a language possesses. n is always greater than 1, because humans always derive enough utility to at least discriminate between lightness and darkness. Importantly, the contrast between these two “colors” is not limited to pure color values: light and dark apply to all colors. This suggests that humans derive more marginal utility from adding these two words to a language than from adding any other color words.

However, if we set aside brightness and instead place these words on an RGB scale, we might say that black is (0,0,0) while white is (255,255,255). The length of the line connecting these points in the cubic color space, 441.673, is the longest distance contained within the space, though it is not unique. The same distance of 441.673 separates lavender/purple (255,0,255) and green (0,255,0), red (255,0,0) and light blue (0,255,255), and blue (0,0,255) and yellow (255,255,0). These pairs are not the next colors to occur naturally in human languages.
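As a quick check on those distances, here is a minimal Python sketch. It assumes plain Euclidean distance in the 0 to 255 RGB cube; the variable names are my own, and I label (255,0,255) as magenta and (0,255,255) as cyan, which are the usual names for those vertices.

```python
import math

# RGB cube vertices, using the coordinates given in the text.
BLACK, WHITE = (0, 0, 0), (255, 255, 255)
RED, CYAN = (255, 0, 0), (0, 255, 255)        # cyan is the "light blue" of the text
GREEN, MAGENTA = (0, 255, 0), (255, 0, 255)   # magenta is the "lavender/purple" of the text
BLUE, YELLOW = (0, 0, 255), (255, 255, 0)

# The four space diagonals of the cube: each pair sits on opposite corners.
diagonals = {
    "black-white": math.dist(BLACK, WHITE),
    "magenta-green": math.dist(MAGENTA, GREEN),
    "red-cyan": math.dist(RED, CYAN),
    "blue-yellow": math.dist(BLUE, YELLOW),
}

for pair, length in diagonals.items():
    print(f"{pair}: {length:.3f}")
```

All four pairs come out to 255 times the square root of 3, which rounds to 441.673.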

As a universal matter, when n = 3, the third word is red, but when n = 4, the fourth word is not light blue. This means that human demand for a fourth color word is not exclusively a function of contrast in a cubic color space. Additionally, there must be a basis for choosing red as the third word. Why not green or blue? Any hypothesis must therefore take into account several factors for determining the universal order of color words in human languages. Now you’re beginning to see why an equation for the linguistic marketplace that explains how humans, across time and space, create a consistent and universal order for color words from n = 2 to 11 might be helpful. In any case, such an equation could be modified and tweaked in an orderly manner, which reduces the cost of discussing, modifying, and adapting the formalization.
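To make the "not light blue" point concrete, here is a small sketch of what a pure-contrast rule would predict. The rule itself is my own assumption for illustration (pick the unused cube vertex farthest from red by Euclidean distance), and the helper name `farthest_from` is hypothetical; "cyan" is the standard name for the light blue vertex (0,255,255).

```python
import math

# The eight corners of the RGB cube.
VERTICES = {
    "black": (0, 0, 0), "white": (255, 255, 255),
    "red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255),
    "yellow": (255, 255, 0), "magenta": (255, 0, 255), "cyan": (0, 255, 255),
}

def farthest_from(name, taken):
    """Return the unused vertex with the greatest distance from `name`."""
    candidates = [k for k in VERTICES if k not in taken]
    return max(candidates, key=lambda k: math.dist(VERTICES[name], VERTICES[k]))

# Assumed rule: with black, white, and red already named,
# the fourth word goes to whatever contrasts most with red.
fourth_by_contrast = farthest_from("red", taken={"black", "white", "red"})
print(fourth_by_contrast)
```

Under this assumed rule the prediction is cyan, which is exactly the color that does not show up fourth, so contrast by itself cannot be the whole story.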

Now, let us look at n = 3. What sets red apart from the remaining colors? It could be many things, but we need to establish a hypothesis. One possibility is geometric: perhaps red’s orthogonality or angle to the white / black axis in the cubic color space means that, when the third word is added, the three points span the largest possible triangle in the cubic color space given the black / white axis. Let us test this hypothesis. The area of this triangle is approximately 45,979.62 square units. It is a simple matter to show that we would obtain the same area with green or blue, so area alone cannot single out red. However, red does have the lowest frequency / longest wavelength among these candidates and, indeed, among all remaining colors. While we may believe that the latter point is conclusive, we do not yet know whether it is independent of the former method.
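The triangle test is easy to reproduce. The sketch below assumes the same 0 to 255 RGB cube and computes each area as half the length of the cross product of two edge vectors; `triangle_area` is a helper name of my own.

```python
import math

BLACK, WHITE = (0, 0, 0), (255, 255, 255)
CANDIDATES = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}

def triangle_area(a, b, c):
    """Area of triangle abc in 3-D: half the norm of the cross product of two edges."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    return 0.5 * math.sqrt(sum(x * x for x in cross))

# Area of the triangle formed by the black / white axis and each candidate third color.
areas = {name: triangle_area(BLACK, WHITE, rgb) for name, rgb in CANDIDATES.items()}

for name, area in areas.items():
    print(f"{name}: {area:.2f}")
```

Red, green, and blue all give the same area, a little under 45,980 square units, so this geometric measure does not distinguish red from the other two primaries.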

Now let us look at n = 4. Either green or yellow will always be in the fourth position, and when n = 5, its complement, that is, whichever of green and yellow was not selected as the fourth word, will be the fifth word. This implies that humans are indifferent between the two as a universal matter, but that culture-specific factors may significantly impact the decision. Additionally, both green and yellow are vertices of the cubic color space, although they have relatively similar wavelengths and frequencies.

At this point, let us look at another instructive excerpt from Sampson’s The Language Instinct Debate:

…human perception of color is mediated by [a] sensory apparatus which is not equally sensitive to all areas of the colour space. Our eyes can detect great intensity of colour in the ‘focal red’ region, for instance; conversely, in the pale blue-green region we are much less sensitive, so that the most intense color we can experience in that area is not too different from a pale grey. One would naturally suppose that if a language has few words for colours, the words it does have will refer to the strongest sensations; and a comparison of Berlin and Kay’s focal points with the regions of greatest human colour sensitivity indeed shows a near-perfect match. In this respect, then, it is true that human biology does influence the conceptual structure of human language. (Incidentally, the influence is not totally consistent across the species: one reason why blue occupies a relatively late position on Berlin and Kay’s sequence is that dark-skinned people have pigment in their eyes making them less sensitive than Europeans to blue light, and their languages correspondingly often lack a word for ‘blue’.)

So now we see that there are biological factors that may lead us to choose red before green and yellow, and green and yellow before the others. What happens when a language deviates from this model and puts orange in there? Or pink? Our equation would have variables, and coefficients describing the influence of those variables, that might help explain this “market demand” factor. Conceivably, we will be running different types of regressions, compared against different kinds of baselines, to test our predictions.

I intend, in the coming weeks, to develop just such a formalization. There are other potential uses of this approach for linguistic analysis. Languages differ in the number of “number” words such as one, seven, thirteen, and so on. They also differ in the number of family member words that are available. Asking questions about similarly universal relationships in languages could yield answers about the role of language in culture, technological achievement, as well as evolution. Even though I know this is terribly boring. 🙂