Ruminations on excerpts of research papers, blogs and books

Language Entropy

While on my daily crusade of reading research papers, I found myself growing fond of a very particular feature they share: more information in fewer words. They are information-dense. I began to wonder about the complexity of concepts, how it might be measured, and in particular how it might be measured through the tool we call language. Formalism somehow seems to be tied to all of this, so let me define a few interesting words before we continue.

I will discuss some key intuitions below, drawn from various concepts spread across computer science and statistics, though only surface-level knowledge is required.

Abstractness: The measure of how far the definition of a word is from a tangible object.

The abstractness of a word can be thought of as the depth at which it appears in a tree with unbounded branching, where each node is a word, the root nodes are all tangible, real and physical objects (articles, names and other such words), and their children are other words derived from them, but with more abstraction. As we climb down the tree, the words grow more abstract, since each is in turn dependent on less abstract words, all the way back to the roots, the tangible words. The depth at which a word occurs is therefore its "abstractness".
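To make this concrete, here is a tiny sketch in Python. The word list, the parent links and the convention that root words sit at depth 1 are all assumptions made purely for illustration:

    # A hand-made, hypothetical fragment of the word tree: each word points to
    # the (less abstract) word it derives from; root words point to nothing.
    parent = {
        "rock":      None,         # tangible, physical: a root word
        "matter":    "rock",       # one step more abstract
        "substance": "matter",
        "essence":   "substance",
    }

    def abstractness(word: str) -> int:
        """Depth of the word in the tree; root (tangible) words have depth 1."""
        depth = 1
        while parent.get(word) is not None:
            word = parent[word]
            depth += 1
        return depth

    print(abstractness("rock"))     # 1
    print(abstractness("essence"))  # 4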

Entropy: The measure of randomness, uncertainty and disorder.

Entropy = 1/Abstractness. More abstract words have lower entropy, which means a sentence with more abstractness packs a lot of information into fewer words and is therefore more efficient: a form of compression where the knowledge is not provided by the writer but is assumed to be known by the reader. Hence a sentence, a paragraph or any other piece of text has a total entropy, which is the product of the entropies of its individual words (the reason for a multiplicative model over an additive one is to wipe out the effect of the root words, which have entropy = abstractness = 1).

Understanding: The measure of how much of a new piece of information is known prior to the revelation.

Understanding of a concept, word or any piece of information can be interpreted as the number of times we have encountered it before. Every time we are exposed to the same piece of information, we understand it a little better (deliberately or not), and hence our understanding increases. More abstraction means more levels to climb before we reach a root node (which we understand perfectly, since we can directly observe it), and hence a more complex piece of text.

The complexity of any written text depends on its total abstraction, or equivalently its entropy. A sentence or paragraph with more root words than abstract ones has more entropy, and so the information is "spread out" among many simpler words. As we compress the words into more abstract ones, the entropy decreases while the complexity increases. The increase in complexity can be attributed to the fact that we need to climb more levels of the tree to reach a root node, while the connections between each parent and child node along the way must also be strengthened in order to develop a strong intuition for the piece of text.

This can also be viewed as a simple function that maps a word to a scalar value.

f(word) -> R

R in this case can be either the abstractness or the entropy of that word, which means the entropy of a sentence or of a piece of text is:

Abstractness(Text) = Mult(f(word) for every word in Text)

Entropy(Text) = 1/Abstractness(Text)
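As a toy illustration, assuming we already have a table of per-word abstractness scores (the words and numbers below are invented), the two formulas look like this in Python:

    import math

    # Hypothetical abstractness scores; in the tree picture these are node depths.
    abstractness_of = {"the": 1, "cat": 1, "sat": 1, "entropy": 3, "abstraction": 4}

    def f(word: str) -> float:
        """Map a word to its abstractness; unknown words are treated as root words."""
        return abstractness_of.get(word, 1)

    def abstractness(text: str) -> float:
        """Abstractness(Text): the product of f(word) over all words."""
        return math.prod(f(w) for w in text.lower().split())

    def entropy(text: str) -> float:
        """Entropy(Text) = 1 / Abstractness(Text)."""
        return 1.0 / abstractness(text)

    print(abstractness("the cat sat"))        # 1 (only root words)
    print(entropy("entropy is abstraction"))  # 1 / (3 * 1 * 4) ≈ 0.083

Note how a sentence made only of root words keeps the product at 1, which is why the multiplicative model wipes out their effect.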

Now, with the advent of word embeddings, we can perform some more interesting operations. Suppose a word is represented by an N-dimensional vector, and call it V. We can then rewrite the above equations like so:

f(V) -> R, where V ∈ R^N

Abstractness(M) = Mult(f(V) for every word vector V in M)

Entropy(M) = 1/Abstractness(M)

Where M is a collection of such word vectors stacked together, hence a matrix. The function simply maps the matrix to a scalar value (entropy or abstractness), which is an indicator of complexity. Here, we cannot ignore the fact that complexity itself is relative, and we must factor that in as well. The complexity of a piece of text depends heavily on the knowledge base of the person reading it (a simple sentence in Chinese is extremely difficult for me to understand, as I would have to construct a new language-tree from the roots up to even begin understanding it).
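Here is a minimal sketch of the embedding version. The embeddings are random stand-ins, and since the choice of f is left open above, I use one plus the vector norm as a purely placeholder scoring function:

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in word vectors; in practice these would come from a trained embedding model.
    embeddings = {w: rng.normal(size=8) for w in ["the", "cat", "entropy", "abstraction"]}

    def f(v: np.ndarray) -> float:
        """Placeholder f: R^N -> R. Any function mapping a word vector to an
        abstractness score would do; this one is purely illustrative."""
        return 1.0 + float(np.linalg.norm(v))

    def abstractness(M: np.ndarray) -> float:
        """Abstractness(M): product of f over the rows (word vectors) of M."""
        return float(np.prod([f(row) for row in M]))

    def entropy(M: np.ndarray) -> float:
        return 1.0 / abstractness(M)

    M = np.stack([embeddings[w] for w in ["the", "cat", "entropy"]])  # one row per word
    print(abstractness(M), entropy(M))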

Suppose the knowledge base of a person is represented by the set of words they are familiar with, including the nodes, their children and the weight assigned to each connection, and call it K. This knowledge base, being made up of nodes as well, has its own entropy, Entropy(K). This should, logically, be subtracted from our initial overall complexity (the product of all the entropies of a piece of text) to arrive at the final "complexity" of a sentence.

C = Mult(Entropy(word) for every word in Text) - Entropy(K)

C = 1/Abstractness(M) - Entropy(K)
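A last toy sketch, assuming (for illustration only) that the reader's knowledge base K is just another collection of words whose entropy is computed the same way as a text's:

    import math

    # Invented abstractness scores, as in the earlier sketches.
    abstractness_of = {"the": 1, "cat": 1, "entropy": 3, "abstraction": 4}

    def entropy(words) -> float:
        """Product of per-word entropies, where entropy(word) = 1 / abstractness."""
        return math.prod(1.0 / abstractness_of.get(w, 1) for w in words)

    def complexity(text: str, knowledge_base) -> float:
        """C = Entropy(Text) - Entropy(K)."""
        return entropy(text.lower().split()) - entropy(knowledge_base)

    K = ["the", "cat", "entropy"]                # words this reader already knows
    print(complexity("entropy abstraction", K))  # relative complexity for this reader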

This is a mere play of words, a mixture of thoughts and the written expression of the same. Formalism to express realism has always fascinated me, and hence I write this small piece.
