Astle's stream

Ruminations on excerpts of research papers, blogs and books


True Crimes

Imagine you are a benign observer, able to observe every event happening in the world at the same time (maybe through a crystal globe or something). You can not only objectively see what's happening, but can also accurately interpret the emotions that people (basically everyone) are experiencing. Your observation is linear (A -> B -> C -> ...).

Now a crime, as defined by the people in that specific part of the world, has occurred, at say time t. That would put it, in your timeline, between other events: ... -> R(t) -> Crime(t+1) -> S(t+2) -> ... Was it a crime? The people have judged it as such: immoral, ruthless and objectively wrong. But you? You know everything, events that those people would never know, events that people in general can never know. What's your judgement?

This little thought experiment would not stop you from calling out the crimes, but it would shine a new light on a majority of the criminals and crimes that we've seen. Most of the time it's as if we shine a torch into a dark room and claim to know the entire room, without considering the highly restricted and narrow region the torch can actually show us. Morality is not subjective but an inter-subjective phenomenon: humans together give birth to it, and only by relying on others to believe in it does it survive.

What if a crime happened at A1? Again, that's the beginning of your perspective and narrative, wherein the causes of the crime had already occurred before you stepped up to be the Observer. To mitigate this, we would have to go back to the beginning of time itself...

Most people view the world as such: K(t) -> T(t+2) -> W(t+n) ... Their view is incomplete, inconsistent and full of misinformation. People aren't misinformed because they were exposed to objectively wrong information (in most cases), but because they were exposed to a different perspective on the information (like a game of Chinese whispers, but with millions or billions of people, over centuries and across the globe).

Despite all these points about perspectives, the nature of reality and human emotions, there seem to exist pure crimes, or True Crimes. Such crimes are independent of any past events that occurred before them, have zero connection to dilemmas or human emotions, and everything to do with a person being self-centred, self-absorbed and unempathetic: a true criminal. Mind you, I am NOT talking about psychopaths, since that is a biological condition. A common yet scary fact? A True Criminal, capable of committing a true crime, is your average person.

Not all average people are true criminals, but all true criminals are average people. Who are they? What crime am I even talking about? I think we all know the answers to those questions. The thought experiments, though at first they seem to provide some very good arguments for a diplomatic view of the world, cannot shield us from True Crime. It happens regardless of the situation, what the person was going through, and society at large.

The Computational Complexity Paradigm of Machine Learning

The bitter lesson is by now well known amongst all those who wish to consume the knowledge behind anything Artificial Intelligence/Deep Learning. The lesson was simple: no matter how we structure our algorithms to mimic human capacity, they will always be outperformed by simple, scaled architectures with massive amounts of data and parameters.

This lesson is quite reminiscent of the computational paradigm that Stephen Wolfram claims to have started back in the 1980s, wherein traditional mathematical thinking was left behind for the sake of exploring a new way of thinking: from simplicity to complexity, through the power of computation. This may seem quite similar to chaos theory, wherein simple initial conditions give rise to complex phenomena.
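Wolfram's classic illustration of this jump from simplicity to complexity is the elementary cellular automaton. A minimal sketch of Rule 30, one of his favourite examples (the grid width and step count below are arbitrary choices):

```python
# A minimal elementary cellular automaton (Rule 30): a one-line update
# rule whose output looks chaotic, despite the rule's simplicity.

def step(cells, rule=30):
    """One synchronous update with wrap-around edges."""
    n = len(cells)
    return [
        (rule >> ((cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

width = 31
row = [0] * width
row[width // 2] = 1  # start from a single live cell

for _ in range(15):
    print("".join("#" if c else "." for c in row))
    row = step(row)
```

Starting from a single live cell, the pattern quickly turns visibly irregular, which is the whole point: the rule fits in one line, the behaviour does not.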

It was only then that I realised that the scaling law is but another beautiful aspect of the computational paradigm, wherein simple architectures (say embeddings + transformers) give rise to quite complex phenomena (whatever LLMs are capable of), and thus modern deep learning is a wonderful example of what we can extend and call computational ML, or computational NLP.

This plays in tandem with the new era of computation that we can explore, where we need not abandon the older mathematical style of thinking, but rather complement it with the compute power it needs in order to show us rather astonishing breakthroughs. Simple linear algebra coupled with optimization algorithms turned out to mimic language quite well, and even learned to perform human-like reasoning. Such breakthroughs urge us to turn our attention to this new computational style of thinking, and explore further.

(In quite a fascinating way, mechanistic interpretability may be one of these "holes" that Wolfram talks about, where the complex system that is the LLM may not be computationally irreducible, but can indeed be reduced to its simpler forms. This could ultimately turn out not to be true, and we may hit the wall of LLMs being computationally irreducible, but current research has been promising.)

Finding useful connections from Wolfram's work into this field of ML was quite interesting, and indeed made sense. The mathematical foundations, which were all discovered independently of the computational way of thinking, turned out to pave the way to really great results when paired with computation and the complexity it brings with it. This also makes me wonder what Ruliology + ML would look like, wherein we find simple "models" (not to be confused with ML models) which can lead to complex systems down the line (induction heads?).

While we explore more of this computational landscape, various new interesting modes will show up, and I would be just as excited to read about (and hopefully contribute to) such paradigms.

Doctrines don't scale

The reason we have intra-species conflict is that doctrines do not scale effectively.

Each individual person has, as is common knowledge, a mental model of the world. This mental model is, naturally, flawed. It's almost certain that no one person can truly grasp the reasons behind the what, why, and how of our world. These differences between flawed mental models are the primary cause of conflicts, and have been since the dawn of Homo sapiens.

One could see this and come to a conclusion: since these mental models lose their malleability as we grow older, it'd be in vain to do anything about changing them, and thus more effort must be put into conflict resolution and cooperation.

Here we come upon another realization (at least I did): we have been doing conflict resolution and cooperation on an unprecedented scale since the start of the agricultural revolution, in the form of cities, kingdoms, empires, and nations. It was religion that eventually captured the most people, and today it is the main driver of cooperation across thousands of kilometers.

But for human conflict to be truly gone, we need one single belief, shared by the entire human species. As of this moment, I cannot think of one. (There are people who do not even believe the Earth is round, so...) Even the most widespread doctrines cannot capture the entirety of the human population, let alone effectively capture their minds (how many Christians are actually devoted?).

This means that as of now, our primary tool of communication, language (and humans), has yet to come up with a doctrine that can scale to nearly eight billion minds. We do not yet have one nation, one religion, nor one language.

I think the reason is the difference in evolution of the hardware (the anatomy of our body/brain) and the software (our mental models). Our minds developed complicated theories much faster than our primate brains could evolve, hence we still live like the forager bands roaming in the savannah: just that our bands have gotten bigger and intermingled.

In the future, I am optimistic that at least one such belief will spread, leading to a true unification of the human species (one could argue that the fear of nuclear weapons is one such thing, but you never know).

Unfeeling emotions

One of the things that I've heard most in my life is that I lack emotions, or rather that I lack empathy.

This results in most people viewing me as your average "logical", "rational" or "critical" minded person, wherein every conversation or debate seems like a formal proceeding in a court of law: no emotions, just facts.

In fact, though, I had an almost opposite view of my own self, shocking as that may seem: overly emotional and unable to control it, going from short bursts of anger to being overly affectionate (puppies, of course) and sometimes, or I would say most of the time, overwhelming the people around me.

This may seem contradictory, and any mention of it by me leads people to think that one of those views of my self must be wrong (either theirs or mine), and I would disagree.

Both of those views are correct. While my internal self and external self may seem conflicting, I would actually put it under the umbrella of having intense emotions and living at the opposite ends of the spectrum, going from seeming overly emotional to seeming completely emotionless.

Is this an advantage? The obvious answer is that it's impossible to decide which personality traits are advantageous in our modern society, as that would require us to define what having an advantage even looks like. But if it's genuine happiness, I guess I have it :)

Alternatives To Religious doctrine: why to live

  • Homo Mathematicus. Going personal with this one, but mathematics could truly be the "language of the gods", the code that underlies mother nature. From the higher dimensional spaces we reside in, to quantum consciousness explaining our deepest mysteries, we revel in the idea of formalism and creative thinking. With the glory of studying math bestowed only upon the chosen few, we must spend a lifetime trying to find the answers to the mystery of the universe.

  • Homo Economicus. The economic rational mind, always logical, always critical. Here we discuss the impact of Marxism, socialism and capitalism; the becoming of a society, the formation of culture. Thinkers and rationalists of this genre have put forth theories that continue to influence the thinking of a large fraction of our populace. From financial institutions to the wealth of nations, this doctrine is as fascinating as it is vast.

  • Homo Philosophicus. From western influence to eastern, this doctrine spans thousands of years and is the mother of all disciplines. A form of thinking itself; nihilism, absurdism, modernism and post-modernism are a few areas where countless intellectuals have tried to find answers to the question that is the Human. Modernism and post-modernism would try to amalgamate with modern epistemology to make behavioural models, while other forms reject everything. This doctrine is a rabbit hole of paradoxes and logic, history and, well... philosophy.


  1. Is God a Mathematician?

  2. Economics Library

  3. PostModernism/Rationality

The Spiritual Mutiny of Intellectual Subsistence

History has been the best story-teller, teacher and guide that humans have encountered. Recording the thoughts, laws and events of the past has been one of the best decisions humans have ever taken.

This leads us down an adventurous path, where we follow the Human across time, finding various reasons to live, while being burdened with knowledge and an excellent prefrontal cortex. We stumble upon mythologies, religions and belief systems spread across lands, the cause of miracles and wars, life and death.

These belief systems are drivers of the human will, an invisible hand forcing the human brain to act a certain way, while directing entire societies, regimes and cultures, and have been doing so since the dawn of time.

Philosophy would be an introduction to the study of belief systems. Though I personally have not delved deep, my personal belief systems have evolved throughout my childhood, and I am currently exploring the vast forest that we call the Internet. Deep within the net we find some interesting thoughts, while other places, such as YouTube, offer some different ones.

My intellectual journey will continue till I die, but I hope to enjoy exploring the depths of thought, language and reality as I go on. That will be my mutiny against the intellectual subsistence of modern times.

ML/DL/AI subfields: present and future

Long gone are the days when ML students used to code up RNNs and CNNs in order to use their very own models on tasks. Transformers put an end to that. Why? Scalability. Transformers only outperform the other architectures when scaled, and hence the average individual could only stand in awe as millions of dollars, thousands of GPUs and terabytes of storage were used to train foundation models. This paradigm shift is reminiscent of technology that is beyond the individual. However, as any field matures, one can find niches to lodge our efforts into, and hence I have gathered a few possible paths here. These may be obsolete or solved in the upcoming years, but I shall not remove them, only extend this list, to track the growth of the field:

Language Entropy

While on my daily crusade of reading research papers, I found myself fond of a very particular feature they have: more information in fewer words. This makes them information dense. I began to wonder about the complexity of concepts, their measurement, and specifically their measurement through the tool we call language. Formalism somehow seems to be tied to all of this, so let me define a few interesting words before we continue.

I will discuss some key intuitions below, which come from various concepts spread across computer science and statistics, though the required knowledge is just surface level.

Abstractness: The measure of how far the definition of a word is from a tangible object.

Abstractness of a word can be thought of as the depth at which it appears in a tree with unbounded branching, where each node is a word, the root nodes are all tangible, real and physical objects (articles, names and other such words), and their children are other words derived from them, but with more abstraction. As we climb down the tree, away from the roots, the words grow more abstract, as each is in turn dependent on less abstract words, all the way back to the root, the tangible words. Hence the depth at which a word occurs is its "Abstractness".

Entropy: The measure of randomness, uncertainty and disorder.

Entropy = 1/Abstractness. More abstract words have less entropy, which means a sentence with more abstractness contains a lot of information in fewer words, and hence is more efficient: a form of compression where the knowledge is not provided by the writer, but is assumed to be known by the reader. Hence a sentence, paragraph or any other piece of text has a total entropy which is the product of the entropies of each individual word (the reason for a multiplicative model over an additive one is to wipe out the effect of the root words, which have entropy = abstractness = 1).

Understanding: The measure of how much of a new piece of information is known prior to the revelation.

Understanding of a concept, word or any piece of information can be interpreted as the number of times we have encountered it before. Every time we are exposed to the same piece of information, we understand it a little better (deliberately or not), and hence our understanding increases. More abstraction means more levels to climb before we reach a root node (which we understand perfectly, since we can directly observe it), and hence the more complex the piece of text.

Complexity of any written text depends on its total abstraction, or its entropy. A sentence or paragraph with more root words than abstract ones has more entropy, and so the information is "spread out" among many simpler words. As we compress the words into more abstract ones, the entropy decreases while the complexity increases. The increase in complexity can be attributed to the fact that we need to travel further through the tree to reach a root node, while the connections between each parent and child node must also be strengthened in order to develop a strong intuition for the piece of text.

This can also be viewed as a simple function that maps a word to a scalar value.

f(word) -> R

R in this case can be either the abstractness or the entropy of that word, which means the entropy of a sentence or a piece of text is:

Abstractness(Text) = Mult(f(w) for each word w in Text)

Entropy(Text) = 1/Abstractness(Text)

Now, with the advent of word embeddings, we can perform some more interesting operations. Suppose a word is represented by an N-dimensional vector, called V. We can then rewrite the equations above like so:

f(R^N) -> R

Abstractness(M) = Mult(f(V) for each column V of M)

Entropy = 1/Abstractness(M)

Where M is a collection of such word vectors put together, hence a matrix. The overall function simply maps the matrix to a scalar value (entropy or abstractness), which is an indicator of complexity. Here we cannot ignore the fact that complexity itself is relative, and must factor that in as well. The complexity of a piece of text depends highly on the knowledge base of the person reading it (a simple sentence in Chinese is extremely difficult for me to understand, as I would have to construct a new language-tree from the roots up to even begin understanding it).

Suppose the knowledge base of a person is represented by the set of words he/she is familiar with, including the nodes, their children and the weightage assigned to their connections, and call it K. This knowledge base, being made up of nodes as well, has its own entropy, Entropy(K). This should, logically, be subtracted from our initial overall complexity (the product of all the entropies of a piece of text) to get the final "Complexity" of a sentence:

C = Mult(Entropy(w) for each word w in Text) - Entropy(K)

C = 1/Abstractness(M) - Entropy(K)
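The toy model above can be sketched in code. To be clear, the word tree, the depth values and the knowledge entropy below are invented purely for illustration; they are not real linguistic data:

```python
# A toy implementation of the abstractness/entropy model described above.
from functools import reduce

# parent -> children edges: children are one level more abstract.
# "dog" and "rock" are tangible roots at depth (abstractness) 1.
TREE = {
    "dog": ["animal"],
    "rock": ["matter"],
    "animal": ["organism"],
    "organism": ["life"],
}

PARENT = {child: parent for parent, kids in TREE.items() for child in kids}

def abstractness(word):
    """Depth of `word` in the tree; tangible roots have abstractness 1."""
    depth = 1
    while word in PARENT:
        word = PARENT[word]
        depth += 1
    return depth

def entropy(word):
    """Entropy = 1/Abstractness, as defined above."""
    return 1 / abstractness(word)

def text_entropy(words):
    """Multiplicative model: root words (entropy 1) leave the product unchanged."""
    return reduce(lambda acc, w: acc * entropy(w), words, 1.0)

def complexity(words, knowledge_entropy):
    """C = product of word entropies minus the reader's knowledge entropy."""
    return text_entropy(words) - knowledge_entropy

print(abstractness("dog"))            # 1: tangible root
print(abstractness("life"))           # 4: three levels above "dog"
print(text_entropy(["dog", "life"]))  # 1 * (1/4) = 0.25
```

Here `complexity(["dog", "life"], 0.1)` evaluates the final formula C = 1/Abstractness(M) - Entropy(K) for a reader whose knowledge entropy is (arbitrarily) assumed to be 0.1.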

This is a mere play of words, a mixture of thoughts and the written expression of the same. Formalism to express realism has always fascinated me, and hence I write this small piece.

Novel Model Architectures

Since I started learning more and more about Machine Learning and, by extension, Deep Learning, it was all but clear that this was the practical implementation of formalising intelligence. The growth of neural networks due to scaling laws has only proved the consistency of the universal approximation theorem. For the past few years, Transformers have reigned supreme as the state-of-the-art model architecture for almost all deep learning sub-fields. My attempt here is to gauge the vast literature that the field of formal intelligence offers, and look for alternatives to the transformer architecture. It matters not that these novel methods may have failed or were never adopted in real life; they present ideas different from the mainstream ones and are thus fascinating to learn. Exploring the mathematical and engineering aspects of these methods would be of utmost interest to me.

High(er) Dimensions

Dimensionality is an important concept in essentially every STEM field, and beyond. The concept of dimensions, what they are, where they are useful and ultimately what they represent, is multi-faceted, and thus I was intrigued enough to write a note/essay on this particular topic.

What are dimensions? In a word: features. A dimension is just a feature or an attribute of an object, be it an inanimate object or a living organism. The dimensions we are most familiar with are the three dimensions of space: length, breadth and height. But wait... aren't there more? The fourth could be time, and as far as theoretical physicists are concerned, there are a lot more. How can scientists even claim that there are more dimensions when it's impossible for us to even imagine a fourth one? The answer is hidden in representations.

We represent our reality through numbers. They are a crude, but sometimes fairly accurate, representation of our reality. Equations that scientists created in a closed laboratory or a classroom have come to predict the movement of stars and other celestial bodies, so yes, we trust our numbers to model the universe around us. Knowing this, we represent our dimensions with a list of numbers, say [1, 2]. But we have three dimensions, so we put in three numbers: [1, 2, 3]. These three numbers are a fairly good representation of space in various mathematical equations. That is, a certain feature of space is being represented by a vector.

But what's stopping us from putting more numbers into our vector, like so: [1, 2, 3, 4, 5, ...]? An obvious answer would be reality itself. There's no point, no physical counterpart to a vector with more than 3 numbers in it (just as the word unicorn has no physical representation in our real world). This was true until, during the pursuit of solving various equations, physicists were forced to expand the dimensions in order to solve (or formulate) them. Our theories forced us to go beyond our own senses and come up with more and more "dimensions", or features, that represent space itself. (Whether this is true or not is beyond my ability to grasp.)

The language that we speak has been modelled to a great extent by large language models (LLMs) in recent times. Their responses make not only syntactic sense, but also semantic sense. This worked because we were able to model our language using a crude approximation, or in other words: vectors. Each word has N dimensions, or in other words, N features, which give the LLMs the power to use the word in different settings, or in more human words: they understand what the word means!

Understanding being analogous to "being able to see multiple attributes of an object" was something I had never thought of before. It's only when our mental models construct multi-dimensional vectors of certain concepts or words that we truly understand the said concept or word.
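This "words as feature vectors" idea can be sketched with a few hand-made toy vectors. The words, features and numbers here are invented for illustration, not taken from any real embedding model:

```python
# Toy illustration: words as vectors of made-up "features".
import math

# features: [is_animal, is_royal, size]
vectors = {
    "cat":   [0.9, 0.0, 0.2],
    "tiger": [0.9, 0.1, 0.8],
    "king":  [0.1, 0.9, 0.6],
}

def cosine(u, v):
    """Similarity of direction between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(vectors["cat"], vectors["tiger"]))  # high: many shared features
print(cosine(vectors["cat"], vectors["king"]))   # low: few shared features
```

Cosine similarity compares directions rather than magnitudes, which is why "cat" and "tiger" score as close despite their different sizes: they share most of their attributes, which is exactly the "seeing multiple attributes of an object" intuition above.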

Finding analogies between mathematical concepts and real life is fun and in a way enlightening. Modelling our reality with such approximations means whenever we are right, we are gifted with the greatest reward: understanding ourselves.

Judicial and Political Correctness

In a recent discussion with a friend of mine, I found myself explaining my lack of opinions on political matters and my lack of interest in judicial ones as well. The former has been (and probably will be) criticised as ignorant and irresponsible behaviour. With the general populace yearning to discuss political matters, my disinterest stems from a number of reasons, which I shall mention here.

Any opinion, be it political, personal or moral, is believed by the individual to be the absolute truth. You have opinions because it is your belief that they reflect the objective reality around you. That is the sole reason you even have them: having a mental map (however approximate) helps us navigate the world and "make sense" of it. But it's almost always the case that our opinions do not reflect objective reality, and in some cases, not at all. Our opinions are an amalgamation of our cultural thinking, the personal opinions of the people we grew up with, and our own personality traits. None of these factors force our opinions to reflect objective reality; hell, none of these factors even force us to rationally analyse the facts and come to a logical conclusion.

A personality trait of mine is that I like objectivity (you can guess where I am going with this). Opinions on most matters aren't objective at all, hence I find no meaning in having them. Whenever we believe in something with all of our heart (and rational brain), we should also have the courage to call it a fact. If you are hesitant to call a certain thing a fact and more comfortable with the term "opinion", you know that somewhere you aren't exactly right. The problem this creates is that our opinions drive reality: in judicial matters. Judicial laws are largely made on the opinions of the time they were written in, which makes them highly susceptible to change and ridicule by future generations. I will refrain from going further, as this could spiral into a long essay.

Social responsibility isn't having political opinions. It's not that I don't care what's happening in the world by not bothering to read about it; it's that no matter how much I read, I'll never have a grasp on the actual objective reality of the situation, and would thus always carry a bias with me. The bias would depend on where I grew up, who I talk to and what my own personality is. And as long as the reality is unknown to me, my opinion will always be wrong (that's a personal belief).

So what should we do? Not learn anything of the outside world? Live in our own little bubble? I think we should acknowledge the facts, agree that no one individual can grasp the entire situation, and take action towards the betterment of everyone around us.

AI and God-Man

AGI = Artificial General Intelligence
ASI = Artificial Super Intelligence

Learning is the slope of gathering information in a way that can be utilised later (let's call this rate L). With that said, the rate of the rate of learning is an interesting concept: it's the second derivative of gathering information, or how fast we can learn to learn new things (let's call it R). The distribution of L over a lifetime is skewed: our L peaks in childhood/adolescence and starts to deteriorate as we get older. What about R? I think that's completely up to the individual's effort and willingness to compound their ability to learn, but most people do not bother to climb down to the next derivative.

What if something else did? What if we built a system that focuses on learning to learn better and faster? It would result in exponential growth of everything we know. Knowledge, and by extension technology, growing at an exponential rate is, in our current state, unfathomable. We'd be left in the dust, scrambling to look ahead while the vehicle zooms past us. That's AGI, on its way to becoming ASI. It's not a what-if anymore: we are trying to build one, and maybe we are getting closer.
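The gap between a fixed L and a positive R can be made concrete with a toy simulation. All the numbers below are arbitrary; only the shapes of the two curves matter:

```python
# A toy sketch of L (rate of learning) vs R (rate of improving L).

steps = 20

# Agent A: fixed learning rate (R = 0) -> knowledge grows linearly.
knowledge_a, rate_a = 0.0, 1.0

# Agent B: the learning rate itself improves every step (R > 0)
# -> knowledge grows geometrically.
knowledge_b, rate_b = 0.0, 1.0

for _ in range(steps):
    knowledge_a += rate_a
    knowledge_b += rate_b
    rate_b *= 1.2  # learning to learn: the rate compounds

print(knowledge_a)  # 20.0
print(knowledge_b)  # geometric series (1.2**20 - 1) / 0.2, roughly 186.7
```

After only twenty steps the compounding learner is almost an order of magnitude ahead, which is the intuition behind "we'd be left in the dust".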

A controversial theory of consciousness was put forward by Julian Jaynes in his Origin of Consciousness, where he suggests that we evolved consciousness only 3000 years ago, which means our ancestors were pretty much unconscious before that time. That claim has deeper implications, and the one I'm focusing on here is this: it suggests we humans have evolved our minds without changing the brain's biological anatomy, and it resulted in progress on an enormous scale. Consciousness was a necessary step in evolution. And of course the most probing question is: can we do it again? If yes, what would it even look like?

My initial thoughts were of ASI outcompeting and destroying us if we get there, but if ASI were to provide humans with adversities we've never seen before (for at least 3000 years (?)), is another human evolution possible? Mark Hamilton argues in his book that such an evolution will happen, and that it'll be our last. We will evolve once more, to become what he calls a God-Man. This sounds exactly how it is: we become literal gods. I do not know if this theory is even legit, but if I had to guess, our next evolution could be the ability to drastically improve R and to keep doing so throughout our lives (something we expect ASI to do easily). A human who could do that would be to us what we are to chimpanzees. The same analogy is used to compare an ASI and us. We are the chimpanzees.

So what'll happen? ASI vs Humans? That's doomsday for us. ASI vs God-Man? That depends on whether Julian Jaynes' theory is even legit, and even if it is, on whether Mark Hamilton's claim of it happening again holds, and under what conditions.

This may all sound highly speculative and based on unproven theories, but that's the fun part of not knowing the future: trying to imagine it.