Ruminations on excerpts of research papers, blogs and books

The Computational Complexity Paradigm of Machine Learning

The Bitter Lesson is by now well known amongst all those who wish to understand anything in Artificial Intelligence/Deep Learning. The lesson is simple: no matter how cleverly we structure our algorithms to mimic human capacities, they will always be outperformed by simple, scaled architectures with massive amounts of data and parameters.

This lesson is quite reminiscent of the computational paradigm that Stephen Wolfram claims to have started back in the 1980s, wherein traditional mathematical thinking was set aside in favour of exploring a new way of thinking: from simplicity to complexity, through the power of computation. This feels quite similar to chaos theory, wherein simple rules and initial conditions give rise to complex phenomena.
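
To make the "simplicity to complexity" idea concrete, here is a minimal sketch (my own illustration, not Wolfram's code) of an elementary cellular automaton such as Rule 30: a one-line update rule applied to a single live cell quickly produces intricate, seemingly random structure.

```python
# Minimal sketch of an elementary cellular automaton (Rule 30),
# Wolfram's canonical example of simple rules producing complex behaviour.

RULE = 30
WIDTH, STEPS = 64, 32

def step(cells, rule=RULE):
    """Apply one update of an elementary cellular automaton."""
    out = []
    n = len(cells)
    for i in range(n):
        # Neighbourhood (left, self, right) encoded as a 3-bit index into the rule.
        pattern = (cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        out.append((rule >> pattern) & 1)
    return out

# Start from a single live cell and watch complexity emerge.
row = [0] * WIDTH
row[WIDTH // 2] = 1
for _ in range(STEPS):
    print("".join("#" if c else "." for c in row))
    row = step(row)
```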

It was only then that I realised that the scaling law is but another beautiful aspect of the computational paradigm, wherein simple architectures (say, embeddings + transformers) give rise to quite complex phenomena (whatever it is that LLMs are capable of), and thus modern deep learning is a wonderful example of what we might extend and call computational ML, or computational NLP.
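
For reference, the empirical scaling laws reported in the literature (e.g. Kaplan et al.'s work on neural language models) take a strikingly simple power-law form; one commonly cited version for model size $N$, with $N_c$ and $\alpha_N$ as fitted constants rather than fundamental quantities, is

$$ L(N) \approx \left( \frac{N_c}{N} \right)^{\alpha_N} $$

where $L$ is the test loss. The simplicity of the relationship is part of why it feels like a property of the computational paradigm rather than of any particular architecture.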

This plays in tandem with the new era of computation that we can explore, where we need not abandon the older mathematical style of thinking, but rather complement it with the compute power it needs in order to show us rather astonishing breakthroughs. Simple linear algebra coupled with optimization algorithms turned out to mimic language quite well, and even learned to perform human-like reasoning. Such breakthroughs urge us to turn our attention to this new computational style of thinking, and to explore further.
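
As an illustration of how much of this really is "simple linear algebra", here is a minimal sketch (shapes and names are purely illustrative) of scaled dot-product attention, the core operation inside a transformer: a couple of matrix multiplications and a softmax.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # one probability distribution per query
    return weights @ V                   # weighted sum of the values

# Toy example: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8)
```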

(In quite a fascinating way, mechanistic interpretability may be one of these "holes" that Wolfram talks about, where the complex system that is the LLM may not be computationally irreducible, but can indeed be reduced to its simpler forms. This could ultimately turn out not to be true, and we may find that LLMs are computationally irreducible after all, but current research has been promising.)

Finding useful connections from Wolfram's work to this field of ML was quite interesting, and indeed made sense. The mathematical foundations, which were all discovered independently of the computational way of thinking, turned out to pave the way to really great results when paired with computation and the complexity it brings with it. This also makes me wonder what Ruliology + ML would look like, wherein we find simple "models" (not to be confused with ML models) which can lead to complex systems down the line (induction heads?).

As we explore more of this computational landscape, various new and interesting modes will show up, and I would be just as excited to read about (and hopefully contribute to) such paradigms.
