Through the Lens of Now: How Presentism Shapes Language Models
This meme is a good reminder of historical myopia, or presentism: the human tendency to hold a skewed, biased view that privileges our own times. Once reminded of this, it also becomes clear that our exploration of technology is largely shaped by how we view that technology, and by the analogies we draw to our surroundings and to nature.
I am sure there are many more examples, especially from the age of computers, but here I would like to trace our latest innovation: large language models. Since this seems to be our latest object of belief, we can watch how, with each paradigm, we climb a level of abstraction in the ideas we borrow: we went from taking inspiration from how the brain functions anatomically to taking inspiration from how it thinks.
Initially the research seemed more "computational", in the sense that we moved away from how the brain functions and leaned on statistics and data science to do what empirically worked: model training at scale. One analogy: we are simulating millions of years of human brain evolution by training a massive model on most of today's recorded human knowledge (namely, the Internet). The next paradigm, a recent one (post-2021) called supervised fine-tuning, further trains the model to behave in ways users and consumers expect. The analogy is easy: after birth, we are trained until the age of 18-21 to behave, while societal and cultural values are instilled in our young minds.
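The two stages above can be sketched in miniature. This is a toy illustration, not how real LLMs work: the corpus and fine-tuning data are made up, and bigram counting stands in for gradient descent on a neural network. The point is only that both stages share one objective (predict the next token) while fine-tuning steers behaviour with curated examples.

```python
from collections import Counter, defaultdict

def train_bigrams(texts, counts=None):
    """'Train' by counting which token follows which (next-token prediction).
    A stand-in for learning from data; real models use gradient descent."""
    counts = counts if counts is not None else defaultdict(Counter)
    for text in texts:
        tokens = text.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Greedy decoding: the most frequent continuation seen in training."""
    return counts[token].most_common(1)[0][0]

# Stage 1: "pretraining" on a broad (toy) corpus.
model = train_bigrams([
    "the sky is blue",
    "the sky is vast",
    "the answer is blue paint",
])

# Stage 2: "supervised fine-tuning" reuses the same objective, but on
# curated examples of the behaviour we want the model to imitate.
model = train_bigrams(
    ["question : why is the sky blue ? answer : scattering"] * 5,
    counts=model,
)

print(predict_next(model, "sky"))  # fine-tuning has shifted the continuation
```

Note how the second stage changes what follows "sky": the pretrained counts favoured "is", but the repeated fine-tuning examples now dominate, which is exactly the behavioural steering the analogy describes.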
The analogies may seem a bit far-fetched, but once viewed this way, they do make sense (humans love analogies, after all). The next paradigm draws its inspiration even more obviously from how humans think: spend more compute on harder problems, an approach called test-time scaling. It is as simple as that: keep the model "thinking" and "reasoning" longer so it can solve harder problems.
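One common form of test-time scaling is majority voting over many sampled attempts. The sketch below is hypothetical throughout (`noisy_solver` is a toy stand-in for one stochastic reasoning pass of a model), but it shows the core trade: more samples, more compute, more reliable answers.

```python
import random

def noisy_solver(problem, rng):
    """Stand-in for one stochastic 'reasoning' pass of a model:
    it returns the right answer only 60% of the time."""
    correct = sum(problem)  # the toy task: add a list of numbers
    return correct if rng.random() < 0.6 else correct + rng.choice([-1, 1])

def solve_with_budget(problem, n_samples, seed=0):
    """Draw n_samples candidate answers and return the majority vote.
    A larger n_samples = more test-time compute spent on the problem."""
    rng = random.Random(seed)
    votes = {}
    for _ in range(n_samples):
        ans = noisy_solver(problem, rng)
        votes[ans] = votes.get(ans, 0) + 1
    return max(votes, key=votes.get)

problem = [3, 4, 5]  # correct answer: 12
print(solve_with_budget(problem, n_samples=1))    # one pass: may well be wrong
print(solve_with_budget(problem, n_samples=101))  # many passes: vote converges
```

A single pass inherits the solver's 60% accuracy, while the 101-sample vote is correct with near certainty; "thinking longer" here is nothing more than buying down the variance of an unreliable reasoner.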
At this point, further research draws still more inspiration from how humans think, perceive and reason. One line of work tries to make the reasoning process implicit, similar to how humans sometimes perform well-practised reasoning tasks without explicitly thinking them through, an echo of System 1 versus System 2 thinking.
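The System 1 versus System 2 split can be caricatured as a dispatcher: answer from a fast, intuitive path when possible, and fall back to slow, explicit step-by-step work only when needed. Everything here is a hypothetical toy (the `system1`/`system2` names and the cache are mine, and the "hard" task is just summing numbers), but it captures how practice migrates a task from the deliberate path to the automatic one.

```python
def system1(query, cache):
    """Fast, intuitive path: instant recall of well-practised answers."""
    return cache.get(query)

def system2(query):
    """Slow, deliberate path: explicit step-by-step work (a toy sum)."""
    total = 0
    for part in query.split("+"):
        total += int(part)  # walk through the steps one by one
    return total

def answer(query, cache):
    fast = system1(query, cache)
    if fast is not None:
        return fast               # no explicit reasoning needed
    result = system2(query)
    cache[query] = result         # practice turns System 2 into System 1
    return result

cache = {"2+2": 4}
print(answer("2+2", cache))      # System 1: recalled instantly
print(answer("17+25+8", cache))  # System 2: worked out explicitly
print(answer("17+25+8", cache))  # now cached, so System 1 handles it
```

The research on implicit reasoning aims for something analogous inside the model itself: the explicit chain of steps gets absorbed so the answer surfaces without the visible deliberation.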
These analogies help us see not only how our current technology resembles us in an ever more abstract sense, but also how we try to mimic our own thinking processes to nurture these technologies toward their full potential.