There's likely not enough conversation about the information-theoretic mechanisms at work in unsupervised learning via transformers and GPU scaling.
Backpropagation, reinforcement learning, etc. don't quite feel nuanced/granular/specific enough as descriptions.
Humans are abstracted away from the "learning evolution" that is occurring as scale increases, even though more data and more compute are clearly leading to new performance outcomes.
The conversation currently happens at the level of "can AIs reason?" or the interpretability of neural nets/neurons.
The information-theoretic cutting, prioritizing, triaging, and trading off of different data is perhaps one of the most interesting and least discussed things at the moment.
For example, if there are two conflicting viewpoints in a corpus of text, how do these models assess what is "true"?
Inherently, the model is making an assessment of what to output when a prompt touches a subjective topic with diverse viewpoints.
Likely, at some level, the models are assessing truthiness weighted by something (e.g. akin to Dalio's notion of believability weighting).
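As a rough illustration of what believability weighting could look like mechanically, here is a toy Python sketch. The corpus, the source weights, and the aggregation rule are all assumptions for illustration; this is not a claim about what transformers actually compute internally.

```python
from collections import defaultdict

# Toy sketch: conflicting claims in a corpus, each attached to a source with
# an assumed "believability" weight (hypothetical values, not learned ones).
corpus = [
    {"claim": "X causes Y", "source_weight": 0.9},
    {"claim": "X causes Y", "source_weight": 0.6},
    {"claim": "X does not cause Y", "source_weight": 0.7},
]

def believability_weighted_consensus(claims):
    """Aggregate conflicting claims by summing source weights per claim."""
    totals = defaultdict(float)
    for item in claims:
        totals[item["claim"]] += item["source_weight"]
    # Normalize into a rough "truthiness" distribution over the claims.
    z = sum(totals.values())
    return {claim: weight / z for claim, weight in totals.items()}

print(believability_weighted_consensus(corpus))
# e.g. {'X causes Y': 0.68..., 'X does not cause Y': 0.31...}
```

Whatever the real mechanism is, something like this trade-off (which sources get weight, and how conflicts get resolved) is happening implicitly during training.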
With post-training, it's possible to use RLHF to gradient a model toward specific beliefs/viewpoints (e.g. a "woke" or "based" model).
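A minimal sketch of how that nudging works, assuming a Bradley-Terry-style preference loss of the kind commonly used in RLHF reward modeling (the reward values below are placeholder scalars, not outputs of a real reward model): if annotators systematically prefer completions expressing one viewpoint, the gradient pushes rewards for that viewpoint up and the alternative down.

```python
import torch
import torch.nn.functional as F

# Placeholder rewards standing in for a reward model's outputs on paired completions:
# one expressing the annotator-favored viewpoint, one expressing the alternative.
reward_preferred = torch.tensor([1.2, 0.8, 1.5], requires_grad=True)
reward_rejected = torch.tensor([0.4, 0.9, 0.1], requires_grad=True)

# Bradley-Terry style preference loss: maximize the log-probability that the
# preferred completion out-scores the rejected one.
loss = -F.logsigmoid(reward_preferred - reward_rejected).mean()
loss.backward()

print(loss.item())
print(reward_preferred.grad)  # negative gradients: gradient descent pushes preferred rewards up
print(reward_rejected.grad)   # positive gradients: rejected rewards get pushed down
```

The policy then gets optimized against that reward, so whichever viewpoint the preference data favors is what the final model gradients toward.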