Ruminations on excerpts of research papers, blogs and books

Novel Model Architectures

Ever since I started learning more about machine learning and, by extension, deep learning, it became clear to me that the field is a practical attempt at formalising intelligence. The growth of neural networks under scaling laws has only reinforced the practical relevance of the universal approximation theorem (roughly restated below). For the past few years, Transformers have reigned supreme as the state-of-the-art architecture across almost all deep learning sub-fields. My aim here is to survey the vast literature that the field of formal intelligence offers and look for alternatives to the Transformer architecture. It matters little that some of these novel methods may have failed or were never adopted in practice; they present ideas different from the mainstream and are fascinating to learn for that very reason. Exploring the mathematical and engineering aspects of these methods is of particular interest to me.
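Roughly, in its classic single-hidden-layer form (Cybenko, 1989), the theorem says that for any continuous function $f$ on a compact set $K \subset \mathbb{R}^n$ and any $\varepsilon > 0$, there is a finite sum of sigmoidal units that approximates $f$ uniformly:

$$
\sup_{x \in K} \left| \, f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma\!\left(w_i^\top x + b_i\right) \right| < \varepsilon .
$$

The theorem guarantees that such an approximator exists, but says nothing about how to find it or how large $N$ must be; scaling laws are, in a sense, the empirical counterpart to that existence result.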

Hosted on streams.place.