Or What Happens if Attention (and Transformer) Is All We Need?
Like your observation on the consolidation of architectures due to the transformer. It would be cool if we had a new wave of startups focussing on optimizations (software & hardware) to bring down the near-quadratic complexity of this architecture.
Like your observation on the consolidation of architectures due to the transformer. It would be cool if we had a new wave of startups focussing on optimizations (software & hardware) to bring down the near-quadratic complexity of this architecture.