🔁 Were RNNs All We Needed?
The paper "Were RNNs All We Needed?" examines the efficiency of traditional recurrent neural networks (RNNs), specifically LSTMs and GRUs, for long sequences. The authors demonstrate that by removing hidden state dependencies from their input, forget, and update gates, LSTMs and GRUs can be trained efficiently using the parallel prefix scan algorithm, resulting in significantly faster training times. They introduce simplified versions of these RNNs, called minLSTMs and minGRUs, which use fewer parameters and achieve performance comparable to recent sequence models like Transformers and Mamba. The paper highlights the potential for RNNs to be competitive alternatives to Transformers, particularly for long sequences, and raises the question of whether RNNs were all that was needed for sequence modeling.
📎
Link to paper