In the mesmerizing world of recurrent neural networks (RNNs), a revolutionary evolution took place in 2014 with the introduction of Gated Recurrent Units (GRUs). Positioned between the simplicity of vanilla RNNs and the sophistication of Long Short-Term Memory networks (LSTMs), GRUs offer a unique dance of efficiency and capability. Let’s unravel the intricacies of GRUs, understanding how they stand out and where they fit in the grand orchestra of neural networks.
A simplification of Long Short-Term Memory networks was proposed back in 2014: the Gated Recurrent Unit (GRU). It is similar to an LSTM, but it also has its fair share of differences. For example:
- It merges the LSTM's input gate and forget gate into a single update gate.
- It lacks an output gate.
As a consequence, it is faster to train than an LSTM, because it has fewer parameters.
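We can make the "fewer parameters" claim concrete with a quick back-of-the-envelope calculation. The sketch below assumes standard LSTM and GRU layers with input size `d` and hidden size `n`, biases included; the function names are my own, purely for illustration:

```python
# Illustrative parameter counts for standard LSTM vs. GRU layers.
def lstm_params(d, n):
    # LSTM: 4 weight sets (input, forget, output gates + cell candidate),
    # each with an input matrix (n x d), a recurrent matrix (n x n), and a bias (n).
    return 4 * (n * d + n * n + n)

def gru_params(d, n):
    # GRU: 3 weight sets (update gate, reset gate, candidate state).
    return 3 * (n * d + n * n + n)

d, n = 128, 256
print(lstm_params(d, n))  # 394240
print(gru_params(d, n))   # 295680
```

With the same layer sizes, the GRU needs only three quarters of the LSTM's parameters, which is where the training-speed advantage comes from.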
However, this comes at a cost: GRUs have been shown to be incapable of performing some tasks that can be learned by LSTMs. For example, “(…) the GRU fails to learn simple languages that are learnable by the LSTM” (Wikipedia, 2016). This is why in practice, if you have to choose between LSTMs and GRUs, it’s always best to test both approaches.
Behind the scenes
GRUs are composed of a reset gate and an update gate. The reset gate determines how much of the previous hidden state to forget, whereas the update gate determines how much of it to carry over into the new state; both are computed from only the previous output and the current input. There are multiple variants (Wikipedia, 2016):
- A fully gated unit, which comes in three sub-types whose gates are computed from the previous hidden state and a bias, from the previous hidden state only, or from the bias only.
- A minimally gated unit, where the reset and update gates are merged into a single forget gate: rather than separately deciding what to remember and what to reset, the network only learns to forget what is not important.
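For the fully gated unit, one common formulation (the one given on Wikipedia) is the following, where [latex]x_t[/latex] is the current input, [latex]h_{t-1}[/latex] the previous output, [latex]\sigma[/latex] the sigmoid function, and [latex]\odot[/latex] the elementwise product:

[latex]z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)[/latex]

[latex]r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)[/latex]

[latex]\hat{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)[/latex]

[latex]h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t[/latex]

Here [latex]z_t[/latex] is the update gate, [latex]r_t[/latex] the reset gate, and [latex]\hat{h}_t[/latex] the candidate state that is blended with the previous state to produce the new output.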
Visually, this is what a GRU looks like if we consider its fully gated version:
We can see that the output from the previous item [latex]h_{t-1}[/latex] and the input of…
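One fully gated GRU step can also be sketched directly in NumPy. This is a minimal, illustrative implementation with variable names of my own choosing, not the API of any particular framework:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One fully gated GRU step: returns the new hidden state h_t."""
    W_z, U_z, b_z = params["z"]  # update gate weights
    W_r, U_r, b_r = params["r"]  # reset gate weights
    W_h, U_h, b_h = params["h"]  # candidate state weights

    z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)             # update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)             # reset gate
    h_hat = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)   # candidate state
    return (1 - z_t) * h_prev + z_t * h_hat                   # blend old and new

# Toy usage with random weights: input size 4, hidden size 3.
rng = np.random.default_rng(0)
d, n = 4, 3
params = {k: (rng.standard_normal((n, d)),
              rng.standard_normal((n, n)),
              np.zeros(n)) for k in ("z", "r", "h")}
h = gru_step(rng.standard_normal(d), np.zeros(n), params)
print(h.shape)  # (3,)
```

Note how the reset gate `r_t` scales the previous state before it enters the candidate computation, while the update gate `z_t` decides how much of the old state survives into `h_t`.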