Recurrent Neural Networks

What is a Recurrent Neural Network (RNN)?

An RNN is a specialized type of neural network designed to handle sequential data, where the order of information matters.

Unlike a traditional feedforward neural network that treats each input independently, an RNN has a “memory” that allows it to use information from previous steps in a sequence to influence the current output.

This makes RNNs ideal for tasks like text generation, speech recognition, and time-series forecasting.

The Core Components (The Blocks)

The architecture of a basic RNN is centered around a single, repeating block or cell. This cell contains the key components that give the network its memory:

  • Input Layer (x_t): At each step in the sequence (denoted by the time step t), the network receives a new piece of data. For a text-based task, this could be a single word or character.
  • Hidden State (h_t): This is the “memory” of the network. It’s a vector that summarizes all the information from the past inputs up to the current time step. The hidden state from the previous time step (h_t−1) is fed back into the network along with the new input (x_t).
  • Output Layer (y_t): At each time step, the network can produce an output. For a language model, this would be a prediction for the next word.

When you see a diagram of an RNN, it’s often shown as a single cell with a loop, representing the flow of the hidden state back into itself. It can also be “unfolded” over time to show each step of the sequence as a distinct cell.
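To make the roles of these components concrete, here is a minimal sketch of a single RNN cell step in NumPy. The weight names (W_xh, W_hh, W_hy), the bias terms, and the tanh activation are illustrative choices for a vanilla RNN, not a reference to any particular library's API.

    import numpy as np

    def rnn_cell_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
        """One step of a vanilla RNN cell (illustrative sketch).

        x_t:    current input vector
        h_prev: hidden state from the previous time step (h_{t-1})
        Returns the new hidden state h_t and the output y_t.
        """
        # Combine the new input with the previous hidden state, then
        # squash the result through tanh to form the new "memory".
        h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
        # The output at this time step is computed from the hidden state.
        y_t = W_hy @ h_t + b_y
        return h_t, y_t

In this sketch, h_t plays exactly the role described above: it is the only thing carried forward, so it has to summarize everything the network has seen so far.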


How It Works

An RNN processes a sequence through a series of repeated steps:

  1. Receive Input: At time step t=1, the network receives the first input, x_1. The hidden state, h_0, is initialized (usually to all zeros).
  2. Update the Hidden State: The RNN’s cell takes both the new input, x_t, and the hidden state from the previous time step, h_t−1. It combines them using learned weights and passes the result through an activation function (typically tanh, sometimes ReLU) to create the new hidden state, h_t. This new hidden state now contains information from both the current input and the entire history of previous inputs.
  3. Generate Output: The new hidden state, h_t, is used to calculate the output, y_t.
  4. Repeat: The new hidden state, h_t, is then passed to the next time step, t+1, where it repeats the process with the next input in the sequence.

This loop allows the network to process an entire sequence of data, with each new step building on the “memory” carried in the hidden state, as the sketch below illustrates.
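Here is a rough sketch of that loop, reusing the hypothetical rnn_cell_step function from the previous section. The sizes and randomly initialized weights are purely illustrative; in practice they would be learned during training.

    import numpy as np

    rng = np.random.default_rng(0)
    input_size, hidden_size, output_size = 8, 16, 8

    # Illustrative, randomly initialized parameters.
    W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1
    W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
    W_hy = rng.standard_normal((output_size, hidden_size)) * 0.1
    b_h = np.zeros(hidden_size)
    b_y = np.zeros(output_size)

    sequence = [rng.standard_normal(input_size) for _ in range(5)]  # 5 time steps
    h_t = np.zeros(hidden_size)  # step 1: h_0 initialized to zeros

    outputs = []
    for x_t in sequence:
        # Steps 2-4: update the hidden state, generate an output,
        # and carry h_t forward into the next time step.
        h_t, y_t = rnn_cell_step(x_t, h_t, W_xh, W_hh, W_hy, b_h, b_y)
        outputs.append(y_t)

Note that the same weights are applied at every time step; only the hidden state changes as the sequence unfolds.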

Key Limitations

While RNNs were a major step forward, vanilla RNNs suffer from a critical flaw: the vanishing gradient problem. During training, the error signal is propagated backward through every time step, and at each step it is multiplied by factors that are typically smaller than one. Over a long sequence these repeated multiplications shrink the gradient toward zero, so the influence of early time steps effectively fades. This makes it difficult for a vanilla RNN to learn long-term dependencies (e.g., a sentence where a word at the beginning influences a word at the end). This limitation led to the development of more advanced architectures such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), which use explicit “gates” to better manage their memory and mitigate this problem.
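As a toy illustration of this shrinking effect (not a full backpropagation-through-time implementation): each backward step multiplies the gradient by the derivative of tanh, which lies in (0, 1], times a recurrent weight factor. The 0.5 weight factor and the random activations below are arbitrary assumptions chosen only to show the trend.

    import numpy as np

    rng = np.random.default_rng(0)

    # Each backward step multiplies the gradient by tanh'(z) = 1 - tanh(z)^2,
    # a value in (0, 1], times an assumed recurrent weight factor of 0.5.
    grad = 1.0
    for step in range(50):
        z = rng.standard_normal()
        grad *= (1 - np.tanh(z) ** 2) * 0.5
        if step in (4, 19, 49):
            print(f"after {step + 1:2d} steps, gradient factor is about {grad:.2e}")

After a few dozen steps the surviving gradient is vanishingly small, which is why a vanilla RNN struggles to connect an output to an input that occurred far earlier in the sequence.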