Beam search decoding is a technique used in text generation models, such as transformers, to improve the quality of generated text. It addresses a key limitation of greedy search — committing to the single most probable token at each step can lock the model out of better overall sequences — by keeping multiple candidate sequences alive at each step of generation.
Beam search maintains a fixed number of candidate sequences (known as the “beam width” or “beam size”) at each step in the generation process. Here’s how it works:

1. Start with the initial sequence as the only candidate.
2. At each step, extend every candidate with each possible next token, scoring each extended sequence by its cumulative (log-)probability.
3. Keep only the top-k highest-scoring sequences, where k is the beam width, and discard the rest.
4. Repeat until an end-of-sequence token is generated or a maximum length is reached, then return the highest-scoring completed sequence.
Let’s say the model is generating text with a beam width of 2. Starting with the sentence “The cat”, it will:

1. Score every possible next token and keep the two most probable continuations, e.g. “The cat sat” and “The cat ran”.
2. Extend both candidates with every possible next token, then again keep only the two extended sequences with the highest cumulative probability.
3. Continue this way, so a continuation whose first token was slightly less probable (say “ran”) can still win if its later tokens are much more likely.
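The process above can be sketched in a few lines of Python. This is a minimal illustration, not a production decoder: the `TOY_PROBS` table is a hypothetical stand-in for a real model's next-token distribution (a transformer would compute these probabilities instead), and the token strings are invented for the example.

```python
import math

# Hypothetical toy "model": maps a token sequence to next-token probabilities.
# In practice a neural model would produce these scores.
TOY_PROBS = {
    ("The", "cat"): {"sat": 0.40, "ran": 0.35, "meowed": 0.25},
    ("The", "cat", "sat"): {"down": 0.70, "quietly": 0.30},
    ("The", "cat", "ran"): {"fast": 0.90, "away": 0.10},
    ("The", "cat", "meowed"): {"loudly": 1.00},
}

def beam_search(start, beam_width, steps):
    # Each beam entry is (sequence, cumulative log-probability).
    beams = [(start, 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for token, p in TOY_PROBS.get(seq, {}).items():
                # Sum log-probs rather than multiplying raw probabilities,
                # which is the numerically stable convention.
                candidates.append((seq + (token,), score + math.log(p)))
        if not candidates:
            break
        # Keep only the beam_width highest-scoring sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

best_seq, best_score = beam_search(("The", "cat"), beam_width=2, steps=2)[0]
print(" ".join(best_seq), round(math.exp(best_score), 3))
```

In this toy example, greedy search would commit to “sat” (p = 0.40) and end at “The cat sat down” (total p = 0.28), while beam search keeps “ran” alive and finds “The cat ran fast” (total p = 0.315) — exactly the advantage over greedy decoding described above.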
In summary, beam search provides a balance between greedy search (which considers only one sequence) and exhaustive search (which considers all possible sequences), aiming to efficiently find a high-quality sequence in a large search space.