1. Vectors in NLP

2. Self-Attention Mechanism

3. Multi-head Attention

4. Transformers

5. BERT

Conclusion

Understanding these concepts provides a solid foundation for working with state-of-the-art NLP models like BERT and GPT.

Jargon

Top-k: Restrict sampling to the k highest-probability tokens, then draw one of them at random, weighted by their renormalized probabilities. A sketch follows below.
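A minimal sketch of top-k sampling over a raw logit vector, using NumPy; the function name and the default k are illustrative, not taken from any particular library:

```python
import numpy as np

def sample_top_k(logits, k=5, rng=None):
    """Sample a token index from the k most probable tokens (illustrative sketch)."""
    rng = rng or np.random.default_rng()
    # Softmax over the logits (max-subtracted for numerical stability).
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # Keep only the k most probable tokens and renormalize over them.
    top_idx = np.argsort(probs)[-k:]
    top_probs = probs[top_idx] / probs[top_idx].sum()
    # Random-weighted draw among the survivors.
    return rng.choice(top_idx, p=top_probs)
```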

Top-p (nucleus sampling): Restrict sampling to the smallest set of highest-probability tokens whose cumulative probability reaches p, then draw from that set with the same random-weighted strategy (see the sketch below).
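Under the same assumptions (raw logits, NumPy, illustrative names), top-p keeps the smallest prefix of the probability-sorted tokens whose cumulative probability covers p:

```python
import numpy as np

def sample_top_p(logits, p=0.9, rng=None):
    """Nucleus sampling: sample from the smallest token set whose
    cumulative probability reaches p (illustrative sketch)."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # Sort tokens from most to least probable and accumulate probability mass.
    order = np.argsort(probs)[::-1]
    cdf = np.cumsum(probs[order])
    # Smallest prefix whose cumulative probability covers p.
    cutoff = int(np.searchsorted(cdf, p)) + 1
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=nucleus_probs)
```

Unlike top-k, the size of the sampled set adapts to the distribution: a confident model may leave only one or two tokens in the nucleus, while a flat distribution keeps many.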

Temperature: A scaling factor applied to the logits before the softmax, which reshapes the resulting probabilities. Higher temperature flattens the distribution (more randomness); lower temperature sharpens it (more deterministic output). A sketch follows below.
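As a sketch (again with illustrative names), temperature scaling is just a division of the logits before the softmax:

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax with temperature scaling: T > 1 flattens the
    distribution, T < 1 sharpens it (illustrative sketch)."""
    scaled = logits / temperature
    exp = np.exp(scaled - np.max(scaled))  # max-subtraction for numerical stability
    return exp / exp.sum()
```

At temperature 1.0 this reduces to the ordinary softmax; as the temperature approaches 0 the output approaches a one-hot pick of the most likely token.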