1. Vectors in NLP
2. Self-Attention Mechanism
3. Multi-head Attention
4. Transformers
5. BERT
Conclusion
- Vectors: Numerical representations of text for computational processing.
- Self-Attention: Mechanism that allows models to focus on different parts of input sequences (see the sketch below).
- Transformers: Advanced model architecture leveraging self-attention to handle sequential data more effectively.
- BERT: A Transformer-based, encoder-only architecture that uses a bidirectional approach to capture the context of words.
Understanding these concepts provides a solid foundation for working with state-of-the-art NLP models like BERT and GPT.
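To make the self-attention bullet concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. The toy input, weight matrices, and dimension sizes are illustrative assumptions, not values taken from BERT or GPT.

```python
# Minimal sketch of scaled dot-product self-attention (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q = X @ W_q                      # queries: (seq_len, d_k)
    K = X @ W_k                      # keys:    (seq_len, d_k)
    V = X @ W_v                      # values:  (seq_len, d_v)
    d_k = Q.shape[-1]
    # Attention weights: how much each token attends to every other token.
    scores = Q @ K.T / np.sqrt(d_k)  # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)
    return weights @ V               # (seq_len, d_v)

# Toy example: a "sentence" of 4 tokens, each an 8-dimensional vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```

Multi-head attention runs several such attention computations in parallel with separate weight matrices and concatenates the results, letting the model attend to different kinds of relationships at once.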
Jargon
Top-K: Restricts sampling to the k most probable tokens, then selects an output from them with a random-weighted strategy using their probabilities.
Top-P: Restricts sampling to the smallest set of tokens whose cumulative probability is <= p, then selects an output with the same random-weighted strategy.
Temperature: Scales the logits before the softmax function, which changes the probability values. The higher the temperature, the higher the randomness, and vice versa (see the sampling sketch below).
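The following sketch shows how temperature, Top-K, and Top-P can be combined when sampling a next token. The toy vocabulary, logits, and the `sample_next_token` helper are illustrative assumptions rather than the API of any specific library.

```python
# Minimal sketch of temperature, top-k, and top-p (nucleus) sampling over a
# next-token probability distribution. Toy logits only; not from a real model.
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    rng = rng or np.random.default_rng()
    # Temperature scales the logits before the softmax: a higher temperature
    # flattens the distribution (more randomness), a lower one sharpens it.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    order = np.argsort(probs)[::-1]   # token ids, most probable first
    sorted_probs = probs[order]

    if top_k is not None:
        # Top-K: keep only the k most probable tokens.
        sorted_probs[top_k:] = 0.0
    if top_p is not None:
        # Top-P: keep the prefix whose cumulative probability is <= p,
        # always retaining at least the single most probable token.
        cumulative = np.cumsum(sorted_probs)
        sorted_probs[1:][cumulative[1:] > top_p] = 0.0

    # Random-weighted selection among the surviving tokens.
    sorted_probs /= sorted_probs.sum()
    return order[rng.choice(len(order), p=sorted_probs)]

# Toy example: a 5-token vocabulary.
logits = np.array([2.0, 1.5, 0.5, 0.1, -1.0])
print(sample_next_token(logits, temperature=0.8, top_k=3, top_p=0.9,
                        rng=np.random.default_rng(0)))
```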