AI Term · circa 1980 · Added May 30, 2026
Tokens (NLP)
Tokens are the basic units of text that NLP models process.
In natural language processing (NLP), tokens are the basic units of text that a model processes. Depending on the tokenizer, a token may be a single character, a fragment of a word (a subword), a whole word, or even a short phrase. Tokenization is the process of converting raw text into a sequence of tokens, each of which is then typically mapped to an integer ID that the model actually consumes. How text is split affects the size of the model's vocabulary, the length of its input sequences, and how well it handles rare or unseen words. These factors in turn shape performance on tasks such as translation, sentiment analysis, and chatbot interactions.
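To make token granularity concrete, here is a minimal Python sketch with two toy tokenizers, one character-level and one whitespace word-level, plus a tiny vocabulary that maps each token to an integer ID. The function names (char_tokenize, word_tokenize, build_vocab, encode) are hypothetical, and real models use learned subword vocabularies rather than anything this simple.

```python
# Toy tokenizers illustrating two token granularities.
# These are illustrative only, not any real model's tokenizer.


def char_tokenize(text: str) -> list[str]:
    """Character-level: every character, including spaces, is a token."""
    return list(text)


def word_tokenize(text: str) -> list[str]:
    """Word-level: split on runs of whitespace."""
    return text.split()


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab: dict[str, int] = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Map tokens to the integer IDs a model would consume."""
    return [vocab[tok] for tok in tokens]


text = "to be or not to be"
words = word_tokenize(text)
chars = char_tokenize(text)
vocab = build_vocab(words)

print(words)                 # ['to', 'be', 'or', 'not', 'to', 'be']
print(encode(words, vocab))  # [0, 1, 2, 3, 0, 1]
print(len(chars))            # 18
```

The same phrase yields 18 character tokens but only 6 word tokens, and repeated words reuse the same ID. Trading vocabulary size against sequence length in this way is the central design choice behind any tokenizer.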
Examples
- OpenAI's API meters and bills GPT-3 usage by the number of tokens in the prompt and the generated completion.
- A simple whitespace tokenizer splits 'Hello World' into ['Hello', 'World'].
- BERT splits its input into WordPiece subword tokens and adds special tokens such as [CLS] and [SEP] before the model processes it.
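BERT's WordPiece tokenizer breaks words it cannot find in its vocabulary into smaller pieces. The sketch below imitates the greedy longest-match-first idea on a tiny, hypothetical vocabulary (VOCAB), marking word-internal pieces with a ## prefix as BERT does. It is an illustration under those assumptions, not BERT's actual implementation, which also handles punctuation, text normalization, and special tokens.

```python
# Simplified WordPiece-style tokenizer: greedy longest-match-first over a
# tiny, hypothetical vocabulary. Real BERT vocabularies are far larger.

VOCAB: set[str] = {
    "[UNK]", "Hello", "World",
    "un", "##believ", "##able",
    "token", "##ization",
}


def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a hit.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # mark word-internal continuation pieces
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split on whitespace, then break each word into subword pieces."""
    tokens: list[str] = []
    for word in text.split():
        tokens.extend(wordpiece(word, vocab))
    return tokens


print(tokenize("Hello World", VOCAB))  # ['Hello', 'World']
print(tokenize("unbelievable tokenization", VOCAB))
# ['un', '##believ', '##able', 'token', '##ization']
```

Words that are in the vocabulary pass through whole, while rarer words decompose into known pieces, so the model rarely needs the [UNK] fallback.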
Common misconceptions
- Tokens are not always words; they can be subwords or characters.
- A document's token count generally differs from its word count, and with subword tokenizers it is usually higher.
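The gap between word count and token count is easy to see even without a learned vocabulary. This sketch assumes a hypothetical pre-tokenizer (TOKEN_PATTERN) that treats each run of word characters (letters, digits, underscore) as a token and every punctuation mark as a separate token. Actual counts depend entirely on the tokenizer a given model uses.

```python
import re

# Hypothetical pre-tokenizer: runs of word characters are tokens, and every
# non-space, non-word character (punctuation) is its own token.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def count_words(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def count_tokens(text: str) -> int:
    """Count tokens produced by TOKEN_PATTERN."""
    return len(TOKEN_PATTERN.findall(text))


sentence = "Don't stop, it's working!"
print(count_words(sentence))  # 4
print(TOKEN_PATTERN.findall(sentence))
# ['Don', "'", 't', 'stop', ',', 'it', "'", 's', 'working', '!']
print(count_tokens(sentence))  # 10
```

Four whitespace-separated words become ten tokens here, because contractions and trailing punctuation are split off.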