Tokenization in the context of Natural Language Processing (NLP) is the process of dividing text into smaller parts called tokens.
Transformer models cannot take raw strings as input; instead, they expect text that has been tokenized and encoded as numerical vectors.
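As a quick illustration, here is a minimal sketch of that pipeline using naive whitespace (word-level) tokenization and a toy vocabulary built from the text itself. Real tokenizers are far more sophisticated; the vocabulary and helper functions below are hypothetical, for illustration only.

```python
def tokenize(text):
    """Naive word-level tokenization: lowercase and split on whitespace."""
    return text.lower().split()

def build_vocab(tokens):
    """Toy vocabulary: map each unique token to an integer ID."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}

text = "Transformer models expect numbers not raw text"
tokens = tokenize(text)          # text -> tokens
vocab = build_vocab(tokens)      # tokens -> vocabulary
ids = [vocab[t] for t in tokens] # tokens -> numerical IDs

print(tokens)
# ['transformer', 'models', 'expect', 'numbers', 'not', 'raw', 'text']
print(ids)
# [6, 1, 0, 3, 2, 4, 5]
```

In practice a model ships with a fixed, pre-trained vocabulary rather than one built from the input, but the overall flow (text to tokens to IDs) is the same.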
There are several types of tokenization, such as word-level, subword, and character-level tokenization. We will discuss each of them in detail.