The BertTokenizer.from_pretrained() method is a class method in the Hugging Face Transformers library that loads a pre-trained tokenizer for a BERT model. The tokenizer converts raw text into the token IDs the BERT model expects as input.
To use BertTokenizer.from_pretrained(), first make sure you have the transformers library installed:
pip install transformers
Next, load the tokenizer for a specific BERT model in your Python script:
from transformers import BertTokenizer
# Load the tokenizer for the 'bert-base-uncased' model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
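The from_pretrained() method also accepts a local directory. If you want to avoid re-downloading the tokenizer files (for example, when working offline), you can save them once with save_pretrained() and load them from disk later. Here is a minimal sketch; the directory name './bert-tokenizer' is just an example path.
from transformers import BertTokenizer

# Download the tokenizer files once and save them to a local directory
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokenizer.save_pretrained('./bert-tokenizer')  # example path, choose your own

# Later, load the tokenizer from the local directory instead of the Hugging Face Hub
tokenizer = BertTokenizer.from_pretrained('./bert-tokenizer')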
There are several pre-trained BERT models available. Some common ones include (the short sketch after this list shows how the cased and uncased variants differ in practice):
- 'bert-base-uncased': The base model (12 layers, ~110 million parameters), trained on lowercased text.
- 'bert-large-uncased': The large model (24 layers, ~340 million parameters), trained on lowercased text; more accurate but more computationally expensive.
- 'bert-base-cased': The base model trained on case-sensitive text, so capitalization is preserved.
- 'bert-large-cased': The large model trained on case-sensitive text, with more layers and parameters.
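The practical difference between the cased and uncased variants shows up as soon as you tokenize text. Here is a small sketch that loads both tokenizers and compares them (both sets of files are downloaded the first time this runs):
from transformers import BertTokenizer

uncased_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
cased_tokenizer = BertTokenizer.from_pretrained('bert-base-cased')

# The uncased tokenizer lowercases the text before splitting it into WordPiece tokens
print(uncased_tokenizer.tokenize("Hello World"))

# The cased tokenizer keeps the original capitalization
print(cased_tokenizer.tokenize("Hello World"))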
After loading the tokenizer, you can use it to tokenize text:
# Tokenize a single sentence
sentence = "This is John wick"
encoded_sentence = tokenizer.encode(sentence)
print("Tokenize a single sentence")
print(encoded_sentence)
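In newer code, you will often call the tokenizer object directly instead of encode(). This returns a dictionary containing input_ids, token_type_ids, and attention_mask, and it can return PyTorch tensors ready to feed into the model. A minimal sketch, assuming PyTorch is installed so that return_tensors='pt' works:
# Calling the tokenizer directly returns input_ids, token_type_ids, and attention_mask
encoded_inputs = tokenizer(sentence, padding=True, truncation=True, return_tensors='pt')
print(encoded_inputs['input_ids'])
print(encoded_inputs['attention_mask'])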
Here is the complete code for the encode() example:
from transformers import BertTokenizer
# Load the tokenizer for the 'bert-base-uncased' model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Tokenize a single sentence
sentence = "This is John wick"
encoded_sentence = tokenizer.encode(sentence)
print("Tokenize a single sentence")
print(encoded_sentence)
Output
Tokenize a single sentence
[101, 2023, 2003, 2198, 15536, 3600, 102]
You can see that the sentence was converted into token IDs: 101 is the special [CLS] token, 102 is the special [SEP] token, and the five IDs in between cover the four words of the sentence. "Wick" was split into two WordPiece subwords, which is why there is one more ID than there are words.
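If you want to see what each ID stands for, you can map the IDs back to tokens or decode them into a string:
# Map the IDs back to their WordPiece tokens, including [CLS] and [SEP]
print(tokenizer.convert_ids_to_tokens(encoded_sentence))

# Decode the IDs back into a (lowercased) string, special tokens included
print(tokenizer.decode(encoded_sentence))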
That’s it.
