The “ValueError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]” error occurs when the input passed to a BERT or DistilBERT tokenizer is not in a format the tokenizer accepts.
To fix it, make sure every input is a string (or a list of strings) and tokenize it with a method that matches the input, such as tokenizer.encode() or tokenizer.encode_plus() from the “transformers” library. Calling the tokenizer directly, as in tokenizer(text), accepts the same kinds of input.
Example
from transformers import BertTokenizer

# return_tensors='pt' returns PyTorch tensors, so PyTorch must be installed
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenize a single string
input_text = "This is an example sentence."
encoded_single = tokenizer.encode_plus(input_text, return_tensors='pt')

# Tokenize a pair of strings (passed as two separate arguments)
input_text1 = "This is the first sentence."
input_text2 = "This is the second sentence."
encoded_pair = tokenizer.encode_plus(input_text1, input_text2, return_tensors='pt')

# Print the encoding of the sentence pair
print(encoded_pair)
Output
{'input_ids': tensor([[ 101, 2023, 2003, 1996, 2034, 6251, 1012,  102, 2023, 2003, 1996, 2117, 6251, 1012,  102]]),
 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]]),
 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
By ensuring that you are passing the correct input type and using the appropriate tokenization method, you should be able to fix the ValueError.
Ensure you pass a single string, a list of strings, or a tuple of two strings (or of two lists of strings) to the tokenizer. Each sequence can be a plain string or a list of strings (a pretokenized string). If you provide a sequence as a list of strings (pretokenized), you must set is_split_into_words=True to remove the ambiguity with a batch of sequences.
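The pretokenized case can be sketched as a small wrapper. This is only an illustration: encode_pretokenized is a hypothetical helper name, not part of the “transformers” library, and it assumes the tokenizer object can be called directly with the is_split_into_words argument.

```python
from typing import Any, Sequence


def encode_pretokenized(tokenizer: Any, words: Sequence[str]) -> Any:
    """Encode ONE sentence that has already been split into words.

    Hypothetical helper for illustration; `tokenizer` is assumed to be a
    Hugging Face tokenizer (or anything callable in the same way).
    """
    # A bare string is not pretokenized input, and every word must be a str.
    if isinstance(words, str) or not all(isinstance(w, str) for w in words):
        raise TypeError("words must be a list of strings")
    # is_split_into_words=True tells the tokenizer that the list is a single
    # sentence split into words, not a batch of separate sentences.
    return tokenizer(list(words), is_split_into_words=True)


# Usage (needs the transformers package and a downloaded model):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# encoded = encode_pretokenized(tokenizer, ["This", "is", "an", "example", "."])
```

Without the flag, the same list would be read as a batch of five one-word sentences, which is the ambiguity mentioned above.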
The error means the input has a different type or format from what the tokenizer expects. If your input is a single string, ensure it’s not mistakenly wrapped in a list or tuple, and if it is a list, check that every item in it is a string.
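One way to catch a wrong input type before it reaches the tokenizer is a small check like the one below. It is a minimal sketch, not an official API: clean_tokenizer_input is a hypothetical name, and the tolist() branch assumes the data may arrive as a pandas Series or NumPy array, which is a common source of this error in my experience.

```python
from typing import Any, List, Union


def clean_tokenizer_input(text: Any) -> Union[str, List[str]]:
    """Return a str or a list of str, or raise an error that names the problem.

    Hypothetical helper for illustration only.
    """
    if isinstance(text, str):
        return text
    # pandas Series and NumPy arrays expose tolist(); turn them into a plain list.
    if hasattr(text, "tolist"):
        text = text.tolist()
    if isinstance(text, (list, tuple)):
        # Collect the position and type of every item that is not a string,
        # for example None, or a float NaN left by a missing value.
        bad = [(i, type(item).__name__)
               for i, item in enumerate(text)
               if not isinstance(item, str)]
        if bad:
            raise TypeError(f"non-string items at (index, type): {bad}")
        return list(text)
    raise TypeError(f"expected str or list of str, got {type(text).__name__}")


# Usage with a tokenizer created elsewhere:
# encoded = tokenizer(clean_tokenizer_input(my_texts))
```

The point of the check is the error message: it reports which items are not strings, so you can fix or drop them before tokenizing.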
That’s it.

Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He boasts deep knowledge in Data Science and Machine Learning. Versed in Python, JavaScript, PHP, R, and Golang. Skilled in frameworks like Angular and React and platforms such as Node.js. His expertise spans both front-end and back-end development. His proficiency in the Python language stands as a testament to his versatility and commitment to the craft.