The Natural Language Toolkit (NLTK) is a Python library for working with human language data (text).
It covers tasks such as tokenization, stemming, and part-of-speech tagging, as well as more advanced functions like parsing and semantic reasoning.
As a beginner, you may run into errors while working with the nltk package. This article discusses one of them: NameError.
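NameError: name 'nltk' is not defined
Python raises NameError: name 'nltk' is not defined when code refers to the name nltk before it has been imported in the current scope. The usual causes are a missing import nltk statement, a misspelled name, or an import placed inside a different function. If the package is not installed at all, the import statement itself fails with ModuleNotFoundError: No module named 'nltk' instead.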
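As a minimal illustration, the sketch below triggers the error on purpose and catches it so the message can be inspected. The helper name tokenize_without_import is hypothetical and exists only for this example. The snippet does not need NLTK to be installed, because Python fails on the name lookup before any library code runs.

```python
def tokenize_without_import(text):
    # No "import nltk" appears anywhere in this file, so Python cannot
    # resolve the name nltk when this line runs.
    return nltk.word_tokenize(text)


try:
    tokenize_without_import("14th Jan is Uttrayan!")
except NameError as exc:
    # Keep the message so it can be printed or checked.
    message = str(exc)
    print(message)
```

The printed message should contain name 'nltk' is not defined, which matches the error discussed in this article.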
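Fix NameError: name 'nltk' is not defined
To fix the error, add import nltk to your script before the first use of nltk. If the package is not installed yet, install it first using this command: pip install nltk.
If the pip command belongs to a different Python installation than the one that runs your script, invoke pip through the interpreter instead: python3 -m pip install nltk.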
After installing, you can import it into your Python script.
import nltk
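If the import still fails after installation, the package may have been installed for a different interpreter. The following is a rough sketch for checking that, using only the standard library. The helper name is_installed is an assumption made for this example and is not part of NLTK.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the running interpreter can locate module_name."""
    return importlib.util.find_spec(module_name) is not None


# The interpreter that is running this script; pip must install into this one.
print("Interpreter:", sys.executable)
print("nltk installed:", is_installed("nltk"))
```

If the second line prints False, running the pip command through the interpreter path shown on the first line should install nltk where the script can find it.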
Let’s write a simple program that tokenizes a short piece of text.
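import nltk
# Download the Punkt tokenizer data (only needed once per environment).
# Newer NLTK releases may ask for 'punkt_tab' instead; download that
# resource if word_tokenize() raises a LookupError.
nltk.download('punkt')
text = "14th Jan is Uttrayan!"
# Split the sentence into word and punctuation tokens.
tokens = nltk.word_tokenize(text)
print(tokens)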
Output
['14th', 'Jan', 'is', 'Uttrayan', '!']
In the above code, we import the NLTK library and then download the 'punkt' package, a pre-trained tokenizer for NLTK.
The punkt data is needed by the word_tokenize() function, which breaks a sentence into individual words or tokens.
The next step creates a variable text containing the sentence "14th Jan is Uttrayan!".
The code then uses the word_tokenize() function to tokenize the sentence stored in the text variable and assigns the resulting list of tokens to the tokens variable.
Finally, it prints the tokens variable using the print() function, which outputs the following list: ['14th', 'Jan', 'is', 'Uttrayan', '!'].
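Tokenization is language dependent. The word_tokenize() function uses English by default, which is the language of the sentence above.
To tokenize text in another language, pass the language name through the language parameter and make sure the matching tokenizer data has been downloaded.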
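The sketch below shows one way to tokenize text in another language. It is an illustration under stated assumptions: German is used only as an example, the downloaded Punkt data is assumed to include a German model, and the wrapper name tokenize is hypothetical. The wrapper returns None instead of failing when NLTK or its tokenizer data is unavailable.

```python
def tokenize(text, language="english"):
    """Tokenize text with NLTK, or return None if NLTK or its data is missing."""
    try:
        import nltk
    except ImportError:
        # The package is not installed in this environment.
        return None
    try:
        return nltk.word_tokenize(text, language=language)
    except LookupError:
        # NLTK raises LookupError when the tokenizer data has not been downloaded.
        return None


german_tokens = tokenize("Guten Morgen!", language="german")
print(german_tokens)
```

A result of None means that either the package or the tokenizer data is missing, so the installation and download steps described earlier need to be run first.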
Conclusion
Install the nltk module and import it, spelled correctly, before you use it. Doing so resolves most occurrences of this error.
That’s it.