If you are collecting users’ email addresses for marketing purposes, it is essential that the addresses are in the correct format and potentially usable.
Here are some basic rules that an email address must follow:
- The local part (before the “@”) should not start with a dot.
- It should not contain consecutive dots.
- It should contain only valid characters in the local and domain parts.
- It should respect length limits (total address length, domain length, and local-part length).
- It may use an internationalized domain name (IDN), which must be handled correctly.
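You can also check these rules one at a time, which makes it easy to report why an address was rejected. Below is a minimal sketch of a hypothetical helper, rule_violations(). It is not from any library. It assumes the commonly cited limits of 64 characters for the local part and 254 for the whole address, ignores quoted local parts, and relies on Python's built-in "idna" codec for internationalized domain names.

```python
import string

# Assumed character sets: unquoted local part (ASCII subset) and ASCII domain.
LOCAL_ALLOWED = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~.")
DOMAIN_ALLOWED = set(string.ascii_letters + string.digits + "-.")


def rule_violations(email):
    """Return the list of basic rules that `email` breaks (empty if none)."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        return ["missing local part, '@', or domain"]

    problems = []
    if local.startswith("."):
        problems.append("leading dot in local part")
    if ".." in email:
        problems.append("consecutive dots")
    # Non-ASCII characters are allowed here (internationalized local parts)
    if any(ch.isascii() and ch not in LOCAL_ALLOWED for ch in local):
        problems.append("invalid character in local part")
    if len(local) > 64:
        problems.append("local part longer than 64 characters")
    if len(email) > 254:
        problems.append("address longer than 254 characters")

    try:
        # The "idna" codec converts Unicode labels to punycode and rejects
        # empty or over-long (more than 63 character) labels.
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        problems.append("domain is not a valid (internationalized) domain name")
    else:
        if not set(ascii_domain) <= DOMAIN_ALLOWED:
            problems.append("invalid character in domain")
    return problems


print(rule_violations(".user..name@example.com"))
# ['leading dot in local part', 'consecutive dots']
```

An empty list means the address passed every check in this sketch. It does not mean the address is fully standards-compliant.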
Here are three ways to check if a string is a valid email address in Python:
- Using “re” module
- Using “email_validator” module
- Using the “email.utils.parseaddr()” function
Method 1: Using “re” module
You can use the “re.match()” function to check whether a string matches a regular expression pattern, starting from the beginning of the string. It returns a match object on success and None otherwise. Comparing the result with None using the “is not” operator therefore gives True for a valid email address and False otherwise.
```python
import re


def is_valid_email(email):
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return re.match(pattern, email) is not None


# Usage
print(is_valid_email("user@example.com"))
print(is_valid_email("invalid-email"))
```
Output
```
True
False
```
This approach is quick to implement and good enough for basic email validation. However, it misses edge cases. For example, it accepts a leading dot or consecutive dots in the local part (".user@example.com", "user..name@example.com"), and it does not follow the RFC 5322 standard.
To enforce more of the rules listed above, you can expand the regular expression pattern like this:
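Here is a short sketch of two edge cases for the simple pattern above. With re.match(), a trailing "$" still accepts an address followed by a newline, because "$" also matches just before a final "\n". Pattern.fullmatch() instead requires the whole string to match. Even with fullmatch(), though, the simple pattern still accepts consecutive dots.

```python
import re

# The simple pattern from above, without ^ and $: fullmatch() anchors both ends.
SIMPLE_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def is_valid_email(email):
    # fullmatch() must consume the entire string, so a trailing "\n" fails
    return SIMPLE_PATTERN.fullmatch(email) is not None


anchored = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
print(re.match(anchored, "user@example.com\n") is not None)  # True
print(is_valid_email("user@example.com\n"))                   # False
print(is_valid_email("user..name@example.com"))               # True (edge case)
```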
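```python
import re

# A stricter pattern. re.VERBOSE ignores whitespace and "#" comments
# outside character classes, so each rule can be annotated.
email_regex = re.compile(r"""
    ^(?!\.)                                    # no leading dot in the local part
    (?!.*\.\.)                                 # no consecutive dots anywhere
    [a-zA-Z0-9\u00C0-\u02FF\u0370-\u1EFF]      # first local-part character
    [\w\u00C0-\u02FF\u0370-\u1EFF.!#$%&'*+/=?^`{|}~-]{0,63}  # up to 63 more
    @(?=.{1,255}$)                             # domain is 1-255 characters long
    (?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+  # labels of up to 63 chars
    (?:[a-zA-Z]{2,}|xn--[a-zA-Z0-9-]{2,})$     # alphabetic or punycode TLD
""", re.VERBOSE)


def is_valid_email(email):
    # fullmatch() also rejects a trailing newline that "$" alone would allow
    return email_regex.fullmatch(email) is not None


# Usage
print(is_valid_email("user@example.com"))
print(is_valid_email("kb_smith@company.co.uk"))
print(is_valid_email("user..name@example.com"))
print(is_valid_email("krunal@appdividend.com"))
print(is_valid_email("krunal@appdividend"))
```
Output
```
True
True
False
True
False
```
This pattern encodes the rules listed at the start of the article:
- no leading dot and no consecutive dots;
- a restricted set of characters in the local and domain parts;
- length limits (64 characters for the local part, 63 per domain label, and 255 for the domain);
- punycode (xn--) labels for IDNs.

It also accepts multi-label domains such as "company.co.uk". If the input breaks any of these rules, is_valid_email() returns False. It is still not a full RFC 5322 implementation. For example, it rejects quoted local parts ("john doe"@example.com) and IP-address domains (user@[192.0.2.1]).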
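This pattern, like the simple one, only accepts ASCII domain names. One way to handle internationalized domain names is to convert the domain to its ASCII punycode form before applying the regex. The sketch below assumes Python's built-in "idna" codec, which implements the older IDNA 2003 rules. to_ascii_email() is a hypothetical helper name.

```python
def to_ascii_email(email):
    """Hypothetical helper: return `email` with its domain in punycode form,
    or None if there is no "@" or the domain cannot be encoded."""
    local, sep, domain = email.rpartition("@")
    if not sep:
        return None  # no "@" in the string
    try:
        # Built-in codec (IDNA 2003); lowercases and punycode-encodes labels
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return None  # e.g. an empty or over-long label such as "a..com"
    return f"{local}@{ascii_domain}"


print(to_ascii_email("user@bücher.example"))  # user@xn--bcher-kva.example
```

You would then pass the converted address to is_valid_email(). Only the domain is converted. Non-ASCII local parts are left unchanged.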
Validate emails from a text file
To sort the addresses in a text file into valid and invalid ones and write the results to a new file, combine basic file operations with the re.match() check.
Here is our “emails.txt” file, with one address per line:
```
john.english@example.com
invalid.email@
kb_smith@company.co.uk
not_an_email
user456@subdomain.example.org
missingatsymbol.com
test+alias@gmail.com
spaces are @not.allowed
lastexample@valid.io
```
Here is our main Python code:
```python
import re
from pathlib import Path


def is_valid_email(email):
    # This is a simple regex pattern.
    # You can replace it with more complex validation if needed.
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return re.match(pattern, email) is not None


def validate_emails_from_file(input_file, output_file):
    input_path = Path(input_file)
    output_path = Path(output_file)

    if not input_path.exists():
        print(f"Error: Input file '{input_file}' does not exist.")
        return

    valid_emails = []
    invalid_emails = []

    # Read the input file line by line and classify each address
    with input_path.open('r', encoding='utf-8') as file:
        for line in file:
            email = line.strip()
            if not email:  # skip blank lines
                continue
            if is_valid_email(email):
                valid_emails.append(email)
            else:
                invalid_emails.append(email)

    # Write both lists to the output file
    with output_path.open('w', encoding='utf-8') as file:
        file.write("Valid emails:\n")
        for email in valid_emails:
            file.write(f"{email}\n")
        file.write("\nInvalid emails:\n")
        for email in invalid_emails:
            file.write(f"{email}\n")

    print(f"Validation complete. Results written to '{output_file}'.")
    print(f"Valid emails: {len(valid_emails)}")
    print(f"Invalid emails: {len(invalid_emails)}")


# Usage
input_file = "emails.txt"
output_file = "validation_results.txt"
validate_emails_from_file(input_file, output_file)
```
Output
```
Validation complete. Results written to 'validation_results.txt'.
Valid emails: 5
Invalid emails: 4
```
Here is the output “validation_results.txt” file:
```
Valid emails:
john.english@example.com
kb_smith@company.co.uk
user456@subdomain.example.org
test+alias@gmail.com
lastexample@valid.io

Invalid emails:
invalid.email@
not_an_email
missingatsymbol.com
spaces are @not.allowed
```
First, we read the text file line by line, check each address against the regex pattern, and add it to one of two lists: valid emails or invalid emails. Then we write both lists to a new file, whose contents are shown above.
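You may want to keep the classification logic separate from the file handling, so you can test it without creating files. partition_emails() below is a hypothetical helper. It applies the same simple pattern (via fullmatch()) to any iterable of lines, such as an open file or a list of strings, and skips blank lines.

```python
import re

# Same simple pattern as in the script above; fullmatch() anchors both ends.
PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def partition_emails(lines):
    """Split an iterable of lines into (valid, invalid) lists, skipping blanks."""
    valid, invalid = [], []
    for line in lines:
        email = line.strip()
        if not email:
            continue
        if PATTERN.fullmatch(email):
            valid.append(email)
        else:
            invalid.append(email)
    return valid, invalid


SAMPLE = [
    "john.english@example.com", "invalid.email@", "kb_smith@company.co.uk",
    "not_an_email", "user456@subdomain.example.org", "missingatsymbol.com",
    "test+alias@gmail.com", "spaces are @not.allowed", "lastexample@valid.io",
]
valid, invalid = partition_emails(SAMPLE)
print(len(valid), len(invalid))  # 5 4
```

In the script above, the reading loop could then become `valid_emails, invalid_emails = partition_emails(file)`.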
Method 2: Using “email_validator” module
If you don’t want to use regular expressions, you can use the third-party “email-validator” library. Its validate_email() function returns information about the address when it is valid and raises EmailNotValidError when it isn't.
You can install the “email_validator” library using the command below:
pip install email_validator
Here is the code for the email_validator module:
```python
from email_validator import validate_email, EmailNotValidError


def valid_email_using_email_validator(email):
    try:
        # check_deliverability=False checks syntax only; by default the
        # library also queries DNS, so results would depend on each domain.
        validate_email(email, check_deliverability=False)
        return True
    except EmailNotValidError:
        return False


print(valid_email_using_email_validator("john.english@example.com"))
print(valid_email_using_email_validator("invalid.email@"))
print(valid_email_using_email_validator("kb_smith@company.co.uk"))
print(valid_email_using_email_validator("not_an_email"))
print(valid_email_using_email_validator("user456@subdomain.example.org"))
print(valid_email_using_email_validator("user@example.com"))
print(valid_email_using_email_validator("invalid-email"))
```
Output
```
True
False
True
False
True
True
False
```
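This approach is robust and requires less code. It follows the email RFCs closely, though it deliberately rejects some obsolete or rarely used forms, so it is not a literal RFC 5322 parser. It also handles internationalized email addresses and raises exceptions with human-readable messages explaining why an address is invalid. By default, it also queries DNS to check that the domain can receive email; pass check_deliverability=False to validate syntax only. However, this method requires installing a third-party package and may be overkill for simple use cases.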
Method 3: Using built-in “email.utils” module
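Python's built-in “email.utils” module provides a “parseaddr()” function. It splits a string such as "John English <john.english@example.com>" into a (real name, email address) tuple and returns ('', '') if parsing fails. The check below treats any parsed address that contains an “@” as valid: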
```python
from email.utils import parseaddr


def is_valid_email_using_parseaddr(email):
    return '@' in parseaddr(email)[1]


print(is_valid_email_using_parseaddr("john.english@example.com"))
print(is_valid_email_using_parseaddr("invalid.email@"))
print(is_valid_email_using_parseaddr("kb_smith@company.co.uk"))
print(is_valid_email_using_parseaddr("not_an_email"))
print(is_valid_email_using_parseaddr("user456@subdomain.example.org"))
print(is_valid_email_using_parseaddr("user@example.com"))
print(is_valid_email_using_parseaddr("invalid-email"))
```
Output
```
True
False
True
False
True
True
False
```
If you want to use a library, I highly recommend “email_validator” over “email.utils”. The parseaddr() function only parses the string; it doesn't validate the structure of the address. For example, "user@localhost" passes the check above even though it has no top-level domain.
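parseaddr() is genuinely useful for extracting the address from a header-style string such as "John English <john.english@example.com>". Below is a sketch that pairs it with the simple regex from Method 1. extract_valid_email() is a hypothetical helper. The regex, not parseaddr(), does the validation here.

```python
import re
from email.utils import parseaddr

SIMPLE_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def extract_valid_email(value):
    """Return the address inside `value` if it passes the regex, else None."""
    # parseaddr() only splits out (display name, address); it does not validate
    _, address = parseaddr(value)
    if SIMPLE_PATTERN.fullmatch(address):
        return address
    return None


print(parseaddr("user@localhost"))  # ('', 'user@localhost')
print(extract_valid_email("John English <john.english@example.com>"))
# john.english@example.com
print(extract_valid_email("user@localhost"))  # None
```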
Conclusion
For basic email validation, a simple regular expression is enough. For stricter validation, either write a more comprehensive regex pattern for the “re” module or use the “email_validator” library, which follows the email standards closely and handles internationalized addresses.