How to Remove Emoji from the Text in Python

  • 07 Sep, 2024

In the modern world, if you do not use emojis in your text, you will be considered a “grandpa”! But emojis can be a nuisance when the text has to be analyzed as data.

Traditional NLP models are usually not trained on or optimized for emojis. To feed them clean data, we need to remove emojis from the text or string first.

Text with emojis and without emojis

The image above shows a text that contains two types of Unicode characters:

  1. Normal string (Data)
  2. Emojis (✍ 🌷 📌 👈🏻 🖥 😀 😃 😄 😁 😆 😅 😂)
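
To see how the two kinds of characters differ, it can help to print their Unicode code points. The snippet below is a small sketch and not part of the original tutorial; the helper name code_points is made up for illustration, and the emojis are written as escape sequences so the source stays ASCII-only.

```python
def code_points(text):
    """Return (character, 'U+XXXX') pairs for every non-space character."""
    return [(ch, f"U+{ord(ch):04X}") for ch in text if not ch.isspace()]


# "Data" followed by the writing hand (U+270D) and tulip (U+1F337) emojis
sample = "Data \u270d \U0001F337"

for ch, cp in code_points(sample):
    # ascii() keeps the printed output readable on any terminal encoding
    print(ascii(ch), cp)
```

The letters of “Data” sit below U+0080, while the emojis live in much higher blocks. The regular-expression ranges in Method 1 rely on that separation.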

After the emojis are removed, only the regular string remains.

The main goal of this tutorial is to clean the text that contains emojis programmatically.

Here are three ways to remove emojis from the text in Python:

  1. Using the “re” module
  2. Using the “emoji” package
  3. Using the “clean-text” package

Method 1: Using the “re” module

For finding and replacing specific patterns, the built-in “re” module is the natural choice. The first step is to create an emoji_pattern object with the “re.compile()” method. We then call emoji_pattern.sub(r'', text) to replace every match of the emoji pattern within the string with an empty string (''), effectively removing it.

If you are looking for customization and performance, go for this approach: you control exactly which Unicode ranges are removed, it needs no third-party package, and it is relatively fast for most use cases. The trade-off is that the ranges are maintained by hand, so emojis outside the listed ranges are left in place.

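import re

# Compile the pattern once so repeated calls do not rebuild it.
# Each range covers one emoji-related Unicode block.
emoji_pattern = re.compile("["
                           "\U0001F600-\U0001F64F"  # emoticons
                           "\U0001F300-\U0001F5FF"  # symbols & pictographs
                           "\U0001F680-\U0001F6FF"  # transport & map symbols
                           "\U0001F1E0-\U0001F1FF"  # flags (regional indicators)
                           "\U00002702-\U000027B0"  # dingbats
                           "\U00002600-\U000026FF"  # miscellaneous symbols
                           "\U000024C2"             # circled M
                           "\U0001F200-\U0001F251"  # enclosed ideographic supplement
                           "\U0000FE0F"             # variation selector-16
                           "]+")
# Note: a single range from U+24C2 to U+1F251 would also match CJK, Hangul,
# and many other non-emoji characters, so it is split into narrower ranges.


def remove_emoji(txt):
    """Replace every run of characters matched by emoji_pattern with an empty string."""
    return emoji_pattern.sub(r'', txt)


txt = "Data ✍ 🌷 📌 👈🏻 🖥 😀 😃 😄 😁 😆 😅 😂"
print("Before removing emojis")
print(txt)
print("After removing emojis")
print(remove_emoji(txt))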

Output

Before removing emojis
Data ✍ 🌷 📌 👈🏻 🖥 😀 😃 😄 😁 😆 😅 😂

After removing emojis
Data
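
Two things are worth noting about this output: the spaces that separated the emojis are still in the string, and any emoji outside the listed ranges would survive. The following sketch is one possible variation, not the tutorial's method. It builds the pattern from a list of code point ranges so extra ranges can be added, and it collapses leftover whitespace. The names BASE_RANGES, EXTRA_RANGES, build_emoji_pattern, and remove_emoji_tidy are hypothetical, and the extra ranges are assumptions about what you might want to cover.

```python
import re

# (start, end) code point ranges; the labels are informal.
BASE_RANGES = [
    (0x1F600, 0x1F64F),  # emoticons
    (0x1F300, 0x1F5FF),  # symbols & pictographs
    (0x1F680, 0x1F6FF),  # transport & map symbols
    (0x1F1E0, 0x1F1FF),  # regional indicators (flags)
    (0x2702, 0x27B0),    # dingbats
]

# Assumed additions for illustration; adjust to your own data.
EXTRA_RANGES = [
    (0x1F900, 0x1F9FF),  # supplemental symbols & pictographs
    (0xFE0F, 0xFE0F),    # variation selector-16
    (0x200D, 0x200D),    # zero width joiner
]


def build_emoji_pattern(ranges):
    """Compile a character class matching runs of the given code point ranges."""
    parts = [f"{chr(start)}-{chr(end)}" for start, end in ranges]
    return re.compile("[" + "".join(parts) + "]+")


def remove_emoji_tidy(text, pattern):
    """Delete matches, then collapse the whitespace they leave behind."""
    return " ".join(pattern.sub("", text).split())


base = build_emoji_pattern(BASE_RANGES)
extended = build_emoji_pattern(BASE_RANGES + EXTRA_RANGES)

# Tulip (U+1F337) is in the base ranges; thinking face (U+1F914) is not.
sample = "Data \U0001F337 \U0001F914 done"
print(ascii(remove_emoji_tidy(sample, base)))
print(ascii(remove_emoji_tidy(sample, extended)))
```

With the base ranges the thinking face survives, while the extended pattern leaves only “Data done”. Keeping the ranges in a plain list makes it easy to review what is being removed.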

Method 2: Using the “emoji” package

The “emoji” library provides the emoji.replace_emoji() function, which finds emojis in a string and replaces them with the string you pass in, here an empty string.


First, you need to install the “emoji” library using the command below:

pip install emoji

Now, you can import it and use its replace_emoji() function.

import emoji

# Creating a custom function that accepts "txt"
# and passing that "txt" to the "emoji.replace_emoji()" function
# to remove emojis from the string.


def remove_using_emoji(txt):
    return emoji.replace_emoji(txt, '')


txt = "Data ✍ 🌷 📌 👈🏻 🖥 😀 😃 😄 😁 😆 😅 😂"
print("Before removing emojis")
print(txt)
print("After removing emojis")
print(remove_using_emoji(txt))

Output

Before removing emojis
Data ✍ 🌷 📌 👈🏻 🖥 😀 😃 😄 😁 😆 😅 😂

After removing emojis
Data

This approach requires installing an external library. However, if you want a simple solution, I highly recommend it. Furthermore, the library’s maintainers regularly update it to include newly released emojis, so it tends to cover more cases than a hand-written pattern.

Method 3: Using the “clean-text” package

Another third-party package, “clean-text”, handles emojis very well. If you need not only emoji removal but also a comprehensive text-cleaning solution, I would advise this approach. It is installed with pip install clean-text and imported as cleantext.


from cleantext import clean


# Creating a custom function that accepts "txt"
# and passing that "txt" to the "clean()" function
# to remove email addresses, digits, remove emojis


def remove_using_cleantext(txt):
    cleaned_data = clean(txt,
                         no_emails=True,  # Remove email addresses
                         no_digits=True,  # Remove digits
                         no_emoji=True,  # Remove emojis
                         replace_with_email="",  # Replace emails with empty string
                         replace_with_digit="")  # Replace digits with empty string
    return cleaned_data


txt = "Contact me at krunal@appdividend.com or call 444 555 9999 📞. Have a great day! 😊"
print("Before removing email address, emojis, and digits")
print(txt)

print("After removing email address, emojis, and digits")
print(remove_using_cleantext(txt))

Output

Before removing email address, emojis, and digits
Contact me at krunal@appdividend.com or call 444 555 9999 📞. Have a great day! 😊

After removing email address, emojis, and digits
contact me at or call . have a great day!

As the code shows, we call the cleantext.clean() function with “txt” and the arguments no_emoji=True, no_emails=True, and no_digits=True to remove emojis, emails, and digits from the input text. Setting replace_with_email and replace_with_digit to an empty string overrides the default placeholder tokens, so nothing is left in their place. The output is also lowercase because clean() lowercases text by default (lower=True).

This solution is simple. However, if you only need to remove one specific thing, I would not recommend this approach, because more targeted packages are available.

Use this method only when you want to remove a combination of data, such as “emojis, emails, and digits,” “emojis and emails,” “emojis and digits,” or any other combination you need.
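
For comparison, the idea of switching individual removals on and off can be sketched with the standard library alone. This is a rough illustration and not how clean-text works internally. The function name strip_selected, its keyword flags, and the deliberately simple patterns are all assumptions made for this example; a real email pattern would need to be stricter.

```python
import re

# Intentionally simple patterns, for illustration only.
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
DIGIT_RE = re.compile(r"\d")
EMOJI_RE = re.compile("[\U0001F300-\U0001F6FF\U00002702-\U000027B0]+")


def strip_selected(text, *, emails=True, digits=True, emojis=True):
    """Remove the selected kinds of content, then collapse extra whitespace."""
    if emails:
        text = EMAIL_RE.sub("", text)
    if digits:
        text = DIGIT_RE.sub("", text)
    if emojis:
        text = EMOJI_RE.sub("", text)
    return " ".join(text.split())


# Same sentence as above, with the emojis written as escape sequences:
# telephone receiver (U+1F4DE) and smiling face (U+1F60A).
sample = ("Contact me at krunal@appdividend.com or call 444 555 9999 "
          "\U0001F4DE. Have a great day! \U0001F60A")

print(strip_selected(sample))
print(strip_selected(sample, emails=False))
```

Unlike clean(), this sketch does not lowercase the text, so the original capitalization is kept.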

That’s all I needed to address for this tutorial. I hope all of you programmers have a nice day.

Krunal Lathiya

With a career spanning over eight years in the field of Computer Science, Krunal’s expertise is rooted in a solid foundation of hands-on experience, complemented by a continuous pursuit of knowledge.

Copyright @2024 AppDividend. All Rights Reserved