AppDividend

Python RegEx Tutorial With Example | Regular Expressions in Python


A RegEx, or regular expression, is a sequence of characters that forms a search pattern. It can be used to check whether a string contains the specified pattern, and it is useful for extracting information from text such as code, files, logs, spreadsheets, or documents. Now, let’s start the Python RegEx tutorial with examples.

Python RegEx Tutorial With Example

When using regular expressions, the first thing to recognize is that everything is essentially a character, and we write patterns to match a specific sequence of characters, also referred to as a string. ASCII or Latin letters are the ones on your keyboard, and Unicode is used to match text in other scripts.

Characters also include digits, punctuation, and special characters like $#@!%.

For instance, a regular expression could tell a program to search for specific text in a string and then print the result accordingly. A pattern can include the following.

  1. Text matching
  2. Repetition
  3. Branching
  4. Pattern-composition etc.
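
As a rough sketch of these four building blocks, the snippet below uses Python's re module, which is introduced in the next section. The sample strings and variable names are made up for illustration.

```python
import re

# Text matching: a literal pattern matches itself.
literal = re.findall(r"rain", "The rain in Australia")

# Repetition: "o+" matches one or more consecutive "o" characters.
repetition = re.findall(r"o+", "foo boo o")

# Branching: "|" matches either alternative.
branching = re.findall(r"cat|dog", "cat, dog, bird")

# Composition: smaller patterns combined into one larger pattern.
year, month = r"\d{4}", r"\d{2}"
composed = re.findall(year + "-" + month, "2019-07 and 2020-11")

print(literal)     # ['rain']
print(repetition)  # ['oo', 'oo', 'o']
print(branching)   # ['cat', 'dog']
print(composed)    # ['2019-07', '2020-11']
```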

#Python RegEx Module

We can import the Python re module using the following code.

import re

  1. The “re” module is included with Python and is primarily used for string searching and manipulation.
  2. It is also used frequently for web page “scraping”, that is, extracting large amounts of data from websites.

#RegEx in Python

Search the string to see if it starts with “The” and ends with “Australia.”

# app.py

import re

data = "The rain in Australia"
x = re.search(r"^The.*Australia$", data)
if x:
  print("YES! We have a match!")
else:
  print("No match")

The output is:

YES! We have a match!

 

You can see the return object from the search function.

# app.py

import re

data = "The rain in Australia"
x = re.search("^The.*Australia$", data)
print(x)

This prints a Match object whose span is (0, 21) and whose match is 'The rain in Australia'.
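
To make the returned object a little more concrete, here is a small sketch that reads values out of it. The variable names are illustrative; group() and span() are methods of the Match object, and re.search() returns None when nothing matches.

```python
import re

data = "The rain in Australia"
x = re.search(r"^The.*Australia$", data)

# The Match object describes what was matched and where.
matched_text = x.group()  # the full matched substring
matched_span = x.span()   # (start, end) indexes of the match

# When nothing matches, re.search() returns None instead of a Match object.
no_match = re.search(r"^Spain", data)

print(matched_text)  # The rain in Australia
print(matched_span)  # (0, 21)
print(no_match)      # None
```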

 

#Python RegEx Functions

The Python re module offers a set of functions that allow us to search a string for a match.

Function    Description
findall     Returns a list containing all matches
search      Returns a Match object if there is a match anywhere in the string
split       Returns a list where the string has been split at each match
sub         Replaces one or more matches with a string

 

#Python RegEx findall() Method

The findall() function returns a list containing all matches.

See the following code example.

# app.py

import re

data = "The rain in Australia"
x = re.findall("Aus", data)
print(x)

The output is:

['Aus']

 

The list contains the matches in the order they are found. If no matches are found, an empty list is returned. The findall() function is case sensitive by default, so a lowercase pattern does not match “Aus”. See the following code.

# app.py

import re

data = "The rain in Australia"
x = re.findall("aus", data)
print(x)

The output is an empty list:

[]
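
If a case-insensitive search is what you want, one option is the re.IGNORECASE flag (short form re.I). The sketch below reuses the same string; the variable names are illustrative.

```python
import re

data = "The rain in Australia"

# Without a flag, the lowercase pattern finds nothing.
case_sensitive = re.findall(r"aus", data)

# re.IGNORECASE makes the match case-insensitive. findall() returns the
# text as it appears in the string, not as it is written in the pattern.
case_insensitive = re.findall(r"aus", data, flags=re.IGNORECASE)

print(case_sensitive)    # []
print(case_insensitive)  # ['Aus']
```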

 

#Python RegEx search() Method

The search() function searches the string for the match and returns the Match object if there is a match.

If there is more than one match, only the first occurrence of the match will be returned.

See the following code example.

# app.py

import re

data = "The rain in Australia"
pos = re.search(r"\s", data)
print("The first white-space character is located at index", pos.start())

The first white-space character is at index 3, so the printed message ends with 3.
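
Because search() returns None when there is no match, calling start() directly can raise an AttributeError. A defensive sketch might look like the following; the helper name first_whitespace_index is hypothetical and not part of the re module.

```python
import re

def first_whitespace_index(text):
    """Return the index of the first white-space character, or -1 if there is none."""
    pos = re.search(r"\s", text)
    # re.search() returns None when nothing matches, so check before calling start().
    return pos.start() if pos else -1

with_space = first_whitespace_index("The rain in Australia")
without_space = first_whitespace_index("Australia")

print(with_space)     # 3
print(without_space)  # -1
```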

 

#Python RegEx split() Method

The split() function returns the list where the string has been split at each match.

Now, let’s split at each white-space character. See the following code.

# app.py

import re

data = "The rain in Australia"
result = re.split(r"\s", data)
print(result)

The output is:

['The', 'rain', 'in', 'Australia']

You can control the number of splits by specifying the maxsplit parameter.

Let’s split the string only at the first occurrence.

# app.py

import re

data = "The rain in Australia"
result = re.split(r"\s", data, maxsplit=1)
print(result)

The output is:

['The', 'rain in Australia']
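
The pattern passed to split() does not have to be a single character. As a sketch with a made-up sample string, the following splits on either a comma or a semicolon, together with any white space that follows it.

```python
import re

weather = "rain, snow;hail,  fog"

# "[,;]" matches a comma or a semicolon; "\s*" also consumes trailing white space.
parts = re.split(r"[,;]\s*", weather)

# maxsplit limits how many splits are made; the rest of the string is left intact.
first_only = re.split(r"[,;]\s*", weather, maxsplit=1)

print(parts)       # ['rain', 'snow', 'hail', 'fog']
print(first_only)  # ['rain', 'snow;hail,  fog']
```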

 

#Python RegEx sub() Method

The sub() function replaces the matches with a text of your choice.

Let’s replace every white-space character with the string ‘~~~’.

# app.py

import re

data = "The rain in Australia"
result = re.sub(r"\s", "~~~", data)
print(result)

The output is:

The~~~rain~~~in~~~Australia
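
Two common variations are sketched below: collapsing runs of white space with \s+, and limiting the number of replacements with the count parameter of re.sub(). The messy sample string is made up for illustration.

```python
import re

messy = "The   rain  in\tAustralia"

# "\s+" matches one or more white-space characters, so each run becomes one space.
collapsed = re.sub(r"\s+", " ", messy)

# count=1 replaces only the first match and leaves the rest unchanged.
first_replaced = re.sub(r"\s", "~~~", "The rain in Australia", count=1)

print(collapsed)       # The rain in Australia
print(first_replaced)  # The~~~rain in Australia
```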

 

#Python Metacharacters

Metacharacters are characters with a special meaning. They are listed below.

Character   Example         Description
[]          "[a-m]"         A set of characters
\           "\d"            Signals a special sequence (can also be used to escape special characters)
.           "he..o"         Any character (except a newline character)
^           "^hello"        Starts with
$           "world$"        Ends with
*           "aix*"          Zero or more occurrences
+           "aix+"          One or more occurrences
{}          "al{2}"         Exactly the specified number of occurrences
|           "falls|stays"   Either or
()                          Capture and group
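
The short sketch below tries several of the example patterns from the table against made-up strings, so you can see what each metacharacter actually matches. The variable names and sample strings are illustrative only.

```python
import re

in_set = re.findall(r"[a-m]", "hello")             # letters between a and m
any_char = re.findall(r"he..o", "hello helio")     # "." stands for any character
zero_or_more = re.findall(r"aix*", "ai aix aixx")  # "x" may be absent
one_or_more = re.findall(r"aix+", "ai aix aixx")   # at least one "x" required
exactly_two = re.findall(r"al{2}", "al all alll")  # "a" followed by exactly two "l"
either = re.findall(r"falls|stays", "rain falls, sun stays")
starts = bool(re.search(r"^hello", "hello world"))
ends = bool(re.search(r"world$", "hello world"))

print(in_set)        # ['h', 'e', 'l', 'l']
print(any_char)      # ['hello', 'helio']
print(zero_or_more)  # ['ai', 'aix', 'aixx']
print(one_or_more)   # ['aix', 'aixx']
print(exactly_two)   # ['all', 'all']
print(either)        # ['falls', 'stays']
print(starts, ends)  # True True
```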

 

#Python Special Sequences

A special sequence is a \ followed by one of the characters in the list below, and it has a special meaning.

Character   Example               Description
\A          "\AThe"               Matches if the specified characters are at the beginning of the string
\b          r"\bain", r"ain\b"    Matches where the specified characters are at the beginning or at the end of a word
\B          r"\Bain", r"ain\B"    Matches where the specified characters are present, but NOT at the beginning (or at the end) of a word
\d          "\d"                  Matches where the string contains digits (numbers from 0-9)
\D          "\D"                  Matches where the string contains a character that is NOT a digit
\s          "\s"                  Matches where the string contains a white-space character
\S          "\S"                  Matches where the string contains a character that is NOT white space
\w          "\w"                  Matches where the string contains any word character (letters, digits from 0-9, and the underscore _ character)
\W          "\W"                  Matches where the string contains a character that is NOT a word character
\Z          "Spain\Z"             Matches if the specified characters are at the end of the string
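
Here is a brief sketch of a few of these sequences in use. The sample string is made up. The patterns are written as raw strings (the r prefix) because in a normal Python string literal "\b" means a backspace character rather than a word boundary.

```python
import re

data = "The rain in Spain 2019"

word_start = re.findall(r"\bain", data)  # "ain" at the start of a word
word_end = re.findall(r"ain\b", data)    # "ain" at the end of a word
not_start = re.findall(r"\Bain", data)   # "ain" NOT at the start of a word
digits = re.findall(r"\d", data)         # every single digit
at_start = re.findall(r"\AThe", data)    # "The" at the very beginning
at_end = re.findall(r"\d+\Z", data)      # digits at the very end
words = re.findall(r"\S+", data)         # runs of non-white-space characters

print(word_start)  # []
print(word_end)    # ['ain', 'ain']
print(not_start)   # ['ain', 'ain']
print(digits)      # ['2', '0', '1', '9']
print(at_start)    # ['The']
print(at_end)      # ['2019']
print(words)       # ['The', 'rain', 'in', 'Spain', '2019']
```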

 

#Python Sets

A set is a group of characters inside a pair of square brackets [] with a special meaning.

Set           Description
[arn]         Matches where one of the specified characters (a, r, or n) is present
[a-n]         Matches any lowercase character, alphabetically between a and n
[^arn]        Matches any character EXCEPT a, r, and n
[0123]        Matches where any of the specified digits (0, 1, 2, or 3) is present
[0-9]         Matches any digit between 0 and 9
[0-5][0-9]    Matches any two-digit number from 00 to 59
[a-zA-Z]      Matches any character alphabetically between a and z, lowercase OR uppercase
[+]           In sets, +, *, ., |, (), $, {} have no special meaning, so [+] matches any + character in the string
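
The following sketch applies a few of these sets to short, made-up strings. The variable names are illustrative.

```python
import re

any_of = re.findall(r"[arn]", "rain")              # a, r, or n
none_of = re.findall(r"[^arn]", "rain")            # anything except a, r, n
two_digit = re.findall(r"[0-5][0-9]", "08:45:99")  # 00 to 59 only
letters = re.findall(r"[a-zA-Z]", "Py3")           # letters in either case
plus_sign = re.findall(r"[+]", "1+1=2")            # "+" is literal inside a set

print(any_of)     # ['r', 'a', 'n']
print(none_of)    # ['i']
print(two_digit)  # ['08', '45']
print(letters)    # ['P', 'y']
print(plus_sign)  # ['+']
```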

 

Example of \w+ and ^ Expression

Consider the following two expressions.
  1. “^”: This expression matches the start of the string
  2. “\w+”: This expression matches one or more word characters (letters, digits, or the underscore)

Here we will see how to use the \w+ and ^ expressions in our code. We covered the re.findall() function earlier in this tutorial, so here we focus on \w+ and ^.

For example, for our string “appdividend, is fun”, executing the code with ^\w+ gives the output ['appdividend']. The match starts at the beginning of the string and stops at the comma, which is not a word character.

See the following code.

# app.py

import re

data = "appdividend, is fun"
result = re.findall(r"^\w+", data)
print(result)

The output is:

['appdividend']

 

Remember, if you remove the + sign from \w+, the output changes: the pattern then matches only the first character of the string, i.e., ['a'].
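
To compare these variations side by side, the sketch below runs three patterns against the same string. The variable names are illustrative.

```python
import re

data = "appdividend, is fun"

first_word = re.findall(r"^\w+", data)  # anchored, one or more word characters
first_char = re.findall(r"^\w", data)   # anchored, exactly one word character
all_words = re.findall(r"\w+", data)    # not anchored, so every word is found

print(first_word)  # ['appdividend']
print(first_char)  # ['a']
print(all_words)   # ['appdividend', 'is', 'fun']
```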

Using re.match() function in Python

The re.match() function tries to match the RE pattern at the beginning of a string, with optional flags. In the example below, the pattern “(a\w+)\W(g\w+)” captures a word starting with the letter ‘a’, then skips one non-word character (here, a space), and captures a word starting with the letter ‘g’. Elements whose second word does not start with ‘g’ are not matched. To check each element in a list, we run a for loop. See the following code.

# app.py

import re

listA = ["appdividend10 giveaway",
        "appdividend10 giveup",
        "appdividend javascript"]

for element in listA:
  result = re.match(r"(a\w+)\W(g\w+)", element)
  if result:
    print(result.groups())

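The output is:

('appdividend10', 'giveaway')
('appdividend10', 'giveup')

The third element, “appdividend javascript”, does not match because its second word starts with ‘j’.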
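
One point worth illustrating is that re.match() only looks at the beginning of the string, while re.search() scans the whole string. The sketch below uses a made-up string in which the pattern appears after an extra leading word.

```python
import re

text = "win appdividend10 giveaway"
pattern = r"(a\w+)\W(g\w+)"

# match() fails because the string starts with "win", not with a word in "a".
at_start = re.match(pattern, text)

# search() finds the same pattern further into the string.
anywhere = re.search(pattern, text)
found_groups = anywhere.groups()

print(at_start)      # None
print(found_groups)  # ('appdividend10', 'giveaway')
```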

 

Summary

A regular expression in a programming language is a special text string used for describing a search pattern. It can include digits, punctuation, and special characters like $#@!%. An expression can include the following.

  1. Text matching
  2. Repetition
  3. Branching
  4. Pattern-composition etc.

In Python, regular expressions (also called REs, regexes, or regex patterns) are supported through the re module.

  1. The “re” module is included with Python and is primarily used for string searching and manipulation
  2. It is also used frequently for web page “scraping” (extracting large amounts of data from websites)
  3. Regular expression functions include re.match(), re.search(), and re.findall()
  4. Many regex functions take an optional argument called flags
  5. These flags can modify the meaning of a given regex pattern
  6. Commonly used flags include re.M, re.I, and re.S
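
As a closing sketch, here is how the three flags named above change the behaviour of a pattern. The two-line sample string is made up; re.I, re.M, and re.S are the short forms of re.IGNORECASE, re.MULTILINE, and re.DOTALL.

```python
import re

text = "first line\nSecond LINE"

# re.I: ignore case.
ignore_case = re.findall(r"line", text, flags=re.I)

# re.M: "^" matches at the start of every line, not only at the start of the string.
single_line = re.findall(r"^\w+", text)
multi_line = re.findall(r"^\w+", text, flags=re.M)

# re.S: "." also matches the newline character.
without_dotall = re.search(r"first.*LINE", text)
with_dotall = re.search(r"first.*LINE", text, flags=re.S)

# Flags can be combined with the | operator.
combined = re.findall(r"^second", text, flags=re.I | re.M)

print(ignore_case)     # ['line', 'LINE']
print(single_line)     # ['first']
print(multi_line)      # ['first', 'Second']
print(without_dotall)  # None
print(combined)        # ['Second']
```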

That concludes the Python RegEx tutorial.
