Regular Expressions in Java: Complete Guide

A regular expression (regex) in Java is a sequence of characters, written in a special syntax, that defines a search pattern used to search and manipulate strings (or text). Java's regex support lives in the java.util.regex package, which primarily provides three classes:

#Pattern Class

The Pattern class represents a compiled regular expression. It has no public constructors. To create a pattern, you call one of its public static compile() methods. These methods take the regular expression as the first argument and return a Pattern object.

#Matcher class

The Matcher class matches a Pattern against an input string (or any CharSequence). Like Pattern, Matcher defines no public constructors. Instead, you obtain a Matcher object by calling the matcher() method on a Pattern object.

#PatternSyntaxException

PatternSyntaxException is an unchecked exception (it extends IllegalArgumentException). It is thrown to indicate a syntax error in a regular expression pattern.

Along with these classes, the java.util.regex package also includes the MatchResult interface.


The Pattern and Matcher classes provide Java's regular expression facility. In total, the java.util.regex package contains the following classes and interfaces for regular expressions:

  1. MatchResult interface
  2. Matcher class
  3. Pattern class
  4. PatternSyntaxException class
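In practice these types are used together. Because Pattern and Matcher have no public constructors, you compile a regex once with Pattern.compile() and then call matcher() for each input. The Java documentation states that Pattern instances are immutable and safe for concurrent use, while Matcher instances are not. A common approach is therefore to keep the Pattern in a constant and create a fresh Matcher per call. The sketch below assumes a hypothetical class PatternReuseDemo with a hypothetical helper isYear() that checks for a four-digit string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class PatternReuseDemo {
    // Compiled once and shared: Pattern objects are immutable.
    private static final Pattern YEAR = Pattern.compile("\\d{4}");

    // Hypothetical helper: true if the whole input is exactly four digits.
    static boolean isYear(String input) {
        // A new Matcher per call, because a Matcher holds per-input state.
        Matcher m = YEAR.matcher(input);
        return m.matches();
    }

    public static void main(String[] args) {
        System.out.println(isYear("2024"));  // true
        System.out.println(isYear("24"));    // false
        System.out.println(isYear("2024a")); // false
    }
}
```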

#Capturing Groups

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters “d”, “o”, and “g”.

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four groups:

  • ((A)(B(C)))
  • (A)
  • (B(C))
  • (C)

To find out how many groups are present in the expression, call the groupCount() method on the Matcher object. It returns an int giving the number of capturing groups in the matcher's pattern.

There is also a special group, group 0, which always represents the entire match. Group 0 is not included in the total reported by groupCount().
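The following sketch checks the numbering described above for ((A)(B(C))) against the input "ABC". The class name GroupDemo and the helper groups() are hypothetical. group(int) and groupCount() are the standard Matcher methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class GroupDemo {
    // Hypothetical helper: returns group 0 followed by every numbered group.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        // groupCount() excludes group 0, so there are groupCount() + 1 entries.
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("((A)(B(C)))", "ABC");
        System.out.println("groupCount = " + (g.length - 1));
        for (int i = 0; i < g.length; i++) {
            System.out.println("group(" + i + ") = " + g[i]);
        }
    }
}
```

Running main prints groupCount = 4, followed by group(0) = ABC, group(1) = ABC, group(2) = A, group(3) = BC and group(4) = C.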

#Regex syntax

Several special characters are used to create regular expressions. A few of them, along with their usage, are listed below.

| Expression | Usage |
|---|---|
| `.` | Matches any single character (by default, not a line terminator) |
| `^` | Matches the start of the input, or the start of each line in MULTILINE mode |
| `$` | Matches the end of the input (before a final line terminator, if any), or the end of each line in MULTILINE mode |
| `\A` | Matches the beginning of the input |
| `\z` | Matches the end of the input |
| `\d` | Matches a digit character |
| `\D` | Matches a non-digit character |
| `\w` | Matches a word character |
| `\W` | Matches a non-word character |
| `[…]` | Matches any single character inside the brackets |
| `[^…]` | Matches any single character not inside the brackets |
| `a\|b` | Matches either a or b |
| `reg{n}` | Matches reg exactly n times |
| `reg{n,}` | Matches reg n or more times |
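The anchors behave differently depending on the Pattern.MULTILINE flag, which is easy to miss. The sketch below uses a hypothetical findAll() helper that collects every match. It compares ^, \A, $ and alternation on a three-line string, and the expected results are shown in the comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class AnchorDemo {
    // Hypothetical helper: collects every match of regex (compiled with flags) in input.
    static List<String> findAll(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "cat 1\ndog 22\ncow 333";
        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(findAll("^\\w+", 0, text));                   // [cat]
        // With MULTILINE, ^ also matches right after each line terminator.
        System.out.println(findAll("^\\w+", Pattern.MULTILINE, text));   // [cat, dog, cow]
        // \A ignores MULTILINE and always means "start of input".
        System.out.println(findAll("\\A\\w+", Pattern.MULTILINE, text)); // [cat]
        // Without MULTILINE, $ matches only at the end of the input.
        System.out.println(findAll("\\d+$", 0, text));                   // [333]
        // Alternation: either "cat" or "cow".
        System.out.println(findAll("cat|cow", 0, text));                 // [cat, cow]
    }
}
```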

 

#Regex Character classes

| No. | Character class | Description |
|---|---|---|
| 1 | `[abc]` | a, b, or c (simple class) |
| 2 | `[^abc]` | Any character except a, b, or c (negation) |
| 3 | `[a-zA-Z]` | a through z or A through Z, inclusive (range) |
| 4 | `[a-d[m-p]]` | a through d, or m through p: `[a-dm-p]` (union) |
| 5 | `[a-z&&[def]]` | d, e, or f (intersection) |
| 6 | `[a-z&&[^bc]]` | a through z, except for b and c: `[ad-z]` (subtraction) |
| 7 | `[a-z&&[^m-p]]` | a through z, and not m through p: `[a-lq-z]` (subtraction) |
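The following sketch shows the union, intersection and subtraction rows in action. It keeps only the characters of a sample string that a given class accepts. The keep() helper is hypothetical; it tests each character against the class with matches().

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class CharClassDemo {
    // Hypothetical helper: keeps, in order, each character the single-character class matches.
    static String keep(String charClass, String input) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String letters = "abcdefmnopqz";
        System.out.println(keep("[a-d[m-p]]", letters));    // abcdmnop   (union)
        System.out.println(keep("[a-z&&[def]]", letters));  // def        (intersection)
        System.out.println(keep("[a-z&&[^bc]]", letters));  // adefmnopqz (subtraction)
        System.out.println(keep("[a-z&&[^m-p]]", letters)); // abcdefqz   (subtraction)
    }
}
```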

 

#Regex Quantifiers

Quantifiers specify how many times the preceding element (a character, character class, or group) must occur.

| Regex | Description |
|---|---|
| `X?` | X occurs once or not at all |
| `X+` | X occurs one or more times |
| `X*` | X occurs zero or more times |
| `X{n}` | X occurs exactly n times |
| `X{n,}` | X occurs n or more times |
| `X{y,z}` | X occurs at least y times but not more than z times |
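The sketch below tries each quantifier against whole strings using Pattern.matches(). The whole() wrapper and the class name are hypothetical conveniences. Note that the upper bound in X{y,z} is inclusive.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class QuantifierDemo {
    // Hypothetical wrapper: true if the WHOLE input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("colou?r", "color"));  // true: "u" is optional
        System.out.println(whole("colou?r", "colour")); // true
        System.out.println(whole("ab+", "a"));          // false: + needs at least one b
        System.out.println(whole("ab*", "a"));          // true: * allows zero b's
        System.out.println(whole("\\d{3}", "123"));     // true: exactly three digits
        System.out.println(whole("\\d{2,}", "12345"));  // true: two or more digits
        System.out.println(whole("\\d{2,4}", "1234"));  // true: upper bound is inclusive
        System.out.println(whole("\\d{2,4}", "12345")); // false: five digits is too many
    }
}
```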

 

#Regex Metacharacters

The following metacharacters work as shorthand for commonly used patterns.

| Regex | Description |
|---|---|
| `.` | Any character (may or may not match line terminators, depending on the DOTALL flag) |
| `\d` | Any digit, short for `[0-9]` |
| `\D` | Any non-digit, short for `[^0-9]` |
| `\s` | Any whitespace character, short for `[ \t\n\x0B\f\r]` |
| `\S` | Any non-whitespace character, short for `[^\s]` |
| `\w` | Any word character, short for `[a-zA-Z_0-9]` |
| `\W` | Any non-word character, short for `[^\w]` |
| `\b` | A word boundary |
| `\B` | A non-word boundary |
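The following sketch illustrates \w, \b and \s by extracting words and whitespace-separated tokens from a sample line. Both helpers (words() and tokens()) and the class name are hypothetical. tokens() uses String.split(), which also takes a regex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class ShorthandDemo {
    // Hypothetical helper: words are runs of \w characters between word boundaries \b.
    static List<String> words(String text) {
        Matcher m = Pattern.compile("\\b\\w+\\b").matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Hypothetical helper: splits on one or more whitespace characters (\s+).
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String text = "  order_id: 42,\tstatus=OK ";
        System.out.println(words(text));  // [order_id, 42, status, OK]
        System.out.println(tokens(text)); // [order_id:, 42,, status=OK]
    }
}
```

Because the underscore counts as a word character, order_id stays in one piece. The punctuation stays attached to the whitespace-separated tokens.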

#Pattern class methods

#Static methods

The following are the static methods.

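  1. static Pattern compile(String regex): compiles the given regular expression into a pattern.
  2. static Pattern compile(String regex, int flags): compiles the given regular expression into a pattern with the given flags (for example, Pattern.CASE_INSENSITIVE).
  3. static boolean matches(String regex, CharSequence input): compiles the regular expression and attempts to match the entire input against it.
  4. static String quote(String s): returns a literal pattern string for s, so that any metacharacters in s are matched literally.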
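The sketch below exercises the four static methods. matchesIgnoringCase() and matchesLiterally() are hypothetical wrappers added for illustration. Pattern.quote() is useful whenever the text to search for may contain metacharacters such as '.'.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class StaticMethodsDemo {
    // Hypothetical wrapper around compile(regex, flags): CASE_INSENSITIVE ignores letter case.
    static boolean matchesIgnoringCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // Hypothetical wrapper around quote(s): metacharacters in text are matched literally.
    static boolean matchesLiterally(String text, String input) {
        return Pattern.matches(Pattern.quote(text), input);
    }

    public static void main(String[] args) {
        System.out.println(matchesIgnoringCase("java", "JAVA")); // true

        // matches(regex, input): a one-off check that the WHOLE input matches.
        System.out.println(Pattern.matches("\\d+", "2024"));  // true
        System.out.println(Pattern.matches("\\d+", "v2024")); // false

        // Unquoted, '.' matches any character; quoted, it matches only a real dot.
        System.out.println(Pattern.matches("1.5", "1x5"));  // true
        System.out.println(matchesLiterally("1.5", "1x5")); // false
        System.out.println(matchesLiterally("1.5", "1.5")); // true
    }
}
```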

#Non-Static methods

  1. String toString(): returns the string representation of this pattern, which is the regular expression it was compiled from.
  2. Matcher matcher(CharSequence input): creates a matcher that will match the given input against this pattern.
  3. int flags(): returns this pattern's match flags.
  4. String pattern(): returns the regular expression from which this pattern was compiled.
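The following short sketch demonstrates the instance methods. It uses a hypothetical LETTERS constant compiled with Pattern.CASE_INSENSITIVE so that flags() has something to report. Because flags() returns a bit mask, the example tests one flag with a bitwise AND rather than comparing for equality.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class InstanceMethodsDemo {
    // Hypothetical letters-only pattern, compiled with one flag.
    static final Pattern LETTERS = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        System.out.println(LETTERS.pattern());  // [a-z]+
        System.out.println(LETTERS.toString()); // [a-z]+

        // flags() returns the bit mask given to compile(); test a single bit with &.
        boolean caseInsensitive = (LETTERS.flags() & Pattern.CASE_INSENSITIVE) != 0;
        System.out.println("case-insensitive: " + caseInsensitive); // true

        // matcher(input) creates a new Matcher for this input.
        Matcher m = LETTERS.matcher("Hello");
        System.out.println(m.matches()); // true, because the flag lets [a-z] match 'H'
    }
}
```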

#Matcher class methods

  1. boolean find(): attempts to find the next subsequence of the input that matches the pattern.
  2. boolean find(int start): resets the matcher and then searches for the next match, starting at the specified index.
  3. int start(): returns the start index of the previous match found, for example by find().
  4. int end(): returns the offset after the last character of the previous match.
  5. boolean matches(): checks whether the entire input matches the pattern.
  6. String group(): returns the subsequence matched by the previous match.
  7. int groupCount(): returns the number of capturing groups in this matcher's pattern.
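The sketch below walks through find(), start(), end(), group(), find(int) and matches() on a sample string. describeMatches() and the class name are hypothetical. The comments show the expected output, with end() reported as an exclusive index.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class MatcherMethodsDemo {
    // Hypothetical helper: reports every match of \d+ as "group[start,end)".
    static String describeMatches(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // end() is exclusive: the index just past the last matched character.
            sb.append(m.group()).append('[').append(m.start())
              .append(',').append(m.end()).append(") ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String input = "a1 b22 c333";
        System.out.println(describeMatches(input)); // 1[1,2) 22[4,6) 333[8,11)

        Matcher m = Pattern.compile("\\d+").matcher(input);
        // find(int) resets the matcher and starts searching at the given index.
        if (m.find(5)) {
            System.out.println(m.group() + " at " + m.start()); // 2 at 5
        }
        // matches() requires the WHOLE input to match, unlike find().
        System.out.println(m.matches()); // false
        System.out.println(Pattern.compile("\\d+").matcher("333").matches()); // true
    }
}
```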

#PatternSyntaxException class methods

  1. String getPattern(): returns the pattern which produced this exception.
  2. String getDescription(): returns the description of the exception.
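  3. String getMessage(): returns a multi-line message containing the description of the syntax error and its index, the erroneous pattern, and a visual indication of the error position within the pattern.
  4. int getIndex(): returns the error index, or -1 if the index is not known.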
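Because PatternSyntaxException is unchecked, catching it is optional. It is still worth doing when the regex comes from user input. The hypothetical tryCompile() helper below returns the exception instead of throwing it, so that its accessor methods can be inspected. The exact description and message text come from the JDK, so they are printed rather than hard-coded here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; names are illustrative only.
class SyntaxErrorDemo {
    // Hypothetical helper: returns null if the regex compiles, otherwise the caught exception.
    static PatternSyntaxException tryCompile(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        PatternSyntaxException e = tryCompile("(abc"); // the group is never closed
        if (e != null) {
            System.out.println("pattern:     " + e.getPattern());
            System.out.println("description: " + e.getDescription());
            System.out.println("index:       " + e.getIndex());
            System.out.println("message:\n" + e.getMessage());
        }
    }
}
```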

#MatchResult Interface Methods

  • int start(): returns the start index of the match.
  • int end(): returns the offset after the last character matched.
  • String group(): returns the subsequence matched.
  • int groupCount(): returns the number of capturing groups in the match result's pattern.
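Matcher itself implements MatchResult. Matcher.toMatchResult() returns a snapshot of the current match that is not affected by later calls to find(). The sketch below uses a hypothetical snapshots() helper and a made-up key=value input. It collects one snapshot per match and then reads each one through the interface methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class MatchResultDemo {
    // Hypothetical helper: snapshots each match so it stays valid after the matcher moves on.
    static List<MatchResult> snapshots(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<MatchResult> results = new ArrayList<>();
        while (m.find()) {
            results.add(m.toMatchResult()); // copy of the current match state
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up key=value input: group 1 is the key, group 2 the value.
        for (MatchResult r : snapshots("(\\w+)=(\\d+)", "x=1, y=22")) {
            System.out.println(r.group() + " -> key " + r.group(1) + ", value " + r.group(2)
                    + ", span [" + r.start() + "," + r.end() + "), groups " + r.groupCount());
        }
    }
}
```

This prints "x=1 -> key x, value 1, span [0,3), groups 2" and "y=22 -> key y, value 22, span [5,9), groups 2".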

The following program shows simple string matching using Java regex.

import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Example1
{
  public static void main(String[] args)
  {
    //creating the regular expression pattern to be searched
    Pattern reg=Pattern.compile("york");

    //defining the string to be searched
    String str="new york";

    //creating a matcher that searches str for the pattern
    Matcher m=reg.matcher(str);

    //start() and end() are only valid after a successful find()
    if (m.find())
    {
      int start=m.start();
      int end=m.end()-1; //end() is exclusive, so subtract 1 to get the last matched index
      System.out.println("match occurs from index: "+start+" to index: "+end);
    }
    else
    {
      System.out.println("no match found");
    }
  }
}

The program prints the following output:

match occurs from index: 4 to index: 7

The following program checks whether each of several strings matches a given pattern.

import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Example2
{
  public static void main(String[] args)
  {
    /*creating the regular expression pattern: the whole string must be 10 to 15 characters long
      and must contain only lowercase letters or digits*/
    Pattern condition=Pattern.compile("[a-z\\d]{10,15}");

    //defining the strings to be checked
    String str1="Mahesh";
    String str2="nicework152";
    String str3="Hello12";

    //creating a matcher for each string
    Matcher m1=condition.matcher(str1);
    Matcher m2=condition.matcher(str2);
    Matcher m3=condition.matcher(str3);

    //matches() checks the entire string, unlike find(), which accepts any matching substring
    System.out.println(str1+" matched: "+m1.matches());
    System.out.println(str2+" matched: "+m2.matches());
    System.out.println(str3+" matched: "+m3.matches());
  }
}

The program prints the following output:

Mahesh matched: false
nicework152 matched: true
Hello12 matched: false

The following program extracts all 10-digit mobile numbers from a piece of text.

import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Example3
{
  public static void main(String[] args)
  {
    //creating the regular expression pattern: exactly 10 digits; the word boundaries (\b)
    //prevent partial matches inside longer digit runs
    Pattern mob=Pattern.compile("\\b\\d{10}\\b");

    //defining the string to be searched
    String text="Mr. Sharma's contact number is 9876543210 and Mrs. Gupta's contact number is 0123456789";

    int count=0;
    //creating a matcher for the text
    Matcher m=mob.matcher(text);

    //printing every match found and counting them
    while(m.find())
    {
      count++;
      System.out.println(m.group());
    }
    System.out.println("total contact numbers: "+count);
  }
}

The program prints the following output:

9876543210
0123456789
total contact numbers: 2

That’s it for this tutorial.
