  • (+91) 9409548155
  • support@appdividend.com
Python

How to Extract Top-Level Domain (TLD) from URL in Python

  • 20 Sep, 2024

If you are categorizing websites by type (.com, .edu, .tech) or by region (.in, .uk, .co.uk), you need to extract the Top-Level Domain (TLD) from the URL. This is also useful in web filtering systems and security tools.

[Figure: what a TLD is in a URL]

In analytics tools such as Google Analytics, TLD extraction lets you group and analyze traffic sources by domain type or country of origin. So, there are many use cases. But the question is, how do you do it?

Here are three ways to extract the top-level domain (TLD) from a URL in Python:

  1. Using the tldextract library
  2. Using the urllib.parse module
  3. Using a regular expression

Method 1: Using the tldextract library

The tldextract module provides a function called “tldextract.extract()” that parses a URL and splits it into its subdomain, domain, and suffix (TLD).

To use this module, you need to install it first using the command below:

pip install tldextract

Now, you need to import the library and use its method like this:

import tldextract


def extract_tld(url):
    extracted = tldextract.extract(url)
    return extracted.suffix


# Usage
url = "https://appdividend.com/category/python"
tld = extract_tld(url)
print(f"The TLD is: {tld}")

Output

The TLD is: com

I highly recommend this approach because it gives accurate output and works well with multi-level suffixes (such as .co.uk and .co.in) and other complex domains.

import tldextract


def extract_tld(url):
    extracted = tldextract.extract(url)
    return extracted.suffix


# Usage
url = "https://sprintchase.co.uk/"
tld = extract_tld(url)
print(f"The TLD is: {tld}")

Output

The TLD is: co.uk

This library uses Mozilla’s “Public Suffix List”, which is updated regularly with the latest TLD information, so it handles edge cases very well. The trade-off is an external dependency: you must install the library before you can use it.
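
To see why a suffix list matters, here is a minimal sketch of the longest-match idea behind it. It is not tldextract's actual implementation, and KNOWN_SUFFIXES is a tiny, hand-picked, hypothetical subset; the real list has thousands of entries plus wildcard and exception rules that this sketch ignores.

```python
from urllib.parse import urlparse

# Hypothetical, hand-picked subset for illustration only.
KNOWN_SUFFIXES = {"com", "uk", "co.uk", "in", "co.in"}


def longest_known_suffix(url):
    """Return the longest entry of KNOWN_SUFFIXES that ends the hostname."""
    hostname = (urlparse(url).hostname or "").lower()
    labels = hostname.split(".")
    # Try candidates from longest to shortest: "a.b.c", then "b.c", then "c".
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in KNOWN_SUFFIXES:
            return candidate
    return None


print(longest_known_suffix("https://sprintchase.co.uk/"))  # co.uk
print(longest_known_suffix("https://appdividend.com/category/python"))  # com
```

Because longer candidates are checked first, "co.uk" wins over "uk" whenever both are in the set. A suffix missing from the set simply yields None, which is why a complete, maintained list is needed in practice.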

Space Complexity: O(1) – The space used is constant regardless of input size.

Time Complexity: O(n) – where n is the length of the URL string.

Method 2: Using the urllib.parse module

The urllib.parse module provides the “urlparse()” function, which you can use to parse the URL and extract the domain. We then split the domain on dots and take the last part as the TLD. It is a built-in module, so it does not require any additional installation.

from urllib.parse import urlparse


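def extract_tld(url):
    parsed_url = urlparse(url)
    # hostname (unlike netloc) excludes any port or user info
    domain = parsed_url.hostname or ""
    # Naive approach: take the text after the last dot
    tld = domain.split('.')[-1]
    return tld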


# Usage
url = "https://sprintchase.com"
tld = extract_tld(url)
print(f"The TLD is: {tld}")

Output

The TLD is: com

The big disadvantage of this approach is that it does not work well with multi-level domains. For example, if you pass “https://sprintchase.co.uk”, it returns “uk” and not “co.uk”.

from urllib.parse import urlparse


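def extract_tld(url):
    parsed_url = urlparse(url)
    # hostname (unlike netloc) excludes any port or user info
    domain = parsed_url.hostname or ""
    # Naive approach: take the text after the last dot
    tld = domain.split('.')[-1]
    return tld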


# Usage
url = "https://sprintchase.co.uk"
tld = extract_tld(url)
print(f"The TLD is: {tld}")

Output

The TLD is: uk

This method is simple and fast, but it is inaccurate for multi-level suffixes such as co.uk.
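
A related caveat when working with urlparse(): the netloc attribute keeps any port and credentials, while the hostname attribute strips them. The sketch below compares the two on a hypothetical URL (the user name, password, and port are made up for illustration).

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials and a port, used only for illustration.
parsed = urlparse("https://user:secret@sprintchase.com:8080/path")

print(parsed.netloc)    # user:secret@sprintchase.com:8080
print(parsed.hostname)  # sprintchase.com

# Splitting netloc on dots drags the port along with the last label.
naive_from_netloc = parsed.netloc.split(".")[-1]
# Splitting hostname gives a clean last label.
from_hostname = parsed.hostname.split(".")[-1]

print(naive_from_netloc)  # com:8080
print(from_hostname)      # com
```

If your URLs may carry ports or user info, splitting hostname rather than netloc avoids this particular problem, although it still cannot recognize multi-level suffixes.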

Method 3: Using a regular expression

When you want to find a specific substring in a string or URL, regular expressions are a flexible option, as long as the text follows a pattern you can describe.

Python’s “re” module provides the “re.search()” function, which scans the URL for the first match of the pattern. If it finds a match, we return the second capture group, which holds the text after the last “.” (dot) in the host name.

import re


def extract_tld_re(url):
    pattern = r'(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)\.([^:\/\n]+)'
    match = re.search(pattern, url)
    if match:
        return match.group(2)
    return None


# Usage
url = "https://sprintchase.com"
tld = extract_tld_re(url)
print(f"The TLD is: {tld}")

Output

The TLD is: com

This approach is customizable, and you can write whatever pattern you need. However, it requires a good understanding of how regular expressions work, and this particular pattern does not work with multi-level domains like .co.uk.

import re


def extract_tld_re(url):
    pattern = r'(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)\.([^:\/\n]+)'
    match = re.search(pattern, url)
    if match:
        return match.group(2)
    return None


# Usage
url = "https://sprintchase.co.uk"
tld = extract_tld_re(url)
print(f"The TLD is: {tld}")

Output

The TLD is: uk

The output should be “co.uk”, but it returns “uk”, which is incorrect!

You can optimize the performance of regular expressions with techniques such as lazy quantifiers and atomic grouping (Python’s “re” module supports atomic groups from version 3.11 onward).

Space Complexity: O(1) – The space used is constant regardless of input size.

Time Complexity: O(n) – Where n is the length of the input URL string.
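
A simpler optimization, separate from the quantifier techniques mentioned above, is to compile the pattern once and reuse it across calls. The sketch below uses the same pattern as before; the names TLD_PATTERN and extract_tld_compiled are made up for this example. Python's re module already caches recently used patterns internally, so expect only a modest gain, mostly in tight loops over many URLs.

```python
import re

# Compiled once at import time and reused for every call.
# Same pattern as above; only the compilation step moves out of the function.
TLD_PATTERN = re.compile(
    r'(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)\.([^:\/\n]+)'
)


def extract_tld_compiled(url):
    match = TLD_PATTERN.search(url)
    return match.group(2) if match else None


urls = ["https://sprintchase.com", "https://sprintchase.co.uk"]
print([extract_tld_compiled(u) for u in urls])  # ['com', 'uk']
```

Precompiling changes only the speed, not the result: the multi-level suffix limitation is still there.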

That’s all!

Krunal Lathiya

With a career spanning over eight years in the field of Computer Science, Krunal’s expertise is rooted in a solid foundation of hands-on experience, complemented by a continuous pursuit of knowledge.


Address: TwinStar, South Block – 1202, 150 Ft Ring Road, Nr. Nana Mauva Circle, Rajkot(360005), Gujarat, India

Call: (+91) 9409548155

Email: support@appdividend.com

Copyright @2024 AppDividend. All Rights Reserved