How to Delete Pages from a PDF File Using Python

Whether you want to create a shorter PDF from a larger one or remove unnecessary content, deleting pages is the way to do it. Removing pages usually reduces the file size as well, which makes the file easier to store and share.

Here are three ways to delete pages from a PDF using Python:

Using PyMuPDF
Using PyPDF2
Using pdfrw

For this demonstration, we will use a five-page PDF and remove some of its pages.

Here is the file: sample_5_pages.

Method 1: Using PyMuPDF

If you are looking for a memory-efficient solution among the many PDF libraries, I highly recommend PyMuPDF. Its document object provides a .delete_page() method that accepts a zero-based page index and removes that page.

You can install it using pip:

pip install pymupdf

The library is imported under the name fitz:

import fitz

Here is the complete code:

import fitz


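def delete_pages(input_path, output_path, pages_to_delete):
    doc = fitz.open(input_path)
    # Delete from the highest page number down so that earlier deletions
    # do not shift the indices of pages still waiting to be removed.
    # sorted(set(...)) drops duplicates and leaves the caller's list untouched.
    for page_num in sorted(set(pages_to_delete), reverse=True):
        doc.delete_page(page_num - 1)  # delete_page() expects a 0-based index
    doc.save(output_path)
    doc.close()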


# Usage
input_path = 'sample_5_pages.pdf'
output_path = 'reduced.pdf'
pages_to_delete = [1, 3, 5]  # Page numbers to delete (1-indexed)
delete_pages(input_path, output_path, pages_to_delete)
print("Pages 1, 3, and 5 deleted successfully")

Output

The output PDF has only two pages, which are pages 2 and 4 of the original. Pages 1, 3, and 5 have been deleted successfully.
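The function above sorts the page numbers in descending order before deleting them. A minimal sketch of why that order matters, using a plain Python list as a stand-in for the document (the function names are illustrative and not part of PyMuPDF):

```python
def delete_ascending(pages, one_based_numbers):
    """Naive order, shown only to illustrate the index-shifting problem."""
    remaining = list(pages)
    for number in sorted(one_based_numbers):
        index = number - 1
        # After each deletion, later pages move down by one position,
        # so this index no longer points at the page we meant.
        if index < len(remaining):
            del remaining[index]
    return remaining


def delete_descending(pages, one_based_numbers):
    """Delete from the end so the remaining indices stay valid."""
    remaining = list(pages)
    for number in sorted(set(one_based_numbers), reverse=True):
        del remaining[number - 1]
    return remaining


pages = ["p1", "p2", "p3", "p4", "p5"]
print(delete_ascending(pages, [1, 3, 5]))   # ['p2', 'p3', 'p5'] (wrong pages removed)
print(delete_descending(pages, [1, 3, 5]))  # ['p2', 'p4']
```

Deleting from the highest number down means every deletion only affects positions that have already been handled.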

Complexities

Time Complexity: O(n log n) to sort the n page numbers to delete, plus one delete_page() call per page.
Space Complexity: O(n) at most for the list of page numbers. The pages themselves are removed from the open document in place, so no second copy of the PDF is built.

Pros

It is fast and memory efficient.
It can handle large PDF files without any special treatment.
Beyond deleting pages, it can also extract text and perform many other PDF operations.

Cons

PyMuPDF is a third-party library, so it adds an external dependency.
If you only need simple operations, a full-featured library like this can be overkill.
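Nothing in the Method 1 function checks that the requested pages actually exist in the document. A hedged sketch of validating the input first is shown below. The helper name to_zero_based_descending is hypothetical, and page_count is assumed to come from len(doc) on the opened document. Rejecting a request that would delete every page is a design choice of this sketch, made on the assumption that an empty output file is never wanted.

```python
def to_zero_based_descending(pages_to_delete, page_count):
    """Validate 1-indexed page numbers; return 0-based indices, highest first."""
    unique_pages = set(pages_to_delete)

    # Collect every number that falls outside 1..page_count
    invalid = sorted(p for p in unique_pages if not 1 <= p <= page_count)
    if invalid:
        raise ValueError(f"Page numbers out of range 1..{page_count}: {invalid}")

    # Assumption of this sketch: deleting all pages is treated as a mistake
    if unique_pages and len(unique_pages) == page_count:
        raise ValueError("Refusing to delete every page in the document")

    return sorted((p - 1 for p in unique_pages), reverse=True)


print(to_zero_based_descending([1, 3, 5, 3], 5))  # [4, 2, 0]
```

The returned indices could then be passed to doc.delete_page() one by one, in the order given.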

Method 2: Using PyPDF2

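PyPDF2 is a popular library for simple PDF operations. Instead of deleting pages in place, this approach copies every page you want to keep into a new PDF, which suits cases where you need to remove specific pages rather than a range. Note that PyPDF2 is no longer developed under that name. The project continues as pypdf, which provides the same PdfReader and PdfWriter classes used here.

You can install it using pip:

pip install PyPDF2

Here is the complete code: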

import PyPDF2


# Custom function to delete pages
def delete_pages(input_path, output_path, pages_to_delete):
    with open(input_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        pdf_writer = PyPDF2.PdfWriter()

        # Copy every page whose 1-indexed number is not marked for deletion
        for page_num, page in enumerate(pdf_reader.pages, start=1):
            if page_num not in pages_to_delete:
                pdf_writer.add_page(page)

        # Write while the input file is still open, in case page data is read lazily
        with open(output_path, 'wb') as output_file:
            pdf_writer.write(output_file)


# Calling the custom function
input_path = 'sample_5_pages.pdf'
output_path = 'reduced.pdf'
pages_to_delete = [1, 2, 3, 5]  # Page numbers to delete (1-indexed)
delete_pages(input_path, output_path, pages_to_delete)
print("Pages 1, 2, 3, and 5 have been deleted")

Output

We removed four pages from the PDF, so only one page remains: page 4 of the original.

Complexities

Time Complexity: O(n), where n is the number of pages in the PDF.
Space Complexity: O(n), because it needs to store the entire PDF in memory while processing.
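The loop in Method 2 tests each page number against pages_to_delete. When that is a list, every test scans the list, which adds up if many pages are being removed from a large PDF. A small sketch of the same filtering step with a set, which gives constant-time membership checks on average (indices_to_keep is a hypothetical helper and not part of PyPDF2):

```python
def indices_to_keep(page_count, pages_to_delete):
    """Return the 0-based indices of the pages that survive deletion."""
    delete_set = set(pages_to_delete)  # average O(1) membership checks
    return [i for i in range(page_count) if i + 1 not in delete_set]


print(indices_to_keep(5, [1, 2, 3, 5]))  # [3]
```

Each returned index could then be used to fetch a page from the reader and add it to the writer.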

Pros

It works well with small-to-medium-sized PDFs.
It provides a simple API to work with.

Cons

It is not as memory efficient as PyMuPDF.
It loads the entire file into memory, so it becomes slow for large PDF files.

Method 3: Using pdfrw

pdfrw is a third-party PDF library with a distinct use case: it is a good choice when you want to preserve the original PDF structure while modifying the file.

You can install the pdfrw library using the command below:

pip install pdfrw

Here is the complete Python code:

from pdfrw import PdfReader, PdfWriter


def delete_pages(input_path, output_path, pages_to_delete):
    reader = PdfReader(input_path)
    writer = PdfWriter()

    for page_num, page in enumerate(reader.pages, 1):
        if page_num not in pages_to_delete:
            writer.addpage(page)

    writer.write(output_path)


# Calling the custom function
input_path = 'sample_5_pages.pdf'
output_path = 'reduced.pdf'
pages_to_delete = [3, 4, 5]  # Page numbers to delete (1-indexed)
delete_pages(input_path, output_path, pages_to_delete)
print("Pages 3, 4, and 5 have been deleted")

Output

The new PDF no longer contains the last three pages of the original.
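The Method 3 example removes a contiguous run of pages (3 through 5), which is tedious to write out by hand for longer ranges. As a sketch, a hypothetical helper named expand_page_spec could turn a string such as "1, 3-5" into the 1-indexed list that all three delete_pages functions expect. The spec format is an assumption of this example and not something any of the libraries define.

```python
def expand_page_spec(spec):
    """Expand a spec such as '1, 3-5' into a sorted list of 1-indexed page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Invalid range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    if any(p < 1 for p in pages):
        raise ValueError("Page numbers are 1-indexed and must be at least 1")
    return sorted(pages)


print(expand_page_spec("3-5"))     # [3, 4, 5]
print(expand_page_spec("1, 3-5"))  # [1, 3, 4, 5]
```

The result can be passed directly as the pages_to_delete argument.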

Complexities

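Time Complexity: O(n), where n is the number of pages in the PDF.
Space Complexity: O(n), because the pages are held in memory while the new file is built.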

Pros

Faster than PyPDF2

Cons

It is not as memory efficient or as feature-rich as PyMuPDF.
It can struggle if the PDF has a complex structure.

Final analysis

Which library to choose depends on your requirements.

If performance and speed are priorities, use PyMuPDF.
For small to medium PDFs with simple structures, use PyPDF2 or pdfrw.


Krunal Lathiya

With a career spanning over eight years in the field of Computer Science, Krunal’s expertise is rooted in a solid foundation of hands-on experience, complemented by a continuous pursuit of knowledge.