Merging Multiple PDF Files into One in Python

Many online tools can merge multiple PDFs into one file, but they raise privacy concerns because you usually have to upload your documents to someone else’s server. If your PDFs contain confidential information, an online tool is not a safe option. That’s where a general-purpose programming language like Python comes into play.

Python has several third-party libraries for handling PDF operations. Two of them are:

PyMuPDF
PyPDF2

To merge multiple PDFs into a single PDF, you can use either PyMuPDF’s .insert_file() method or PyPDF2’s PdfMerger class.

Preparing PDFs

Before we start, we need some PDF files to merge. For this tutorial, I created a folder called “pdfs” in my current working directory and placed two PDFs inside it.

Here are the two files, which you can download to follow along:

first.pdf
second.pdf
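
Before merging, it can help to confirm that each input exists and actually looks like a PDF. The sketch below is a minimal check, assuming that a well-formed PDF starts with the bytes %PDF- at the very beginning of the file. The helper name looks_like_pdf is hypothetical and is not part of either library.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if `path` is an existing file that starts with the PDF header."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        # PDF files normally begin with "%PDF-" followed by a version number
        return fh.read(5) == b"%PDF-"
```

For example, all(looks_like_pdf(f) for f in ['pdfs/first.pdf', 'pdfs/second.pdf']) tells you whether both sample files pass the check. It only inspects the header, so it does not guarantee that the rest of the file is valid.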

Approach 1: PyMuPDF’s .insert_file() method

The PyMuPDF library provides methods to open, merge, save, and close PDF documents.

One of the big advantages of PyMuPDF is that it performs well with large PDF files, and it is cross-platform.

Here is the step-by-step guide:

Step 1: Install PyMuPDF

Install the PyMuPDF library using the command below:

pip install pymupdf

Step 2: Import it into your file

# Importing the PyMuPDF library
import pymupdf

Step 3: Prepare a list of input PDF files

# List of input PDF files, in the order they should appear in the merged file
files = ['pdfs/first.pdf', 'pdfs/second.pdf']
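
If the folder holds more than a couple of files, the list can also be built from the folder contents instead of being typed by hand. This is a minimal sketch using only the standard library; collect_pdfs is a hypothetical helper name, and sorting by file name is an assumption about the order you want.

```python
from pathlib import Path


def collect_pdfs(folder):
    """Return the *.pdf files directly inside `folder` as strings, sorted by name."""
    return [str(p) for p in sorted(Path(folder).glob("*.pdf"))]
```

Calling files = collect_pdfs("pdfs") would then replace the hand-written list. The sort is plain alphabetical, so a name like file10.pdf comes before file2.pdf; rename the files or sort differently if the page order matters.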

Step 4: Creating an empty document

Calling pymupdf.open() with no arguments creates a new, empty PDF document. We will add pages to it in the next step.

# Creating an empty document object
pdf_doc = pymupdf.open()

Step 5: Appending input files

We already created an empty PDF document, and now it is time to add the pages from the input files to it.

We loop over the files list and append each input file to the empty document using the .insert_file() method.

# Appending files to the empty document object
for filename in files:
    pdf_doc.insert_file(filename)

Step 6: Saving the Merged PDF

# Saving the merged PDF and closing the document object
pdf_doc.save("pdfs/pymupdf_merged.pdf")
pdf_doc.close()

The pymupdf_merged.pdf file is saved inside the “pdfs” folder and is the concatenation of the two input files. Here is the complete code:

# Importing pymupdf library
import pymupdf

# List of input pdf files
files = ['pdfs/first.pdf', 'pdfs/second.pdf']

# Creating an empty document object
pdf_doc = pymupdf.open()

# Appending files to an empty document object
for filename in files:
    pdf_doc.insert_file(filename)

# Saving the merged PDF and closing the document object
pdf_doc.save("pdfs/pymupdf_merged.pdf")
pdf_doc.close()
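
The same steps can be wrapped in a reusable function. The following is a sketch rather than part of the walkthrough: merge_with_pymupdf is a hypothetical name, the input check is an addition, and the code assumes a recent PyMuPDF release in which the module is imported as pymupdf (older releases use the name fitz). The import sits inside the function only so that the input check runs before the library is loaded; placing it at the top of the file works just as well.

```python
from pathlib import Path


def merge_with_pymupdf(files, output_path):
    """Merge the PDFs listed in `files`, in order, into `output_path`."""
    missing = [str(f) for f in files if not Path(f).is_file()]
    if not files or missing:
        raise FileNotFoundError(f"No usable input PDFs; missing: {missing}")

    import pymupdf  # assumption: recent PyMuPDF; older releases import as `fitz`

    # The with block closes the document even if insert_file() or save() fails
    with pymupdf.open() as merged:
        for filename in files:
            merged.insert_file(str(filename))
        merged.save(str(output_path))
    return str(output_path)
```

With the sample files in place, merge_with_pymupdf(['pdfs/first.pdf', 'pdfs/second.pdf'], 'pdfs/pymupdf_merged.pdf') should write the same merged file as the script.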

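Running the complete code produces pdfs/pymupdf_merged.pdf, which contains the pages of first.pdf followed by the pages of second.pdf.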

Approach 2: PyPDF2’s PdfMerger class

PyPDF2 is another PDF manipulation library, and it provides the PdfMerger class for concatenating PDFs. Note that the PyPDF2 maintainers have since continued development under the name pypdf, so PyPDF2 itself no longer receives new features.

We will create a custom function that accepts the list of input files and the output file’s path.

PyPDF2 is simple and handy for basic PDF operations. If you are working with simple PDF files, I recommend this library, but if your PDFs are complex and need special handling, I recommend PyMuPDF instead.

Here is the step-by-step guide for this approach:

Step 1: Import PyPDF2

If the library is not installed yet, install it first with pip install PyPDF2.

# Import the PyPDF2 library
import PyPDF2

Step 2: Defining the custom function

Let’s create a custom function merge_pdfs() that accepts two arguments:

input_files: the list of PDF paths to merge
output_file: the path of the merged PDF

The next three steps fill in the body of this function.

# Defining a custom function that merges multiple PDFs
def merge_pdfs(input_files, output_file):

Step 3: Create an instance of the PdfMerger class

As mentioned above, the PyPDF2 library provides a PdfMerger class, so we create an instance of it inside the function.

    # Creating an instance of the PdfMerger class
    pdf_merger = PyPDF2.PdfMerger()

Step 4: Appending Input Files

Using the pdf_merger instance, we append each input file in order.

    # Appending input files
    for pdf_file in input_files:
        with open(pdf_file, 'rb') as file:
            pdf_merger.append(file)

Step 5: Writing an Output File

Use the .write() method on the pdf_merger instance to write the final merged PDF file.

    # Writing the merged output PDF file
    with open(output_file, 'wb') as file:
        pdf_merger.write(file)

    # Closing the merger instance
    pdf_merger.close()

We also close the pdf_merger instance to free the resources it holds.

Step 6: Calling the custom function

Our final step is to call the custom function with the list of input files and the output file path:

# Defining input and output files
input_files = ['pdfs/first.pdf', 'pdfs/second.pdf']
output_file = 'pdfs/pypdf2_merged.pdf'

# Calling the custom function
merge_pdfs(input_files, output_file)

Here is the complete code:

# Import PyPDF2 library
import PyPDF2


# Defining custom function that merges multiple pdfs
def merge_pdfs(input_files, output_file):

    # Creating an instance of PdfMerger class
    pdf_merger = PyPDF2.PdfMerger()

    # Appending input files
    for pdf_file in input_files:
        with open(pdf_file, 'rb') as file:
            pdf_merger.append(file)

    # Writing an output merged pdf file
    with open(output_file, 'wb') as file:
        pdf_merger.write(file)

    # Closing the merger instance
    pdf_merger.close()


# Defining input and output files
input_files = ['pdfs/first.pdf', 'pdfs/second.pdf']
output_file = 'pdfs/pypdf2_merged.pdf'

# Calling a custom function
merge_pdfs(input_files, output_file)
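
One possible refinement is to make sure the merger is closed even when a file cannot be read or written. The sketch below does this with contextlib.closing from the standard library. It is an illustration under stated assumptions, not the only way to do it: merge_with_pypdf2 is a hypothetical name, the empty-list check and the creation of the output folder are additions, and the code assumes an installed PyPDF2 release that provides PdfMerger. The import is placed inside the function only so that the empty-list check runs first.

```python
from contextlib import closing
from pathlib import Path


def merge_with_pypdf2(input_files, output_file):
    """Merge `input_files`, in order, into `output_file` using PyPDF2."""
    if not input_files:
        raise ValueError("input_files is empty; nothing to merge")

    import PyPDF2  # assumption: a PyPDF2 release that provides PdfMerger

    # Create the destination folder if it does not exist yet
    Path(output_file).parent.mkdir(parents=True, exist_ok=True)

    # closing() calls pdf_merger.close() even if append() or write() raises
    with closing(PyPDF2.PdfMerger()) as pdf_merger:
        for pdf_file in input_files:
            with open(pdf_file, "rb") as fh:
                pdf_merger.append(fh)
        with open(output_file, "wb") as fh:
            pdf_merger.write(fh)
```

It is called the same way as merge_pdfs(), for example merge_with_pypdf2(input_files, output_file).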

If you execute the complete code, you will get pdfs/pypdf2_merged.pdf, which contains the pages of both input files in order.

That’s all!

Krunal Lathiya

With a career spanning over eight years in the field of Computer Science, Krunal’s expertise is rooted in a solid foundation of hands-on experience, complemented by a continuous pursuit of knowledge.